mybb / mybb2

The repository for the MyBB 2 forum software. Not to be used on live boards.
https://www.mybb.com
BSD 3-Clause "New" or "Revised" License

MyBB 2 Parser #193

Open xaoseric opened 8 years ago

xaoseric commented 8 years ago

@euantorano Why not just use the s9e\TextFormatter parser instead of reinventing the wheel for parsers? It supports both BBCode and Markdown. It also includes media embeds out of the box. We could use it as a base to integrate with.

https://github.com/s9e/TextFormatter http://s9e.github.io/TextFormatter/demo.html http://s9e.github.io/TextFormatter/fatdown.html

euantorano commented 8 years ago

Mostly because I'd never heard of it. That would indeed make life far easier. I'll look into it and make sure it meets our requirements. The license certainly works for us (MIT). @mybb/developers - thoughts?

ATofighi commented 8 years ago

Looks good to me...

I think we can use it.

euantorano commented 8 years ago

Yep, same thoughts here @ATofighi

JoshyPHP commented 8 years ago

Hi. I'm s9e\TextFormatter's author. Feel free to tag me, mention me or use whichever notification system is in place to alert me if I miss any related discussion.

@euantorano I'm not surprised you've never heard of it because I haven't advertised it outside of forum software's own forums, such as MyBB's dev forum. I'm fine with NBBC being the top result in Google for "bbcode library" because it's good enough for most individuals.

The library itself is in use in phpBB 3.2 and Flarum.

Destroy666x commented 8 years ago

It definitely looks interesting

euantorano commented 8 years ago

Hi,

Thanks for contacting us. I'm not surprised I missed it, given that the thread was from 2013 - we're pretty slow at development!

We have discussed this as a team and it looks like we will be using TextFormatter. I'm just going to write a small wrapper around it to add some of our slightly different MyCodes.

martec commented 7 years ago

I'm not against it, but... I am currently converting Rin Editor (https://community.mybb.com/thread-189422.html) to phpBB, which uses s9e\TextFormatter. The problem is the number of newlines ignored after block elements. If a block element is followed by an inline element, two '\n' are needed, but if there are two consecutive block elements, three '\n' are needed to create one br element. Due to this inconsistency, it is getting quite difficult for WYSIWYG editors to work properly. If we get global control over how many lines are skipped before and after block elements, then there is no problem, but if there is no such control, I think you'd better think hard about this.

P.S. phpBB offers no control over how many lines are skipped before and after block elements.

MyBB's current parser is consistent in this respect and friendly to WYSIWYG editors. Currently only one '\n' is ignored, regardless of whether the element following a block element is inline or block. After a block element, MyBB always ignores exactly one '\n'.

More detail: https://www.phpbb.com/community/viewtopic.php?f=461&t=2309881&start=30#p14762911 https://www.phpbb.com/community/viewtopic.php?f=461&t=2309881&start=30#p14779781
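
The rule described above for the current MyBB parser can be illustrated with a short sketch. It is written in Python purely for illustration and is not MyBB's or s9e\TextFormatter's code; the tag list and function names are hypothetical. Exactly one newline is swallowed after each closing block tag, whatever comes next:

```python
import re

# Hypothetical set of block-level BBCode tags, chosen for illustration only.
BLOCK_TAGS = ("quote", "code", "list")

# Match a closing block tag followed by at most one newline ("\r\n" or "\n").
_AFTER_BLOCK = re.compile(
    r"(\[/(?:%s)\])(?:\r?\n)?" % "|".join(BLOCK_TAGS), re.IGNORECASE
)


def swallow_one_newline_after_blocks(text: str) -> str:
    """Drop exactly one newline after every closing block tag.

    What follows the block (inline text or another block) is never inspected,
    so the rule is the same in both cases.
    """
    return _AFTER_BLOCK.sub(r"\1", text)


def count_line_breaks(text: str) -> int:
    """Count the newlines left over, i.e. the ones that would become <br>."""
    return swallow_one_newline_after_blocks(text).count("\n")
```

Under this rule, a block followed by inline text and a block followed by another block both need two '\n' to leave one line break, which is the kind of consistency a WYSIWYG editor can rely on.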

JoshyPHP commented 7 years ago

> Due to this inconsistency, it is getting quite difficult for WYSIWYG editors to work properly.

How nice would it be to live in a world where newlines are the biggest issue with WYSIWYG editors.

martec commented 7 years ago

> How nice would it be to live in a world where newlines are the biggest issue with WYSIWYG editors.

I don't claim that this is the only problem WYSIWYG editors have. But Rin Editor for MyBB is stable enough. I have nothing against your parser. But if I have no control over how many lines are skipped before and after block elements, I probably will not develop the Rin Editor plugin for MyBB 2, because I already know I would run into these problems.

Azareal commented 7 years ago

I'm curious about the speed of this library. The most important thing when it comes to BBCode / Markdown is not accuracy (although that would be nice), but speed. How does it compare to simply plopping down a Regex to match the tags? Some benchmarks would be nice.

Also, we might want to explore the possibility of compiling from text to some sort of AST / bytecode (some sort of intermediary format) which is stored in the database, and then iterating over that to produce the final result.

Ideally, the parser wouldn't use regular expressions at all, but that might be difficult to make efficient in PHP.
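
As a rough illustration of the intermediary-format idea, here is a minimal sketch in Python (chosen only for brevity; nothing here is s9e\TextFormatter's or MyBB's API, and the tag table and op names are assumptions). It scans the text once without regular expressions, produces a flat list of ops that can be stored as JSON, and renders by iterating over that list. It does not balance or validate tags, which a real parser would have to do.

```python
import html
import json

# Hypothetical tag table for illustration: BBCode name -> HTML element.
TAGS = {"b": "strong", "i": "em", "u": "u"}


def compile_to_ops(text: str) -> list:
    """Scan the text once, without regular expressions, into a flat op list.

    Ops are ["text", str], ["open", tag] or ["close", tag].
    """
    ops = []
    buf = []
    i = 0
    while i < len(text):
        # Only look for a closing bracket when we are standing on "[".
        end = text.find("]", i) if text[i] == "[" else -1
        if end != -1:
            body = text[i + 1:end]
            closing = body.startswith("/")
            name = body[1:].lower() if closing else body.lower()
            if name in TAGS:
                if buf:
                    ops.append(["text", "".join(buf)])
                    buf = []
                ops.append(["close" if closing else "open", name])
                i = end + 1
                continue
        # Not a known tag: keep the character as plain text.
        buf.append(text[i])
        i += 1
    if buf:
        ops.append(["text", "".join(buf)])
    return ops


def render_ops(ops: list) -> str:
    """Iterate over the stored ops to produce HTML; no parsing happens here."""
    out = []
    for kind, value in ops:
        if kind == "text":
            out.append(html.escape(value))
        elif kind == "open":
            out.append("<%s>" % TAGS[value])
        else:
            out.append("</%s>" % TAGS[value])
    return "".join(out)


# The op list is plain data, so it can be stored as JSON in a database column.
stored = json.dumps(compile_to_ops("Hello [b]world[/b] <3"))
```

Rendering is then `render_ops(json.loads(stored))`. Anything in brackets that is not in the tag table simply stays in the text and comes out verbatim.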

euantorano commented 7 years ago

Parsing in MyBB 2 happens when a post is made or edited rather than when a post is viewed which improves performance anyway.

I haven't yet worked out how we'll handle new smileys/BBCode being added with this though.
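
One possible way to picture parse-on-write, sketched in Python under stated assumptions: the `PostStore` class, its fields and the version counter are hypothetical and not taken from MyBB 2. The parser runs when a post is saved; viewing serves the stored output and reparses only if the parser configuration has changed since. That is one conceivable answer to the new smileys/BBCode question, not a decided design.

```python
from dataclasses import dataclass


@dataclass
class Post:
    raw: str                 # what the user typed
    parsed: str = ""         # cached parser output
    parser_version: int = 0  # configuration version used to produce `parsed`


class PostStore:
    """Toy in-memory store; `parse` is any callable str -> str (hypothetical)."""

    def __init__(self, parse, version=1):
        self.parse = parse
        self.version = version
        self.parse_calls = 0
        self.posts = {}

    def _reparse(self, post):
        post.parsed = self.parse(post.raw)
        post.parser_version = self.version
        self.parse_calls += 1

    def save(self, post_id, raw):
        """Parse once, when the post is created or edited."""
        post = Post(raw=raw)
        self._reparse(post)
        self.posts[post_id] = post

    def view(self, post_id):
        """Serve the cached output; reparse only if the configuration changed."""
        post = self.posts[post_id]
        if post.parser_version != self.version:
            self._reparse(post)
        return post.parsed

    def reconfigure(self, parse):
        """Swap in a new parser (e.g. after adding a smiley) and bump the version."""
        self.parse = parse
        self.version += 1
```

The trade-off in this sketch is that the first view after a configuration change pays the parsing cost again, while every other view stays a plain lookup.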

euantorano commented 7 years ago

Also, the current code is in the mybb/parser repository.

JoshyPHP commented 7 years ago

@Azareal As @euantorano mentioned, parsing and rendering are already decoupled. There's a small informal benchmark script in the repository that you can run in CLI or via a web server.

Generally, parsing is a matter of milliseconds and rendering is a matter of microseconds.

Azareal commented 7 years ago

Okay, I looked at that link, although I haven't looked at everything.

Storing it as an array on the database side or some sort of tree structure would probably be a lot more efficient than using XML and parsing it on the PHP side. Most databases, with the exception of MySQL, have arrays.

One thing MySQL does have however is native support for JSON. PostgreSQL has even better support for JSON. The only database which doesn't is MariaDB which has emulated support, but not native support.

According to MariaDB's devs though, MySQL is ridiculously slow, so their native support is on par with MariaDB's emulated support. I don't know how accurate that is, but I'll throw that here.

There are several basic BBCode types which could probably be inlined, aka converted to HTML and stored as text in the structure. That would be bold, italics, etc., aka the most commonly used BBCode. Disabling those is pointless. If you don't want BBCode, you might as well disable the entire BBCode parser and just use Markdown.

And 90% of the time, those are the only BBCode someone will use, so you could do some further optimisations. Even if the other BBCode are really slow, if you can speed up 90% of requests, then their performance impact is essentially nil in the bigger picture.
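
A sketch of that inlining idea, in Python for illustration only. The node shapes (`text`, `open`, `close`, `html`) and the `SIMPLE` table are assumptions, not part of any existing library. Text and simple tags are collapsed into pre-rendered HTML chunks at parse time, and anything else is left as a node for the renderer:

```python
import html

# Hypothetical split between "simple" tags, which map 1:1 to an HTML element,
# and everything else, which still needs per-request work.
SIMPLE = {"b": "strong", "i": "em", "u": "u"}


def inline_simple(nodes):
    """Collapse text and simple tags into pre-rendered ("html", str) nodes.

    Any other node is kept as-is, so the renderer only has real work to do
    for posts that use the rarer tags.
    """
    out = []
    for kind, value in nodes:
        if kind == "text":
            piece = html.escape(value)
        elif kind in ("open", "close") and value in SIMPLE:
            piece = ("<%s>" if kind == "open" else "</%s>") % SIMPLE[value]
        else:
            out.append((kind, value))
            continue
        # Merge with the previous pre-rendered chunk when there is one.
        if out and out[-1][0] == "html":
            out[-1] = ("html", out[-1][1] + piece)
        else:
            out.append(("html", piece))
    return out
```

A post that only uses bold or italics ends up as a single `html` node, so rendering it is just returning a string.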

Handling new BBCode shouldn't be difficult, if this library is coded properly, as anything with [ and ] should be considered a BBCode element, albeit an unknown one which might be printed verbatim, if MyBB doesn't recognise it.

If you go down the road of emoji (which are taking the world by storm), then every emoji is fenced by : at the start and : at the end for the shortcodes, although it might be trickier, if someone's pasted the raw Unicode from somewhere else, or if their system has an emoji keyboard.
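
The shortcode case can be sketched as follows, again in Python for illustration. The `EMOJI` table is a tiny hypothetical sample and the accepted shortcode characters are an assumption. Known `:name:` tokens are replaced, unknown ones are left alone, and raw Unicode emoji pass through untouched because nothing matches them:

```python
import re

# Tiny hypothetical shortcode table; a real board would load a full list.
EMOJI = {
    "smile": "\U0001F604",
    "heart": "\u2764\uFE0F",
    "thumbsup": "\U0001F44D",
}

# A shortcode is a run of lowercase letters, digits, "_", "+" or "-" fenced by ":".
_SHORTCODE = re.compile(r":([a-z0-9_+-]+):")


def replace_shortcodes(text: str) -> str:
    """Replace known :shortcode: tokens; leave unknown ones untouched."""
    def swap(match):
        return EMOJI.get(match.group(1), match.group(0))
    return _SHORTCODE.sub(swap, text)
```

Because unknown names fall back to the original text, something like a timestamp with colons is matched by the pattern but comes out unchanged.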

Classic emoticons are trickier, as they could be anything, although for the most part, the world seems to be moving towards emoji.