jarrett / rbbcode

Converts BBCode to HTML. Gracefully handles invalid input. Built on Treetop.
41 stars 9 forks source link

[WIP] add option to convert bbcode to markdown #23

Closed Dahie closed 4 years ago

Dahie commented 4 years ago

This is still work in progress, but shows a possible direction discussed in #22

The tests are failing at the moment. I have a bit of a hard time with Treetop, it's documentation is … sparse with a steep learning curve.

Dahie commented 4 years ago

Yes, still open. So the goal is to translate

[quote]this is

a 
multiline quote
[/quote]

into

> this is
>
> a
> multine quote
>

My thinking went in to directions to achieve this. Either by replacing the newlines within the block node to \n> but that messes with included paragraphs and leaves empty new lines so far. Thinking two was having this dealt properly in Parslet and introducing QuoteLineItemsor something that get prepended with the >

jarrett commented 4 years ago

Ah, thanks for pointing that out. So tell me if I'm understanding this correctly:

In BBCode, the string apples\noranges is supposed to parse to the same HTML as apples oranges. In other words, the single newline character is treated as a space. But when converting to Markdown, you actually do want to preserve the single newline character in the output.

Did I get that right? If so, then I think your second option is the way to go. I.e. we'll have to extend the .treetop file so it emits different nodes for apples\noranges and apples oranges. Then, we'll add logic to consume these types of nodes in the HTML and Markdown rendering methods.

jarrett commented 4 years ago

BTW, I added comments to node_extensions.rb in the master branch just now. Hopefully this makes the interplay between the .treetop file and node_extensions.rb easier to understand.

Dahie commented 4 years ago

Ah, thanks for pointing that out. So tell me if I'm understanding this correctly:

In BBCode, the string apples\noranges is supposed to parse to the same HTML as apples oranges. In other words, the single newline character is treated as a space. But when converting to Markdown, you actually do want to preserve the single newline character in the output.

Did I get that right? If so, then I think your second option is the way to go. I.e. we'll have to extend the .treetop file so it emits different nodes for apples\noranges and apples oranges. Then, we'll add logic to consume these types of nodes in the HTML and Markdown rendering methods.

Good point, markdown's handling of when to paragraph and when to line breaks got me a few times already. It's documented here: https://www.markdownguide.org/basic-syntax/#paragraphs-1

[quote]this becomes    
a 
single line quote
[/quote]

would actually become

> this becomes a single line quote

While

[quote] this becomes

a  
multiline quote
[/quote]

should become

> this becomes
>
> a
> multiline quote

(Note the double-spaces behind the "a")

Dahie commented 4 years ago

Btw, what is the minimum supported ruby version of the gem? I'd tend to go for 2.5 given everything older is EOL

jarrett commented 4 years ago

What's the rationale for the commit removing sanitization (https://github.com/Dahie/rbbcode/commit/d19b45019a3c0cf4d342243f22efc92a5e405fe9)?

jarrett commented 4 years ago

By the way, when I updated Treetop to the latest version (1.6.10), I started getting the following error upon compiling the .treetop file:

NoMethodError: undefined method `expected' for #<Treetop::Runtime::SyntaxNode:0x0000557d8de861c8>
    /usr/local/bundle/gems/treetop-1.6.9/lib/treetop/compiler/node_classes/parenthesized_expression.rb:9:in `expected'

After a quick look at parenthesized_expression.rb in the Treetop source, I tend to think this is a Treetop bug. Reverting Treetop to 1.5.3 fixed the problem. (I also tried all later versions, and all raise the same error.) So I'll keep Treetop pinned to 1.5.3 for now. Let me know if you have any further insight into this.

Beyond that, I updated all dependencies and pushed to master.

Dahie commented 4 years ago

I ran into the same issue trying to update treetop and decided to stick with the old version. Good enough.

Dahie commented 4 years ago

What's the rationale for the commit removing sanitization (Dahie@d19b450)?

You can ignore the other branches in my repo. They were some experiments, but not for merging.

jarrett commented 4 years ago

Ok great, thanks! I'd suggest you pull from my master branch when you have a chance, since that'll get you all the updated dependencies. It shouldn't impact your new-feature development either way, though.

I'll give some thought to the best way to extend the .treetop file.

On Mon, Jun 1, 2020 at 3:30 AM Daniel Senff notifications@github.com wrote:

What's the rationale for the commit removing sanitization (Dahie@d19b450 https://github.com/Dahie/rbbcode/commit/d19b45019a3c0cf4d342243f22efc92a5e405fe9 )?

You can ignore the other branches in my repo. They were some experiments, but not for merging.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/jarrett/rbbcode/pull/23#issuecomment-636664838, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAD6UF4KWYUNZ5JXIVJQITRUNKIPANCNFSM4NBTKRSQ .

jarrett commented 4 years ago

This is taking me longer than I expected, but I have a partly working implementation on the master branch. I pulled from your repo, did some refactoring, and made an attempt at solving the blockquote problem. But there are still some bugs with paragraphs and line breaks. I'll keep working on those.

jarrett commented 4 years ago

I think I've worked out the paragraph and line-break bugs. Let me know what you think. We may be close to releasing a new version with Markdown support.

Dahie commented 4 years ago

Thank you looks very good! Only, the readme does not mention the markdown output feature and how to use it.

Feel free to close my PR when you release the new version :)

jarrett commented 4 years ago

The new version (1.1.0) is released. Thanks again for your help!