[WIP] fixed generation of bbcode for uppercased tags

jbowens / jBBCode

A lightweight but extensible BBCode parser

http://jbbcode.com

MIT License

164 stars 32 forks source link

[WIP] fixed generation of bbcode for uppercased tags #76

Open frastel opened 7 years ago

frastel commented 7 years ago

This PR is actually based on my other cleanup PR. This PR makes only sense after the cleanup PR has been merged.

I found two issues:

when uppercased tags were used and the text was invalid than the code was not able to re-build the original code. [URL=INVALID]foo[/URL] was generated to [url=invalid]foo[/url].
when options were enabled but the tag itself had no option and the text was invalid than the code added an empty =. So [URL foo=bar]INVALID[/URL] was generated to [url= foo=bar]INVALID[/url].

I am not quite sure if the change in ElementNode::getAsBBCode for the empty = is the best solution. Therefore this PR is still WIP. Any feedback appreciated!

DaSourcerer commented 7 years ago

Ugh! Can you remove 2bd7dc7 and force push? This is extremely hard to assess correctly as-is. Appreciate your effort, though!

frastel commented 7 years ago

@DaSourcerer Sure no problem. Branch is cleaned.

shmax commented 7 years ago

when uppercased tags were used and the text was invalid than the code was not able to re-build the original code. [URL=INVALID]foo[/URL] was generated to [url=invalid]foo[/url].

Can you elaborate a little on what problems that causes?

DaSourcerer commented 7 years ago

@shmax I think, he doesn't experience any actual problems safe for the parts of the input that could not be parsed as tags being forcibly turned all-lowercase. I am actually in support of this, but find this implementation not very convincing. There has to be a better way than to keep tagname and the original token when one is directly derived from the other. Could be this is a bit of a hint some refactoring may be in order? After all, we've got ElementNode.getTagName() and ElementNode.getCodeDefinition().getTagName(). Latter one is already guaranteed to be lowercase. No easy win, though ...

Oh, and while we're at it: Do we want to support tagnames with multibyte characters in them?

shmax commented 7 years ago

There has to be a better way than to keep tagname and the original

That was my thought as well, particularly as he's doing some run-time string operations that weren't there before, even with the double storage.

frastel commented 7 years ago

@shmax

Can you elaborate a little on what problems that causes?

I didn't expected any text modification when an invalid code is given. For my current project the strtolower modification is not accetable.

@DaSourcerer The changes might not be very convincing however I didn't want to introduce too many changes at once in one of my first contributions to this project and thus I tried to not break the current structure.

Kubo2 commented 7 years ago

@DaSourcerer:

Do we want to support tagnames with multibyte characters in them?

Just curious: is it very difficult to implement compared to the current implementation?

DaSourcerer commented 7 years ago

@frastel: As I said, this may be a hint some refactoring were in order. I understand your reluctance to introduce possibly code-breaking changes, but I just feel there is a better solution. Btw: That would be a shortcoming on our side, not yours ;)

@Kubo2: It is so-so. We would introduce a hard dependency on the mb extensions (which is probably less probelmatic than I may make it sound like here). We also use the normalized (i.e. lowercased) tag names for various lookups. I am not quite sure how array keys with mb-characters in them behave. Worst scenario: We would lose all 𝓞(1) lookups and have them replaced with 𝓞(n). Worth it? I really don't know ...