jarrett / rbbcode

Converts BBCode to HTML. Gracefully handles invalid input. Built on Treetop.
41 stars 9 forks source link

Added support for [color] tag #15

Closed femaref closed 11 years ago

femaref commented 11 years ago

I've added support for the color tag, it also parses any inner bbcode successfully. Extra precaution has been taken with sanitizing the resulting tag, as it involves inline css.

jarrett commented 11 years ago

Looks good. I'll double-check the sanitization of the style attribute. I'll try my best to inject malicious CSS into it, and if I can't, I'll probably go ahead and merge. Thanks!

femaref commented 11 years ago

Yup, I'm not sure it wont break. It's not a really thorough sanitization, so it probably will break.

jarrett commented 11 years ago

So, just testing your regex (^color:.+?;$) in isolation, I'm finding the following strings match:

color:#ffff00; (this is good, it should match) color: #ffff00; (also good) color: #ffff00; font-size: 100px; (bad, this is malicious CSS)

I propose the following regex instead:

^color: *((#[a-f0-9]{3,6})|([a-z]+));$

What do you think?

On Tue, Mar 19, 2013 at 12:37 PM, Femaref notifications@github.com wrote:

Yup, I'm not sure it wont break. It's not a really thorough sanitization, so it probably will break.

— Reply to this email directly or view it on GitHubhttps://github.com/jarrett/rbbcode/pull/15#issuecomment-15130582 .

femaref commented 11 years ago

Hm, shouldn't .+? be a non-greedy variant of .+? but yes, yours should work as well. We should force lower-case names then as well due to the regexp.

jarrett commented 11 years ago

Yeah, my thinking is, we use a super-strict regex that's essentially a white-list. We're not trying to allow every possible valid variation on a color rule. Just the ones we generate from our BBCode [color] tags. So something that just looks for our exact representation of the color rule would be best. Which makes me realize, mine should be modified to be less permissive:

^color:((#[a-f0-9]{3,6})|([a-z]+));$

femaref commented 11 years ago

Yes, you are right. I've added the changes. I'll have to improve the parsing as well (to get rid of extra white spaces etc).

jarrett commented 11 years ago

Ah, good point. We can add unit tests for all the weird but valid ways users might write a [color] tag, plus all kinds of malicious stuff.

On Tue, Mar 19, 2013 at 1:05 PM, Femaref notifications@github.com wrote:

Yes, you are right. I've added the changes. I'll have to improve the parsing as well (to get rid of extra white spaces etc).

— Reply to this email directly or view it on GitHubhttps://github.com/jarrett/rbbcode/pull/15#issuecomment-15132439 .

femaref commented 11 years ago

Effectively, we should allow two cases: [color=red] and [color=#ffffff] (with the special case [color=#fff]). Maybe we should enforce the patterns in the parser?

jarrett commented 11 years ago

Yes, that would be reasonable. Something I struggled with in all the other tags: How permissive should we be about stupid variations? For example:

[color = red] [color= red] [color =red]

I go back and forth on this. Lately my attitude is to just be strict about syntax. Users should be able to read a helpfile on BBCode and see that they can't, e.g., throw in arbitrary spaces. But that does present one potential problem: If somebody is migrating from a legacy parser to RbbCode, and they already have a big corpus of user-supplied markup, will we break that corpus? I.e., what if our syntax is stricter than that of the legacy parser?

On Tue, Mar 19, 2013 at 1:08 PM, Femaref notifications@github.com wrote:

Effectively, we should allow two cases: [color=red] and [color=#ffffff](with the special case [color=#fff]). Maybe we should enforce the pattern in the parser?

— Reply to this email directly or view it on GitHubhttps://github.com/jarrett/rbbcode/pull/15#issuecomment-15132612 .

femaref commented 11 years ago

Even though it's opening pandora's box, what about a legacy mode that allows extra whitespaces around the equals sign? Maybe also emits well-formed, strict bbcode.

jarrett commented 11 years ago

Probably more than I'd like to maintain :). The same would have to be done for all tags.

I'm leaning towards this: Continue being pretty strict about input. If anyone needs to support a legacy codebase, they can submit a request. That way, we avoid building features someone might need until the need has actually been demonstrated.

BTW, I just noticed that there's an inconsistency in the strictness of the existing code. For tags defined with #def_tag, we allow uppercase and lowercase. For all other tags, we don't. I wouldn't quite call it a bug, but it is weird and should probably be fixed.

On Tue, Mar 19, 2013 at 6:45 PM, Femaref notifications@github.com wrote:

Even though it's opening pandora's box, what about a legacy mode that allows extra whitespaces around the equals sign?

— Reply to this email directly or view it on GitHubhttps://github.com/jarrett/rbbcode/pull/15#issuecomment-15150869 .

jarrett commented 11 years ago

Do you want to move forward with this pull request?

Also, I propose one more change. I just remembered that we shouldn't be using ^ and $ to mark the beginning and end of strings in the regex. Here's a rundown of the vulnerability:

http://homakov.blogspot.com/2012/05/saferweb-injects-in-various-ruby.html

So the preferred way to match the beginning and end of strings in a regex is \A and \z.

jarrett commented 11 years ago

Let me know if you're still interested in this. If not, I can close out the issue. Thanks!

femaref commented 11 years ago

Sorry, I've been pretty busy with other stuff - feel free to close it.