vmg / rinku

Autolinking. Ruby. Yes, that's pretty much it.
ISC License
599 stars 67 forks source link

do not allow utf-8 in domains to avoid lookalike character attacks #63

Open grosser opened 8 years ago

grosser commented 8 years ago

http://www.irongeek.com/homoglyph-attack-generator.php

trying to port over https://github.com/zendesk/rinku/pull/5

but does not work ... missing something :/

@jcheatham /fyi @vmg sounds like a good feature ?

-"http://examplе.com"
+"<a href=\"http://examplе.com\">http://examplе.com</a>"
jcheatham commented 8 years ago

Looking at https://github.com/zendesk/rinku/pull/5 I think there's still an edge case here that I failed to capture in the tests. Note that I also changed:

    if (!isalnum(data[0])) return 0;
    for (i = 1; i < size - 1; ++i) {
      ...

to

    if (data[0] == '.' || data[0] == '-') return 0;
    for (i = 0; i < size - 1; i++) {

I think the first character still has a chance to be exploited here.

grosser commented 8 years ago

that change seems weird ... a strange e would not be . or - ... so I remove that change :D

On Tue, Oct 4, 2016 at 2:46 PM, Jonathan Cheatham notifications@github.com wrote:

Looking at zendesk#5 https://github.com/zendesk/rinku/pull/5 I think there's still an edge case here that I failed to capture in the tests. Note that I also changed:

if (!isalnum(data[0])) return 0;
for (i = 1; i < size - 1; ++i) {
  ...

to

if (data[0] == '.' || data[0] == '-') return 0;
for (i = 0; i < size - 1; i++) {

I think the first character still has a chance to be exploited here.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/vmg/rinku/pull/63#issuecomment-251523363, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAsZ30x8of6cak6I8Bfs9jgGxIW-Dc-ks5qwskmgaJpZM4KOLyc .

jcheatham commented 8 years ago

right, but notice that it also bumps the for loop forward by one, so if you had a strange 'e', isalnum('e') would be true, you wouldn't return 0/false, and you'd also skip past evaluating that character in the loop.

kivikakk commented 8 years ago

Thanks for the PR! This would be a lovely thing to get going, and I think it's important, but there's some bits missing to apply this to our fork.

rinku_isalnum doesn't consider non-ASCII UTF-8 text as valid — note its implementation — meaning the code change to autolink.c in this PR is actually a non-op (!), or rather, it does the same thing as rinku_isalnum.

It's notable that the tests are broken, because of complicating factors:

  1. check_domain will return true in three of the four assertions in the tests supplied — for the "www.exampl\u0435.com" cases, there is always at least one dot before we hit the non-ASCII-alphanumeric character.

    Likewise, for both of the Rinku::AUTOLINK_SHORT_DOMAINS cases (which overlaps with one of the above cases), allowing short domains means check_domain always returns true as long as the initial if (!rinku_isalnum(data[link->start])) doesn't cause a return false.

    Hence 3 of 4 assertions are currently failing.

  2. This would be okay if it at least caused the non-ASCII-alphanumeric character to be excluded from the resulting autolink. But link->end is manually set to utf8proc_find_space(data, link->end, size); after check_domain returns, in both autolink__url and autolink_www. So the character gets re-included.

    This issue probably arose because of the difference between your fork and this one.

  3. The above note about the initial character by @jcheatham is important in the source PR at zendesk#5, but it's avoided here because we check the first character with the above-mentioned if (!rinku_isalnum(data[link->start])) test, which doesn't allow for UTF-8.
jcheatham commented 8 years ago

Ah, I had thought rinku_isalnum was a portability wrapper for the standard library check, not that it was some beefy custom lookup table. And excellent info on everything else (I'm still operating with obsolete knowledge of the old version), so I'll just shut up now :)

grosser commented 8 years ago

... a little bit better now ... only thing failing is utf8-space ... does the direction look ok / any good idea how to fix that ?

-"This is just a test. <a href=\"http://www.pokemon.com\">http://www.pokemon.com</a> \u2028 "
+"This is just a test. http://www.pokemon.com \u2028 "
ACPK commented 7 years ago

@grosser @vmg - Any suggestions on implementing this or is this not an issue? :)

grosser commented 7 years ago

stil and issue ... we use a fork to get around it :( https://rubygems.org/gems/zendesk-rinku

northeastprince commented 10 months ago

@vmg @kivikakk is this gem still being maintained?

kivikakk commented 10 months ago

Not particularly, at least by me! rinku is from an earlier era of Rails development, and there are most likely better options today.

northeastprince commented 10 months ago

Wow, thanks for the quick response! 😆 From what I see, there's this, Zendesk's fork, and the original. Maybe the latter is the best at this point - I assume a 30x increase in speed from the already miniscule timing won't really make much of a difference in the long run.