gregjacobs / Autolinker.js

Utility to Automatically Link URLs, Email Addresses, Phone Numbers, Twitter handles, and Hashtags in a given block of text/HTML
MIT License
1.48k stars 238 forks source link

Autolinker vulnerable to RTLO URL spoofing attacks #377

Open dbrgn opened 2 years ago

dbrgn commented 2 years ago

When a URL string contains a unicode RTLO (right-to-left override) character, this results in two separate links being generated.

Context: https://www.bleepingcomputer.com/news/security/url-rendering-trick-enabled-whatsapp-signal-imessage-phishing/

PoC (screenshot, because GitHub would otherwise show the rest of the line as right-to-left):

screenshot-20220330-141920

Note that my terminal ignores the RTLO, but it's there. In the browser (Chrome in this case), this link looks like this:

screenshot-20220330-142044

The left and right part are separate links. When the user clicks on the right part, shown as aisa.mp4, they are redirected to http://4pm.asia/ instead.

I'm not sure what the ideal solution would be. If autolinker would strip all RTLO characters passed to it, then legitimate text (that's not part of an URL) might break. However, an RTLO should not be part of the actual URL. (Autolinker does basically no URL encoding whatsoever. Maybe that should be changed? An urlencoded RTLO would be handled (and ignored) by the browser.)

caleb15 commented 2 years ago

@gregjacobs wondering if you saw this? Snyk lists this as a high-level vulnerability.

ShaharAdskAcc commented 2 years ago

Any idea when it's going to be fixed?

Gilgahex commented 2 years ago

@gregjacobs @olafleur

Please see this pull request

https://github.com/gregjacobs/Autolinker.js/pull/382

moblezin commented 2 years ago

Hey, thanks for the fix, when will you be able to publish it to npm?

Gilgahex commented 2 years ago

I'd love to but I don't have write access, so I'd have to fork it and create a new pkg on npm

RomekRJM commented 2 years ago

As mentioned by @caleb15 this is high on SNYKs vulnerability risk. Any idea when the next Autolinker release is going to be? I see past 2 years it's been released annually.

gregjacobs commented 2 years ago

Hey guys, waiting on a viable PR for this. The PR linked in this thread doesn't seem to do anything (although I'm surprised to see that the test passed. Perhaps there is a difference in which node version is running?)

yadue commented 2 years ago

@gregjacobs so you're saying that the fix provided doesn't work when you run it but units went correctly and fixed the original issue? I simply cannot understand that. Provided PR does exactly what it is supposed to do and the community is waiting for this PR for ages already.

Comment out url.replace('\u202E', ''); and you'll see the result. Provided PR works fine, please merge it and deploy new version.

xfournet commented 2 years ago

@yadue when i test it, the PR is not working, the added test test fails with or without the patch in url-match.ts (see comments in PR)

yadue commented 2 years ago

@xfournet yeah indeed, yesterday it worked simply fine for me using this PR, today it's not which is extremely weird. I'll work on fix.

gregjacobs commented 2 years ago

Hey guys, this has been fixed and released in v3.16.1 šŸ‘

yadue commented 2 years ago

@gregjacobs Thanks a lot, sorry for broken units, Iā€™m pretty sure I didnā€™t have any broken tests only local env. Anyway it would be good to think about running GitHub actions also for PRs so issues like that could be easily avoided.

gregjacobs commented 2 years ago

Yeah, big time on GitHub actions. I had that set up at some point actually but seems to have broken.

Just thinking about this issue a little more now actually: is it the best thing to do to simply strip out these RTL characters? Or should we just include them in the generated link?

gregjacobs commented 2 years ago

Ugh, just thinking about this more: with this fix of stripping out RTLO characters, we're basically preventing Autolinker from being used on any string with RTLO characters at all.

Might have to revert the fix. Any other ideas on how to solve? My only thought is to include the RTLO characters in the link itself so that it doesn't become two separate links

gregjacobs commented 2 years ago

Reverted with v3.16.2 for now

dbrgn commented 2 years ago

Yeah, that's this part of the issue description:

I'm not sure what the ideal solution would be. If autolinker would strip all RTLO characters passed to it, then legitimate text (that's not part of an URL) might break.

Maybe this would be a feasible approach?

Autolinker does basically no URL encoding whatsoever. Maybe that should be changed? An urlencoded RTLO would be handled (and ignored) by the browser.

yadue commented 2 years ago

@gregjacobs @dbrgn here's another proposition, another could be a simple configuration flag to enable stripping for the ones who don't need directional change characters.

yadue commented 2 years ago

@gregjacobs Here you have 2 separate solutions for this problem and at this stage that's all I could spend time on that. Do whatever you think is best for this library but remember that multiple companies have troubles as it is high severity issue. They are not ideal but they fix somehow the original issue.

BTW with recent changes you completely destroyed the execution time of unit tests as because of repeated tests almost 20000 test cases are being generated.

dbrgn commented 2 years ago

but remember that multiple companies have troubles as it is high severity issue

But is it? From what I can tell, this is a trick to make certain types of phishing / user deception a bit easier. It is no vulnerability in itself. Pulling off a successful attack with this is still not trivial.

For half a year, nobody cared about the issue. Then Snyk lists it as high severity (I disagree about the classification), and all of a sudden a lot of people think it's highly critical because they want their company processes to show "green" and not "red" šŸ™ƒ

For a proper fix, questions I'd ask would be:

Thanks @gregjacobs for the maintenance! URLs are a hairy business. You owe us nothing, but I appreciate your work (be it implementation or review) šŸ™‚

(Edit: @yadue I commented both your PRs. Thanks for taking the time to propose a fix!)

yadue commented 2 years ago

@dbrgn for me itā€™s also funny itā€™s ranked high and to be honest I donā€™t know how the fix should looks like to make snyk happy. Remember that for complaint companies (like the one I work for) itā€™s absolutely critical to have all external dependencies secure. Maybe it would be worth to reach out snyk and explain itā€™s actually expected result. If no other solution will be done here weā€™ll do a fork with one of my solutions as we simply cannot stay with ā€œvulnerableā€ library and itā€™s better to simply remove them then being ā€œhackedā€ in that or another way. We simply cannot afford it.

gregjacobs commented 2 years ago

Hey @dbrgn, @yadue, thanks for the comments and the PRs - really appreciate it.

I actually like the idea of URL-encoding the RTLO characters even though Autolinker doesn't currently encode other characters. Seems like the safest thing and we won't be accidentally stripping out legitimate uses of RTLO characters elsewhere in the text. Thoughts?

dbrgn commented 2 years ago

@gregjacobs I'm not sure how Autolinker.js is structured, but would it be feasible to run every URL (the part inside the href attribute) through the following logic?

function encodeUrl(rawUrl: string): string | undefined {
    try {
        return new URL(rawUrl).href;
    } catch (e) {
        // Do not linkify this URL, it's invalid
        return;
    }
}

Browser support seems to be fine: https://caniuse.com/url

It seems to do the trick, and will let the browser handle the URL:

screenshot-20220908-165146

(Would need to be verified first within Autolinker.)

And, if you'd like a laugh, here's how the Chromium debug console handles RTLO in the hostname part šŸ˜‚

screenshot-20220908-165359

gregjacobs commented 2 years ago

It would actually part of the parser that we'd need to detect the RTLO characters so that we don't cut off one URL and then start a new one. @yadue's PR https://github.com/gregjacobs/Autolinker.js/pull/388 is actually quite close. @yadue, would you like to take a stab at URL-encoding the RTLO chars rather than rejecting the match?

yadue commented 2 years ago

@gregjacobs sure, Ill have a look. Do you have an idea where this escaping should be done? Also please take a look at units and number of generated test cases, not sure if it was expected result but there are almost 20k cases coming from last commits and for some reason it takes literally ages when run locally.

gregjacobs commented 2 years ago

Hah, it actually generates upwards of 100,000 test cases :) I may have gone overboard on that but with rewriting the URL parser from scratch, I wanted to make sure that every possible case was handled. I'll see if I can reduce it down a bit at this point. I'm not sure why jasmine just sits and hangs for a while even after running a single test case though - maybe it's time to switch to mocha or something and see if there's still the same issue.

For encoding the RTLO chars, try adding this conversion into the UrlMatch class's getAnchorHref and getAnchorText methods.

kevin-from-atlassian commented 2 years ago

Hi there, @gregjacobs. Are there any updates to this issue? Does #391 work as a fix?