gregjacobs / Autolinker.js

Utility to Automatically Link URLs, Email Addresses, Phone Numbers, Twitter handles, and Hashtags in a given block of text/HTML
MIT License
1.48k stars 238 forks

How to sanitize input text #197

Open nikdo opened 7 years ago

nikdo commented 7 years ago

Is there a recommended way to sanitize text before passing it to Autolinker? The usual way of escaping HTML tags will also distort some URLs.

olafleur commented 7 years ago

Could you give us a concrete example of what you are talking about? It would help to understand what is problematic. :)

nikdo commented 7 years ago

Sure.

If user input is: Take look at https://www.google.cz/?gfe_rd=cr&ei=GYmlWM-VG_Tf8geNmIfYDA#q=autolinker and <a href="javasript:alert('hi')">hi</a>

I need to render: Take look at <a href="https://www.google.cz/?gfe_rd=cr&ei=GYmlWM-VG_Tf8geNmIfYDA#q=autolinker">https://www.google.cz/?gfe_rd=cr&amp;ei=GYmlWM-VG_Tf8geNmIfYDA#q=autolinker</a> and &lt;a href=&quot;javasript:alert(&#039;hi&#039;)&quot;&gt;hi&lt;/a&gt;

So the HTML characters &<>"' are escaped everywhere except in the link's href attribute.
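To illustrate the conflict, here is a minimal sketch of such an escaping step. The `escapeHtml` helper is hypothetical and is not part of Autolinker.js or Linkify. Running it over the whole input before linking also rewrites the `&` inside the URL's query string:

```javascript
// Hypothetical helper (an assumption for illustration, not an Autolinker.js API):
// escapes the five characters & < > " ' for safe insertion into HTML.
function escapeHtml(text) {
  const entities = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#039;",
  };
  return text.replace(/[&<>"']/g, (ch) => entities[ch]);
}

const url = "https://www.google.cz/?gfe_rd=cr&ei=GYmlWM-VG_Tf8geNmIfYDA#q=autolinker";

// The "&" before "ei=" becomes "&amp;", so a linker that runs afterwards
// sees a different string than the one the user typed.
console.log(escapeHtml(url));
// → https://www.google.cz/?gfe_rd=cr&amp;ei=GYmlWM-VG_Tf8geNmIfYDA#q=autolinker
```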

This is something that the Linkify plugin has solved, but I don't see how I should do it using Autolinker.js.

olafleur commented 7 years ago

I'm not quite sure I understand your question, but the goal of Autolinker is to link as much as possible, so yes, I think Autolinker will link the first URL and also keep the second part as a link. Maybe you could use the Linkify plugin for the part that you want to escape, but if the goal is to automatically detect that something is an HTML link and escape it, I don't think Autolinker does that.

If you see a structure that Autolinker could have that could allow this, feel free to tell us and we'll see if we can implement it ! :)

nikdo commented 7 years ago

We both probably agree that Autolinker output is meant to be inserted into HTML, and that the original input we want to transform comes from a website user. For example, we want to improve their comment by making URLs clickable. Right?

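Somewhere between the user inputting their text and rendering the output into HTML, two things have to happen:

1. The input has to be sanitized (HTML characters escaped).
2. URLs have to be converted into links.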

My question is how these things should be orchestrated together. Linkify does both. Autolinker seems to perform only the second one, and I don't know where to sanitize the input for Autolinker without distorting URLs (concretely, the & characters).
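One way to picture the orchestration is to find the URLs first and escape only the text around them. The sketch below is not how Autolinker.js or Linkify is implemented: `escapeHtml`, `linkAndEscape`, and the simplified `URL_PATTERN` regex are all assumptions made for illustration, and a real URL matcher would be far more thorough.

```javascript
// Hypothetical helper: escapes & < > " ' for insertion into HTML.
function escapeHtml(text) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#039;" };
  return text.replace(/[&<>"']/g, (ch) => entities[ch]);
}

// Simplified stand-in for a real URL matcher (assumption). It excludes
// whitespace, angle brackets, and quotes, so a matched URL cannot break out
// of the href attribute, and it leaves trailing punctuation outside the link.
const URL_PATTERN = /https?:\/\/[^\s<>"']*[^\s<>"'.,;:!?]/g;

// Escapes everything except the href value, which keeps the URL as typed.
function linkAndEscape(input) {
  let html = "";
  let lastIndex = 0;
  for (const match of input.matchAll(URL_PATTERN)) {
    const url = match[0];
    // Plain text before the URL is escaped as usual.
    html += escapeHtml(input.slice(lastIndex, match.index));
    // The href keeps the raw URL; the visible link text is escaped.
    html += `<a href="${url}">${escapeHtml(url)}</a>`;
    lastIndex = match.index + url.length;
  }
  // Whatever follows the last URL is escaped too.
  return html + escapeHtml(input.slice(lastIndex));
}

console.log(linkAndEscape("See http://g.com?a=1&b=2 & <b>more</b>"));
// → See <a href="http://g.com?a=1&b=2">http://g.com?a=1&amp;b=2</a> &amp; &lt;b&gt;more&lt;/b&gt;
```

Because linking and escaping happen in the same pass, the matcher always sees the original characters and the user's own markup never reaches the page unescaped.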

nikdo commented 7 years ago

Side note: I have already chosen Linkify over Autolinker because of this, but I would like to understand how you expect this to be handled. I'm probably just missing something. :smile:

nikdo commented 7 years ago

After further investigation, it seems that Autolinker's behavior is very similar to Linkify's linkifyHtml method, which preserves HTML tags, whereas the linkifyStr method escapes them.

Input: Pause & look at http://g.com?a=1&b=2. Plus <a href="dangerous">something dangerous</a>.

linkifyStr: Pause &amp; look at <a href="http://g.com?a=1&b=2" class="linkified" target="_blank">g.com?a=1&amp;b=2</a>. Plus &lt;a href="dangerous"&gt;something dangerous&lt;/a&gt;.

linkifyHtml: Pause & look at <a href="http://g.com?a=1&b=2" class="linkified" target="_blank">g.com?a=1&b=2</a>. Plus <a href="dangerous">something dangerous</a>.

Autolinker: Pause & look at <a href="http://g.com?a=1&b=2" target="_blank" rel="noopener noreferrer">g.com?a=1&b=2</a>. Plus <a href="dangerous">something dangerous</a>.

PRR24 commented 6 years ago

Upvote for adding an option to disable HTML parsing and treat the input as plain text, as in the linkifyStr example above. To be honest, I think the option should also be on by default, as the current behaviour is a bit dangerous.

bkosborne commented 5 years ago

I'm using this in conjunction with sanitize-html. First I run the text through the Autolinker plugin, and then I run that output through sanitize-html.

kylemh commented 2 years ago

Did this issue get closed via #313 ?