gregjacobs / Autolinker.js

Utility to Automatically Link URLs, Email Addresses, Phone Numbers, Twitter handles, and Hashtags in a given block of text/HTML
MIT License
1.48k stars, 238 forks

'&amp;' at the end of a valid URL parses incorrectly #76

Closed Cogito closed 9 years ago

Cogito commented 9 years ago

"google.com&amp;" should parse to "<a href="http://google.com">google.com</a>&amp;" but is instead parsed to "<a href="http://google.com&amp">google.com&amp</a>;"

"google.com&amp" is not a valid URL

Cogito commented 9 years ago

It's worth noting that the situation I am running into occurs when something like google.com" is escaped to google.com&quot;

This is then turned into <a href="http://google.com&quot">google.com&quot</a>;

Note the trailing semicolon
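
To make the failure mode concrete, here is a minimal sketch of how a URL matcher that does not know about HTML entities can produce this output. The regex and the `naiveLink` function are hypothetical simplifications written for illustration; they are not Autolinker's actual implementation.

```javascript
// Hypothetical, simplified URL matcher (NOT Autolinker's real regex).
// It accepts "&" as a URL character but stops at whitespace or ";".
var naiveUrlRegex = /[a-z0-9.-]+\.com[^\s;]*/gi;

// Wraps every match in an anchor tag, with no awareness of HTML entities.
function naiveLink( text ) {
    return text.replace( naiveUrlRegex, function( match ) {
        return '<a href="http://' + match + '">' + match + '</a>';
    } );
}

// The entity's "&quot" is swallowed into the link and the ";" is left behind.
var linked = naiveLink( 'google.com&quot;' );
console.log( linked );
```

Under these assumptions the entity is split in two: `&quot` ends up inside the anchor and the semicolon trails after it.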

warrenrumak commented 9 years ago

+1 I ran into this same issue yesterday, except with &quot; around the URL instead of &amp;

gregjacobs commented 9 years ago

Thanks for the report, will look into it

warrenrumak commented 9 years ago

Quick failing test I whipped up....

        it( "should handle a URL inside an HTML-encoded anchor tag", function() {
            // The input is HTML-encoded, so the anchor tag's own markup arrives as entities
            var html = "Joe learned about anchor tags on the &lt;a href=&quot;http://www.w3schools.com/aaa&quot;&gt;W3SCHOOLS&lt;/a&gt; site ...";
            // Expected: only the URL is linked; the surrounding &quot; entities stay outside the link
            var tobe = "Joe learned about anchor tags on the &lt;a href=&quot;<a href=\"http://www.w3schools.com/aaa\">w3schools.com</a>&quot;&gt;W3SCHOOLS&lt;/a&gt; site ...";

            var result = autolinker.link( html );
            expect( result ).toBe( tobe );
        });

nilscc commented 9 years ago

+1

Could possibly be solved by adding &quot; to the htmlCharacterEntitiesRegex? I did this as a temporary workaround and it seems to work:

        var autolinker = new Autolinker();
        autolinker.htmlCharacterEntitiesRegex = /(&nbsp;|&#160;|&lt;|&#60;|&gt;|&#62;|&quot;|&#34;|&#39;)/gi;

This adds the entities for both " (&quot; and &#34;) and ' (&#39;).
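
As a rough illustration of why extending the entity list helps, the sketch below splits the text on the entity regex first and only runs URL matching on the pieces in between. The regex is the one from the workaround above; `simpleUrlRegex` and `linkAroundEntities` are hypothetical names made up for this example, and the approach is an assumption about the general idea, not a description of Autolinker's internals.

```javascript
// Entity list from the workaround above.
var entityRegex = /(&nbsp;|&#160;|&lt;|&#60;|&gt;|&#62;|&quot;|&#34;|&#39;)/gi;

// Hypothetical, simplified URL matcher (NOT Autolinker's real regex).
var simpleUrlRegex = /[a-z0-9.-]+\.com[^\s;]*/gi;

function linkAroundEntities( text ) {
    // Because entityRegex has one capturing group, split() keeps the
    // entities in the result at the odd indices.
    return text.split( entityRegex ).map( function( piece, i ) {
        if( i % 2 === 1 ) {
            return piece;  // an entity: pass it through untouched
        }
        // Plain text between entities: safe to search for URLs here.
        return piece.replace( simpleUrlRegex, function( match ) {
            return '<a href="http://' + match + '">' + match + '</a>';
        } );
    } ).join( '' );
}

var wrapped = linkAroundEntities( '&quot;google.com&quot;' );
console.log( wrapped );
```

Because the entities are set aside before any URL matching happens, the matcher never sees `&quot` as a candidate part of the URL.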

gregjacobs commented 9 years ago

Fixed in 0.15.2