scrod / nv

Notational Velocity: modeless, mouseless Mac OS X note-taking application

http://notational.net

GNU General Public License v3.0

Better URL recognition #178

Open tewe opened 13 years ago

tewe commented 13 years ago

Currently URL recognition gets confused by the parentheses in Markdown link syntax: https://img.skitch.com/20110308-x9qadnn4xhfet78cuddq8d8dx1.png

It might help to use John Gruber's improved regex: http://daringfireball.net/2010/07/improved_regex_for_matching_urls

scrod commented 13 years ago

John Gruber's regular expression was actually where I started. It turned out to make several overlapping match-group comparisons for any input containing actual Markdown link formatting with parentheses. Those matching ambiguities essentially force all mainstream backtracking regex engines (Ruby 1.8, PCRE, ICU) to consider a combinatorial explosion of candidate matches, leading to very long hangs. You can see this earlier implementation here: https://github.com/scrod/nv/blob/be24a7f86d36d910330c389a5f0464c70b2f5d92/AttributedPlainText.m#L169
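
For illustration only, here is a minimal Python sketch of one way to sidestep that kind of backtracking: find the scheme with a trivial pattern that has no nested quantifiers, then walk the characters once while counting parenthesis depth. This is not the approach used in nv or in Gruber's pattern. The names `find_urls`, `_SCHEME`, and `_TRAILING`, the restriction to `http`/`https`, and the choice of trailing punctuation to trim are all assumptions made for the example.

```python
import re

# Assumption: only http/https are recognised. The pattern has no nested
# quantifiers, so it cannot trigger catastrophic backtracking.
_SCHEME = re.compile(r"https?://", re.IGNORECASE)

# Assumption: punctuation that is treated as sentence text when it ends a URL.
_TRAILING = ".,;:!?'\""


def find_urls(text):
    """Return (start, end) spans of URLs in text.

    The scan resumes where the previous URL ended, so the character
    walk never revisits a position.
    """
    spans = []
    pos = 0
    while True:
        m = _SCHEME.search(text, pos)
        if m is None:
            break
        i = m.end()
        depth = 0  # number of currently open "(" inside the URL
        while i < len(text):
            ch = text[i]
            if ch.isspace() or ch in "<>[]":
                break
            if ch == "(":
                depth += 1
            elif ch == ")":
                if depth == 0:
                    # Unmatched ")" belongs to the surrounding text,
                    # e.g. the end of a Markdown link target.
                    break
                depth -= 1
            i += 1
        end = i
        # Trim trailing punctuation, but never into the scheme itself.
        while end > m.end() and text[end - 1] in _TRAILING:
            end -= 1
        if end > m.end():
            spans.append((m.start(), end))
        pos = i
    return spans


if __name__ == "__main__":
    sample = "[page](http://www.example.com/page.)"
    for start, end in find_urls(sample):
        print(sample[start:end])  # prints http://www.example.com/page
```

Because parentheses are tracked with a counter instead of alternation inside a repeated group, a URL such as `http://en.wikipedia.org/wiki/Rust_(programming_language)` keeps its balanced parentheses, while the closing parenthesis of a Markdown link is left out of the match.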

ghost commented 12 years ago

I didn't want to start a new issue, but here are some URLs that don't get auto-linked as expected:

[example.com/](http://www.example.com)
[url=http://www.example.com/]Example[/url]
[page](http://www.example.com/page.)
http://www.example.com/
http://example.com/#aa/index%23.html/
feed:https://example.com/feed/atom
http://example.com/page!
magnet:?xt=urn:btih:8ac3731ad4b039c05393b5404afa6e7397810b41&dn=ubuntu-11.10-desktop-i386.iso&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80&tr=udp%3A%2F%2Ftracker.publicbt.com%3A80&tr=udp%3A%2F%2Ftracker.ccc.de%3A80