robinst / linkify

Rust library to find links such as URLs and email addresses in plain text, handling surrounding punctuation correctly
https://robinst.github.io/linkify/
Apache License 2.0
201 stars, 12 forks

More strict parsing of hostname (authority) part of URLs #43

Closed: robinst closed this 2 years ago

robinst commented 2 years ago

Applies to emails, plain domain URLs (e.g. example.com/foo), and URLs with schemes where a host is expected (e.g. https).

This fixes a few problems that have been reported over time, namely:

It's a tricky change and hopefully this solves some problems while not introducing too many new ones. If anything unexpectedly changed for you, please let us know!
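To give a rough idea of what stricter host parsing can mean, here is a minimal sketch of a hostname check. It is not the code from this change: `is_strict_hostname` and `is_valid_label` are hypothetical helpers, and the rules are the ASCII-only hostname rules from RFC 1123 (labels of 1 to 63 letters, digits, or hyphens, with no leading or trailing hyphen), which is an assumption made for illustration. Internationalized domain names are deliberately out of scope here.

```rust
/// Hypothetical helper, not linkify's actual implementation: returns true if
/// `host` is a plausible ASCII hostname under RFC 1123 rules.
fn is_strict_hostname(host: &str) -> bool {
    // The whole name is limited to 253 characters and must not be empty.
    if host.is_empty() || host.len() > 253 {
        return false;
    }
    // Every dot-separated label has to be valid on its own.
    host.split('.').all(is_valid_label)
}

/// Hypothetical helper: validates a single dot-separated label.
fn is_valid_label(label: &str) -> bool {
    let bytes = label.as_bytes();
    // Labels are 1 to 63 bytes long, so empty labels (as in "a..b") are rejected.
    if bytes.is_empty() || bytes.len() > 63 {
        return false;
    }
    // A label must not start or end with a hyphen.
    if bytes[0] == b'-' || bytes[bytes.len() - 1] == b'-' {
        return false;
    }
    // Only ASCII letters, digits, and hyphens are allowed.
    bytes.iter().all(|b| b.is_ascii_alphanumeric() || *b == b'-')
}
```

Under these assumed rules, `example.com` and `sub-domain.example.org` pass, while `-bad.example.com`, `a..b`, and `under_score.com` are rejected. A link finder that applies a check like this to the authority part would stop treating such strings as hosts.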

robinst commented 2 years ago

As a nice side effect of this change, the benchmarks have improved as well, by about 2% for some and about 14-15% for others (comparing 9a6ce3981003b2140a16eda863ded477bf1488f0 with 5bfb5167340e0436cd7d1b74f7d4997c58cec9e2):

no_links                time:   [25.885 ns 25.892 ns 25.901 ns]
                        change: [-2.2092% -2.0791% -1.9485%] (p = 0.00 < 0.05)
                        Performance has improved.

some_links              time:   [314.23 ns 314.37 ns 314.54 ns]
                        change: [-2.6413% -2.1750% -1.7947%] (p = 0.00 < 0.05)
                        Performance has improved.

heaps_of_links          time:   [1.0488 us 1.0502 us 1.0515 us]
                        change: [-14.292% -14.133% -13.997%] (p = 0.00 < 0.05)
                        Performance has improved.

some_links_without_scheme
                        time:   [389.63 ns 390.09 ns 390.54 ns]
                        change: [-15.163% -15.040% -14.905%] (p = 0.00 < 0.05)
                        Performance has improved.

mre commented 2 years ago

Thanks for your work and congratulations on the new release.