robinst / linkify

Rust library to find links such as URLs and email addresses in plain text, handling surrounding punctuation correctly
https://robinst.github.io/linkify/
Apache License 2.0

Are wildcard operators allowed in links? #37

Closed: mre closed this issue 2 years ago

mre commented 2 years ago

We received an interesting issue for lychee here: https://github.com/lycheeverse/lychee/issues/604. The gist is that links with wildcard operators (e.g. https://*.example.com) get properly extracted by linkify, but we cannot easily check them with lychee.

We now wonder whether we should exclude such wildcard URLs on our end or whether it's better for linkify to discard them right away. From my perspective, these are valid links, so linkify is correct, but the behavior might be surprising to downstream users. Perhaps another config option could be introduced that would allow skipping such wildcard URLs:

let mut finder = LinkFinder::new();
finder.wildcards(false); // proposed option; LinkFinder setters take &mut self

However, there are other sub-delimiters besides *, so I could understand if linkify chose not to handle such cases. In that case, we would filter them out in lychee instead.
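
For reference, filtering on the lychee side could look roughly like the sketch below. It is only an illustration, not lychee's or linkify's actual code: `has_wildcard_host` and `drop_wildcard_links` are hypothetical helper names, the parsing is deliberately simplified (no full RFC 3986 handling), and it assumes that only a `*` in the host part should cause a link to be skipped.

```rust
/// Hypothetical helper: returns true if the host part of `url` contains `*`.
/// Simplified parsing for illustration; this is not a full RFC 3986 parser.
fn has_wildcard_host(url: &str) -> bool {
    // Strip the scheme ("https://"), if there is one.
    let rest = match url.split_once("://") {
        Some((_scheme, rest)) => rest,
        None => url,
    };
    // The authority ends at the first '/', '?' or '#'.
    let authority = rest
        .split(|c: char| matches!(c, '/' | '?' | '#'))
        .next()
        .unwrap_or("");
    // Drop userinfo ("user@"), if present, keeping "host[:port]".
    let host_port = authority.rsplit('@').next().unwrap_or(authority);
    host_port.contains('*')
}

/// Hypothetical helper: keeps only the links that can actually be checked.
fn drop_wildcard_links<'a>(links: &[&'a str]) -> Vec<&'a str> {
    links
        .iter()
        .copied()
        .filter(|link| !has_wildcard_host(link))
        .collect()
}
```

With this approach, a link like https://*.example.com would be dropped before checking, while a `*` that only appears in the path (e.g. https://example.com/a*b) would still be kept.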

mre commented 2 years ago

Created a PR with the mentioned changes here: https://github.com/robinst/linkify/pull/38

robinst commented 2 years ago

As of 0.9.0, linkify no longer extracts those kinds of URLs: https://github.com/robinst/linkify/blob/main/CHANGELOG.md#090---2022-07-11