robinst / linkify

Rust library to find links such as URLs and email addresses in plain text, handling surrounding punctuation correctly
https://robinst.github.io/linkify/
Apache License 2.0

Added a flag url_can_be_iri that allows disabling IRI parsing #56

Closed: serega closed this 1 year ago

serega commented 1 year ago

By default, linkify parses Internationalized Resource Identifiers (IRIs) according to RFC 3987. As mentioned in #49, this behavior extracts the wrong link when a URL without a scheme is surrounded by Unicode characters with no separating space, which is valid in some languages. For example, 地址example.org is a valid IRI, but the desired behavior is to extract the URL example.org. I added a flag, url_can_be_iri, that disables parsing of Unicode characters when set to false. The default behavior is unchanged. LinkKind is meant to be extendable to other types of links, and I thought about adding LinkKind::Iri, but that would make the library backwards incompatible.
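To make the difference concrete, here is a minimal, standard-library-only sketch of the effect the flag is meant to have. It is a toy model, not linkify's actual scanner: the helper names `is_host_char` and `find_schemeless_host` are hypothetical, and the rule "a host is a run of letters, digits, hyphens, and dots around the first dot" is a deliberate simplification.

```rust
/// Toy predicate (hypothetical, not part of linkify): may `c` appear in a host label?
/// With `can_be_iri` set, non-ASCII letters and digits are accepted as well.
fn is_host_char(c: char, can_be_iri: bool) -> bool {
    c.is_ascii_alphanumeric()
        || c == '-'
        || (can_be_iri && !c.is_ascii() && c.is_alphanumeric())
}

/// Toy extractor (hypothetical): returns the host-like run around the first '.'
/// in `text`, or `None` if there is no dot or no label on either side of it.
fn find_schemeless_host(text: &str, can_be_iri: bool) -> Option<&str> {
    let dot = text.find('.')?;

    // Walk backwards from the dot while characters are allowed in a label.
    let mut start = dot;
    for (i, c) in text[..dot].char_indices().rev() {
        if is_host_char(c, can_be_iri) {
            start = i;
        } else {
            break;
        }
    }

    // Walk forwards from the dot, accepting further dots and label characters.
    let mut end = dot;
    for (i, c) in text[dot..].char_indices() {
        if c == '.' || is_host_char(c, can_be_iri) {
            end = dot + i + c.len_utf8();
        } else {
            break;
        }
    }

    // Require at least one character before the dot and one after it.
    if start == dot || end == dot + 1 {
        return None;
    }
    Some(&text[start..end])
}
```

Calling `find_schemeless_host("地址example.org", true)` keeps the leading characters as part of the match, while passing `false` stops at the first non-ASCII character and yields only `example.org`, which is the behavior this PR's url_can_be_iri flag is intended to enable.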