Open alexwennerberg opened 2 years ago
Someone brought what appears to be a bug to my attention: https://lists.flounder.online/crabmail/threads/1beaffd2384b.html
Hmm that link doesn't load for me, could you provide a copy here?
Ah, sorry -- I shuffled things around a bit on my site. Here's the fixed link:
I see. That's an interesting case, because —
can currently be part of an URL, e.g. like this:
https://www.example.com/—
In that case, the whole text including em-dash would get linked.
Also note that GitHub behaves the same way here:
https://www.example.com— https://www.example.com/—
We could fix the case where it's part of the domain, see also #29 which has some discussion around that. But what would you expect with the case where it's part of the path?
I think that if it's part of the path it should be treated as such. I guess this is a broader question, whether this library should reject invalid TLDs? like:
https://lists.flounder.online/test/threads/YgYAU45J+dZURu1F@localhost.lan.html
I think that the tradeoffs that you've made with the library as written are reasonable though
I've reworked domain parsing in 0.9.0 (see https://github.com/robinst/linkify/blob/main/CHANGELOG.md#090---2022-07-11), but I haven't addressed this yet.
I think we could now do this by rejecting TLDs that contain non-alphanumeric Unicode characters. Note that there are TLDs that contain non-ASCII characters, see examples here (but they would be alphanumeric): https://en.wikipedia.org/wiki/Internationalized_country_code_top-level_domain
Hi! Thanks for this library -- I use it in my new mailing list software to detect links in emails. Someone brought what appears to be a bug to my attention: https://lists.flounder.online/crabmail/threads/1beaffd2384b.html
Here's my code: https://git.alexwennerberg.com/crabmail/file/src/utils.rs.html#l22
I think that this could unambiguously be parsed, but I'm not 100% sure. What do you think?