robinst / linkify

Rust library to find links such as URLs and email addresses in plain text, handling surrounding punctuation correctly
https://robinst.github.io/linkify/
Apache License 2.0

This one doesn't parse out the link right #54

Open mikedilger opened 1 year ago

mikedilger commented 1 year ago

Here's an example that isn't being parsed correctly. A scheme can't have a '.' in it, right?

Just to show.. this is the list of relays that have seen this post.https://nostr.build/i/1105.png

kosayoda commented 1 year ago

It can, according to RFC 3986, section 3.1 (Scheme):

Scheme names consist of a sequence of characters beginning with a letter and followed by any combination of letters, digits, plus ("+"), period ("."), or hyphen ("-").

A scheme cannot start with a period, though. That means post .https://nostr.build/i/1105.png (with a space before the period) results in the "expected" output, but post.https://nostr.build/i/1105.png does not, because post.https is itself a valid scheme name.
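To make the rule concrete, here is a simplified Rust sketch (not linkify's actual code; the function names are hypothetical). Given the byte index of the ':' that ends a scheme, it walks backwards over scheme characters and then trims the run so that it starts with a letter. That is why ".https" yields https, while "post.https" is kept whole.

```rust
/// Characters allowed in a URI scheme after its first letter (RFC 3986, section 3.1).
fn is_scheme_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '+' || c == '-' || c == '.'
}

/// Hypothetical helper: given the byte index of a ':' in `text`, return the
/// longest RFC 3986 scheme that ends right before it, or None if there is none.
fn longest_scheme(text: &str, colon: usize) -> Option<&str> {
    let before = &text[..colon];
    // Walk backwards over scheme characters to find the maximal candidate run.
    let run_start = before
        .char_indices()
        .rev()
        .take_while(|&(_, c)| is_scheme_char(c))
        .last()
        .map(|(i, _)| i)?;
    // A scheme must begin with a letter, so drop leading digits, '+', '-' or '.'.
    let candidate = &before[run_start..];
    let offset = candidate.find(|c: char| c.is_ascii_alphabetic())?;
    Some(&candidate[offset..])
}
```

For the text from this issue, locating the colon with `text.find("://")` and calling `longest_scheme(text, colon)` gives post.https, whereas the variant with a space before the period gives https.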

I think a worthwhile feature for this library would be an option to restrict which schemes are recognized, since I would prefer matching https://nostr.build/i/1105.png in this case as well.

mikedilger commented 1 year ago

I stand corrected.

robinst commented 1 year ago

Yeah. I think having an option to provide an allow-list of schemes to recognize would be nice (I added a "help wanted" label). You could pass in https and http, and it would only return URLs with those schemes (stopping at characters like . or + in front of the scheme). It's a bit trickier than that, though, because you could also provide both post.https and https, and in that case it should probably use the longer match.
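A minimal sketch of how such an allow-list might behave, written as a standalone Rust function. This is not part of linkify's API (the thread only proposes the option), and `allowed_scheme` plus its signature are hypothetical. It assumes the caller already knows the byte index of the ':' and checks each allowed scheme as a case-insensitive suffix of the preceding text, treating '.', '+', '-' or any other non-alphanumeric character as a boundary, and it prefers the longest match.

```rust
/// Hypothetical allow-list lookup: given the byte index of a ':' in `text`,
/// return the longest scheme from `allowed` that ends right before the colon.
fn allowed_scheme<'t>(text: &'t str, colon: usize, allowed: &[&str]) -> Option<&'t str> {
    let before = &text[..colon];
    allowed
        .iter()
        .filter_map(|&scheme| {
            // The allowed scheme must be a suffix of the text before the colon
            // (schemes are case-insensitive per RFC 3986).
            let start = before.len().checked_sub(scheme.len())?;
            if !before.is_char_boundary(start) || !before[start..].eq_ignore_ascii_case(scheme) {
                return None;
            }
            // Reject matches glued to a preceding letter or digit ("xhttps:" is
            // not "https:"); '.', '+', '-', whitespace, etc. act as boundaries.
            match before[..start].chars().next_back() {
                Some(c) if c.is_ascii_alphanumeric() => None,
                _ => Some(&before[start..]),
            }
        })
        // Prefer the longest allowed scheme, e.g. "post.https" over "https".
        .max_by_key(|matched| matched.len())
}
```

With `&["https", "http"]`, the text from this issue yields https (stopping at the '.'); adding "post.https" to the list makes it return post.https instead, matching the longer-match preference.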