`test()` and `find()` do not find the same strings as valid/invalid links

rfgamaral commented 9 months ago

Hello there 👋.

Going straight to the point:

find()

linkify.find('1.nf3')

Output:

[{
  "type": "url",
  "value": "1.nf",
  "isLink": true,
  "href": "http://1.nf",
  "start": 0,
  "end": 4
}]

test()

linkify.test("1.nf3")

Output:

false

I understand that Linkify is not 100% spec compliant, but shouldn't test() and find() return the same output? More importantly, linkify.find('1.nf3') should return [] because .nf3 is not a valid TLD while .nf is (ref).

Is this a bug, or is this somehow intended?

nfrasser commented 9 months ago

Hi @rfgamaral, thanks for the report. The output of find has 1.nf, which is a valid URL. Notice that it excludes the final 3. test returns false because while "1.nf3" does contain a valid URL, the full string is not a URL.

You can see this this clearly when you run linkify.tokenize:

linkify.tokenize("1.nf3").map(token => token.toObject())

Output:

[
    {
        "type": "url",
        "value": "1.nf",
        "isLink": true,
        "href": "http://1.nf",
        "start": 0,
        "end": 4
    },
    {
        "type": "text",
        "value": "3",
        "isLink": false,
        "href": "3",
        "start": 4,
        "end": 5
    }
]

linkify.test runs tokenize and expects exactly one token with isLink: true.

Hope that helps

rfgamaral commented 9 months ago

@nfrasser Thank you for the quick reply, however, this creates a problem on our side, that I'm not exactly sure how to fix.

This issue was first reported to one of our products in the context of a rich-text editor. A user typed 1.nf3, and ended up having the 1.nf string linked, while the 3 at the end was not. From our perspective (and the users', nothing should be linked in this string). Our rich-text editor uses Tiptap, which in turn uses Linkify and the find function to find links. While typing a string such as 1.nf3<space>, when pressing <space>, this is when the find function is executed and finds the link. From a rich-text editor perspective, typing something like 1.nf3 shouldn't auto link the string because .nf3 is not a valid TLD, which the test function seems to be aware of.

Do you have any suggestions on how to approach "fixing" or improving this behaviour based on the rationale explained above? The auto linking mechanism in Tiptap provides a validate function before doing the actual auto linking in the editor, I could use that and call linkify.test(str) to validate the link based on the test function. Would that make sense, or do you think that could cause issues down the line?

nfrasser commented 9 months ago

@rfgamaral thanks for the additional explanation, I can definitely empathize with the expected behaviour here. Unfortunately I don't have an easy fix for this. Linkify's parser implementation has to support cases where there are no spaces around the URL. A fix for this would likely mean breaking other valid cases. It comes down to the inherent ambiguity of mixing plain-text with machine-readable strings.

Related issues to showcase the kinds of challenges I have to account for:

351
360
446

If you haven't already, I would suggest reporting this issue with Tiptap. Perhaps they can tweak their editor's implementation to avoid detecting links in the string immediately before the <space> when ~~only~~ more than one token is detected.

Another potential option: Try linkify v3 instead of v4, which has a slightly different parsing implementation. The drawback there is that since it's an older version, it might have other bugs or missing features.

Edit: Typo fix

rfgamaral commented 9 months ago

If you haven't already, I would suggest reporting this issue with Tiptap.

I'm prepared to tackle this myself and open a PR against Tiptap, I'm just trying to understand what would be the best solution.

Perhaps they can tweak their editor's implementation to avoid detecting links in the string immediately before the <space> when only one token is detected.

I think that might effectively disable auto-linking for things like example.com, which is not the intended behaviour. Instead, I was thinking that I could use tokenize (instead of find, which is what Tiptap currently uses), and add some logic to validate links, perhaps comparing the end and start positions of multiple tokens, or, even better, do the same validation that test already does:

linkify.test runs tokenize and expects exactly one token with isLink: true.

My only concern is that docs mention that tokenize is an internal function, so I'm not sure if I should be using that directly. Right?

However, this feels like the type of link validation that I would expect to exist in a rich-text editor with auto-linking functionality. So, to summarize, I could open a PR in Tiptap to use tokenize instead of find and do the same validation that test does, or, alternatively, use the validation that Tiptap provides to call test to validate the link. The advantage of the latter would be that I could push this change to our users immediately, without waiting for PR approval on the Tiptap side (and they could not agree with the solution), and a new version to be published. Unless, there's some downside, I should be aware?

rfgamaral commented 9 months ago

FYI, I ended up opening a PR on Tiptap replacing find with tokenize, and do a similar check as test (ref).

razorshaman909 commented 6 months ago

Similar situation here:

linkify.test("https://linkify.js.org/docs/ https://linkify.js.org/docs/ https://linkify.js.org/docs/ https://linkify.js.org/docs/")

returns as false, whereas

linkify.find("https://linkify.js.org/docs/ https://linkify.js.org/docs/ https://linkify.js.org/docs/ https://linkify.js.org/docs/") returns

[
    {
        "type": "url",
        "value": "https://linkify.js.org/docs/",
        "isLink": true,
        "href": "https://linkify.js.org/docs/",
        "start": 0,
        "end": 28
    },
    {
        "type": "url",
        "value": "https://linkify.js.org/docs/",
        "isLink": true,
        "href": "https://linkify.js.org/docs/",
        "start": 32,
        "end": 60
    },
    {
        "type": "url",
        "value": "https://linkify.js.org/docs/",
        "isLink": true,
        "href": "https://linkify.js.org/docs/",
        "start": 65,
        "end": 93
    },
    {
        "type": "url",
        "value": "https://linkify.js.org/docs/",
        "isLink": true,
        "href": "https://linkify.js.org/docs/",
        "start": 97,
        "end": 125
    }
]

nfrasser / linkifyjs

`test()` and `find()` do not find the same strings as valid/invalid links #472

351

360

446