rfgamaral closed this issue 9 months ago.
Hi @rfgamaral, thanks for the report. The output of `find` has `1.nf`, which is a valid URL. Notice that it excludes the final `3`. `test` returns `false` because while "1.nf3" does contain a valid URL, the full string is not a URL. You can see this clearly when you run `linkify.tokenize`:
```js
linkify.tokenize("1.nf3").map(token => token.toObject())
```

Output:
```json
[
  {
    "type": "url",
    "value": "1.nf",
    "isLink": true,
    "href": "http://1.nf",
    "start": 0,
    "end": 4
  },
  {
    "type": "text",
    "value": "3",
    "isLink": false,
    "href": "3",
    "start": 4,
    "end": 5
  }
]
```
`linkify.test` runs `tokenize` and expects exactly one token with `isLink: true`.

Hope that helps!
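The "exactly one token, and it is a link" rule can be sketched as a standalone check over token objects like the ones above. This is a simplified illustration of the behaviour described in this thread, not Linkify's actual source:

```javascript
// Simplified stand-in for the rule described above: the whole input
// counts as a link only if tokenize produced exactly one token and
// that token is a link. Token shape matches the toObject() output shown.
function looksLikeSingleLink(tokens) {
  return tokens.length === 1 && tokens[0].isLink;
}

// Tokens produced for "1.nf3" (copied from the output above): a url
// token followed by a trailing text token, so the check fails.
const tokensFor1nf3 = [
  { type: "url", value: "1.nf", isLink: true, href: "http://1.nf", start: 0, end: 4 },
  { type: "text", value: "3", isLink: false, href: "3", start: 4, end: 5 },
];

console.log(looksLikeSingleLink(tokensFor1nf3)); // false
```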
@nfrasser Thank you for the quick reply; however, this creates a problem on our side that I'm not exactly sure how to fix.
This issue was first reported to one of our products in the context of a rich-text editor. A user typed `1.nf3`, and ended up having the `1.nf` string linked, while the `3` at the end was not. From our perspective (and the users'), nothing should be linked in this string. Our rich-text editor uses Tiptap, which in turn uses Linkify and the `find` function to find links. While typing a string such as `1.nf3<space>`, the `find` function is executed when `<space>` is pressed, and it finds the link. From a rich-text editor perspective, typing something like `1.nf3` shouldn't auto-link the string because `.nf3` is not a valid TLD, which the `test` function seems to be aware of.
Do you have any suggestions on how to approach "fixing" or improving this behaviour based on the rationale explained above? The auto-linking mechanism in Tiptap provides a `validate` function that runs before the actual auto-linking in the editor; I could use that and call `linkify.test(str)` to validate the link based on the `test` function. Would that make sense, or do you think that could cause issues down the line?
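For reference, that idea could look roughly like the sketch below, assuming the `validate` option on Tiptap's Link extension mentioned above. Option names have changed across Tiptap versions, so treat this as an illustration rather than a drop-in snippet:

```javascript
import Link from "@tiptap/extension-link";
import * as linkify from "linkifyjs";

// Sketch: only auto-link when linkify.test considers the WHOLE candidate
// string a link (i.e. tokenize yields exactly one link token).
// The `validate` option name is taken from the discussion above and may
// differ in your Tiptap version.
const LinkWithStrictAutolink = Link.configure({
  autolink: true,
  validate: (url) => linkify.test(url),
});
```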
@rfgamaral thanks for the additional explanation, I can definitely empathize with the expected behaviour here. Unfortunately I don't have an easy fix for this. Linkify's parser implementation has to support cases where there are no spaces around the URL. A fix for this would likely mean breaking other valid cases. It comes down to the inherent ambiguity of mixing plain-text with machine-readable strings.
Related issues to showcase the kinds of challenges I have to account for:
If you haven't already, I would suggest reporting this issue to Tiptap. Perhaps they can tweak their editor's implementation to avoid detecting links in the string immediately before the `<space>` when more than one token is detected.
Another potential option: Try linkify v3 instead of v4, which has a slightly different parsing implementation. The drawback there is that since it's an older version, it might have other bugs or missing features.
Edit: Typo fix
> If you haven't already, I would suggest reporting this issue with Tiptap.
I'm prepared to tackle this myself and open a PR against Tiptap, I'm just trying to understand what would be the best solution.
> Perhaps they can tweak their editor's implementation to avoid detecting links in the string immediately before the `<space>` when only one token is detected.
I think that might effectively disable auto-linking for things like `example.com`, which is not the intended behaviour. Instead, I was thinking that I could use `tokenize` (instead of `find`, which is what Tiptap currently uses) and add some logic to validate links, perhaps comparing the `end` and `start` positions of multiple tokens, or, even better, doing the same validation that `test` already does:
> `linkify.test` runs `tokenize` and expects exactly one token with `isLink: true`.
My only concern is that the docs mention that `tokenize` is an internal function, so I'm not sure if I should be using it directly. Right?
However, this feels like the type of link validation that I would expect to exist in a rich-text editor with auto-linking functionality. So, to summarize, I could open a PR in Tiptap to use `tokenize` instead of `find` and do the same validation that `test` does, or, alternatively, use the `validate` function that Tiptap provides to call `test` and validate the link. The advantage of the latter is that I could push this change to our users immediately, without waiting for PR approval on the Tiptap side (they might not agree with the solution) and for a new version to be published. Unless there's some downside I should be aware of?
FYI, I ended up opening a PR on Tiptap replacing `find` with `tokenize` and doing a similar check to the one `test` does (ref).
Similar situation here:

```js
linkify.test("https://linkify.js.org/docs/ https://linkify.js.org/docs/ https://linkify.js.org/docs/ https://linkify.js.org/docs/")
```

returns `false`, whereas

```js
linkify.find("https://linkify.js.org/docs/ https://linkify.js.org/docs/ https://linkify.js.org/docs/ https://linkify.js.org/docs/")
```

returns
```json
[
  {
    "type": "url",
    "value": "https://linkify.js.org/docs/",
    "isLink": true,
    "href": "https://linkify.js.org/docs/",
    "start": 0,
    "end": 28
  },
  {
    "type": "url",
    "value": "https://linkify.js.org/docs/",
    "isLink": true,
    "href": "https://linkify.js.org/docs/",
    "start": 32,
    "end": 60
  },
  {
    "type": "url",
    "value": "https://linkify.js.org/docs/",
    "isLink": true,
    "href": "https://linkify.js.org/docs/",
    "start": 65,
    "end": 93
  },
  {
    "type": "url",
    "value": "https://linkify.js.org/docs/",
    "isLink": true,
    "href": "https://linkify.js.org/docs/",
    "start": 97,
    "end": 125
  }
]
```
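This is consistent with the single-token rule quoted earlier in the thread: here `tokenize` yields more than one token (four url tokens, plus text tokens for the whitespace between them), so `test` returns `false` even though every url token is individually a link. A standalone illustration, not Linkify's source:

```javascript
// Counting link tokens in a token list like the find() output above.
// With four link tokens, the "exactly one token, and it is a link"
// condition behind test() cannot hold, so test() returns false.
function countLinkTokens(tokens) {
  return tokens.filter((t) => t.isLink).length;
}

// Minimal stand-ins for the four url tokens shown above.
const multiUrlTokens = [
  { type: "url", isLink: true, start: 0, end: 28 },
  { type: "url", isLink: true, start: 32, end: 60 },
  { type: "url", isLink: true, start: 65, end: 93 },
  { type: "url", isLink: true, start: 97, end: 125 },
];

console.log(countLinkTokens(multiUrlTokens)); // 4
```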
Hello there 👋.

Going straight to the point:

`find()` Output:

`test()` Output:

I understand that Linkify is not 100% spec compliant, but shouldn't `test()` and `find()` return the same output? More importantly, `linkify.find('1.nf3')` should return `[]` because `.nf3` is not a valid TLD while `.nf` is (ref). Is this a bug, or is this somehow intended?