nfrasser / linkifyjs

JavaScript plugin for finding links in plain-text and converting them to HTML <a> tags.
https://linkify.js.org
MIT License
1.85k stars 184 forks source link

linkify.tokenize breaks string as url and text on encountering org. #477

Open PratyushGar opened 7 months ago

PratyushGar commented 7 months ago

I have a text: org.org_custom_connector On parsing through tokenize, it breaks it into a URL and text token - url: "org.org" && text: "_custom_connector

[
    {
        "t": "url",
        "v": "org.org",
        "tk": [
            {
                "t": "TLD",
                "v": "org",
                "s": 0,
                "e": 3
            },
            {
                "t": "DOT",
                "v": ".",
                "s": 3,
                "e": 4
            },
            {
                "t": "TLD",
                "v": "org",
                "s": 4,
                "e": 7
            }
        ]
    },
    {
        "t": "text",
        "v": "_custom_connector",
        "tk": [
            {
                "t": "UNDERSCORE",
                "v": "_",
                "s": 7,
                "e": 8
            },
            {
                "t": "DOMAIN",
                "v": "custom",
                "s": 8,
                "e": 14
            },
            {
                "t": "UNDERSCORE",
                "v": "_",
                "s": 14,
                "e": 15
            },
            {
                "t": "DOMAIN",
                "v": "connector",
                "s": 15,
                "e": 24
            }
        ]
    }
] 

Is there any way to override this behavior of not splitting the URL?

artemik commented 1 week ago

Confirm, I see the same issue. For example backend.eu-central1.internal:8080 gets parsed as backend.eu only. These kind of urls are common for cloud dns names. Would be nice to have a fix/option to handle these.