markdown-it / linkify-it

Links recognition library with full unicode support
http://markdown-it.github.io/linkify-it/
MIT License
655 stars 63 forks source link

URL with underscore is not detected [reopen] #68

Closed lxgreen closed 5 years ago

lxgreen commented 6 years ago

Sorry for inconvenience, just checked with my client. They have this subdomain name: https://rfinton2_18.gr8.com/, and it is not detected by linkify

puzrin commented 6 years ago

https://stackoverflow.com/questions/2180465/can-domain-name-subdomains-have-an-underscore-in-it

That's invalid hostname. I'm for ignore such case and do nothing.

puzrin commented 5 years ago

Seems no need to fix - no activity.

eramdam commented 4 years ago

@puzrin hey, it seems that the SO thread you linked has had an update since then

https://stackoverflow.com/a/2183140/6398000

Most answers given here are false. It is perfectly legal to have an underscore in a domain name.

I'm running into this exact issue as well. Is there any hope to see this fixed? (Happy to open a PR if you have some pointers of where the change should be made) Ended up opening a PR https://github.com/markdown-it/linkify-it/pull/86

puzrin commented 4 years ago

The answer you refered says, underscores are ok for domain names, but not ok for hostnames. And URL-s uses hostnames (=> should be without underscores). What is wrong?

eramdam commented 4 years ago

@puzrin the issue is that, like the few people who opened this issue and the other similar one, I come accross URLs that are functional and have an underscore in their "domain" part (before the path).

Examples like the one in this comment + the one I encountered (that I cannot share because it's private to the customer of the company I work for).

puzrin commented 4 years ago

That example was artificial, made by user himself. That's not correct. I need something more close to real world.

eramdam commented 4 years ago

The URL I had was https://company_ops.frontapp.com/path (real company name removed for privacy). The underscore is in the sub-domain and not the domain itself, so maybe a cleaner solution would be to allow underscores in subdomains but not in domains?

puzrin commented 4 years ago

To be honest, it's suspicious, when users ask to fix situations, caused by themself. I'd suggest to fix your domain for sure instead. Or you could monkeypatch regexp exports before linkifier use - this will work and not requires upstream changes.

eramdam commented 4 years ago

To be honest, it's suspicious, when users ask to fix situations, caused by themself. I'd suggest to fix your domain for sure instead.

What is there to fix? The URL leads to a working page, I don't see how it's controversial to support it, especially when other linkifiers out there (like Github's one) support that case.

Or you could monkeypatch regexp exports before linkifier use - this will work and not requires upstream changes.

So I tried doing that (from a code that uses markdown-it, mind you) and I haven't had any success because (I assume) the regexes are compiled at runtime and I can't re-compile them from the outside of the module.

puzrin commented 4 years ago

What is there to fix? The URL leads to a working page, so it's valid, I don't see how it's controversial to support it, especially when other linkifiers out there (like Github's one) support that case.

I have a different opinion about this. Any change should not intruduce additional risks and should be useful for all. IMO you wish to push your custom request to mainstream - nobody else need it.

So I tried doing that (from a code that uses markdown-it, mind you) and I haven't had any success because (I assume) the regexes are compiled at runtime and I can't re-compile them from the outside of the module.

I don't remember details. But initially instance was expected to be patch-able. If something missed - this can be fixed.

eramdam commented 4 years ago

nobody else need it.

Given that this is the second issue about this, I would argue otherwise but I'll stop here because we're not going anywhere.

So I tried doing that (from a code that uses markdown-it, mind you) and I haven't had any success because (I assume) the regexes are compiled at runtime and I can't re-compile them from the outside of the module.

I don't remember details. But initially instance was expected to be patch-able. If something missed - this can be fixed.

Here's an example of what I've been doing https://runkit.com/eramdam/5e94eaefe02882001a0d2cd7, applying my (albeit dirty) working fix from https://github.com/markdown-it/linkify-it/pull/86 and it doesn't seem to work. So I must be missing a compile step somewhere to make it so markdownIt.linkify takes the changes into account.

puzrin commented 4 years ago
  1. See https://github.com/markdown-it/linkify-it/blob/master/index.js#L158 you can override .onCompile() and patch there loca copies of regexps
  2. If use in markdown-it - reassign new instace of likifier, or patch it before md instance create)
eramdam commented 4 years ago
1. See https://github.com/markdown-it/linkify-it/blob/master/index.js#L158 you can override `.onCompile()` and patch there loca copies of regexps

2. If use in markdown-it - reassign new instace of likifier, or patch it before md instance create)

All of this makes sense but... It doesn't seem to work 😕 Or maybe I'm not doing this correctly 😓

Here's a gist with what I've been doing https://gist.github.com/eramdam/4792b9eb7e46c0abc08c7e18826ba5ad#file-test-monkeypatch-js

magicode commented 3 years ago

example https://a_a_a_a_a.wix.com/

@puzrin chrome, edge and github markdown support underscore in hostname
markdown-it is no supported yet