Email links followed by a hyphen are not detected

markdown-it / linkify-it

Links recognition library with full unicode support

http://markdown-it.github.io/linkify-it/

MIT License


Email links followed by a hyphen are not detected #93

Open mbroshi opened 4 years ago

mbroshi commented 4 years ago

When a valid email address such as name@example.com is immediately followed by a hyphen, linkify.test returns false:

const linkify = require('linkify-it')();             // instance with default schemas
console.log(linkify.test('name@example.com'));   // true
console.log(linkify.test('name@example.com,'));  // true
console.log(linkify.test('name@example.com-'));  // false

I would expect all three calls to return true.

You can test this by adding these two lines to test/fixtures/links.txt:

name@example.com-
name@example.com

puzrin commented 4 years ago

Please explain the use case, because this looks like broken content. The linkifier is expected to be used with valid, human-readable text.

mbroshi commented 4 years ago

This came up in https://github.com/HabitRPG/habitica/issues/12437

Someone wrote a sentence in which an email address is set off by double hyphens, e.g. "an email address--name@example.com--and", and expected the address to be turned into a link.

Also, every other punctuation mark I've tried as a suffix after the link still tests as true; only a hyphen makes the text test as false.
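A plausible reason only the hyphen misbehaves: inside a hostname a hyphen is a legal character within a label, so a matcher that insists the address be followed by a clear terminator cannot count "-" as one, while a comma or period is an unambiguous stop. The toy model below reproduces the three results from the report. LOCAL, HOST, strictEmail and testStrict are hypothetical, simplified names for illustration; they are not linkify-it's actual patterns or internals.

```javascript
// Toy model of a fuzzy email matcher. LOCAL, HOST and strictEmail are
// simplified, hypothetical patterns; they are NOT linkify-it's real regexes.

// Local part: letters, digits and a few symbols (hyphen included).
const LOCAL = String.raw`[a-z0-9._%+-]+`;
// Host: one or more labels that may contain inner hyphens, then a TLD
// that (in this simplification) is letters only.
const HOST = String.raw`(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}`;

// Strict terminator: the address must be followed by end of input,
// whitespace, or common punctuation. "-" is deliberately missing, because
// inside a hostname a hyphen means the label continues.
const strictEmail = new RegExp(`${LOCAL}@${HOST}(?=$|[\\s,.;:!?)"'])`, 'i');

function testStrict(text) {
  return strictEmail.test(text);
}

for (const s of ['name@example.com', 'name@example.com,', 'name@example.com-']) {
  console.log(JSON.stringify(s), testStrict(s));
}
// "name@example.com" true
// "name@example.com," true
// "name@example.com-" false
```

For "name@example.com-", the regex engine backtracks from "com" to "co" looking for an acceptable terminator, finds none, and the whole test fails, which is the same symptom as in the report.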

mbroshi commented 4 years ago

Looks like GitHub's link parser also has amusing behavior when hyphens are involved :laughing:

puzrin commented 4 years ago

https://markdown-it.github.io/#md3=%7B%22source%22%3A%22address--name%40example.com--and%5Cn%5Cnaddress---name%40example.com---and%5Cn%5Cnaddress--http%3A%2F%2Fexample.com--and%5Cn%5Cnaddress---http%3A%2F%2Fexample.com---and%22%2C%22defaults%22%3A%7B%22html%22%3Afalse%2C%22xhtmlOut%22%3Afalse%2C%22breaks%22%3Afalse%2C%22langPrefix%22%3A%22language-%22%2C%22linkify%22%3Atrue%2C%22typographer%22%3Atrue%2C%22_highlight%22%3Atrue%2C%22_strict%22%3Afalse%2C%22_view%22%3A%22html%22%7D%7D

A similar problem exists for links. I can see how handling of the head (the hyphens before the address) can be improved for emails, but I'm not sure about the tail.

mbroshi commented 4 years ago

It's curious that it's just hyphens. To be honest, I have trouble parsing the regexes, so I can't be of much help. The simplest idea I have is to strip trailing hyphens (which are not part of TLDs/SLDs), if that's a feasible solution.
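The trailing-hyphen idea can be sketched on a toy matcher, assuming (as a loud simplification) that a TLD contains only letters. Under that assumption a hyphen right after the TLD cannot continue the host, so the terminator only needs to reject a following letter or digit. LOCAL, HOST, relaxedEmail and findEmails are hypothetical names, not linkify-it code; real TLDs can contain hyphens (IDN TLDs in "xn--" form), so an actual fix would need more care.

```javascript
// Hypothetical sketch of "ignore hyphens after the TLD". Simplified
// patterns for illustration only; NOT linkify-it's implementation.

// Local part: letters, digits and a few symbols (hyphen included).
const LOCAL = String.raw`[a-z0-9._%+-]+`;
// Host labels may contain inner hyphens; the TLD is assumed letters-only.
const HOST = String.raw`(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}`;

// Relaxed terminator: the TLD must not run into another letter or digit,
// but anything else, including "-" or "--", may follow it.
const relaxedEmail = new RegExp(`${LOCAL}@${HOST}(?![a-z0-9])`, 'gi');

// Return every address found in `text`, without trailing hyphens.
function findEmails(text) {
  return Array.from(text.matchAll(relaxedEmail), (m) => m[0]);
}

console.log(findEmails('name@example.com-'));
console.log(findEmails('Mail name@example.com--and thanks'));
console.log(findEmails('name@example.com-foo.org'));
```

Hyphens inside a hostname are still kept: for "name@example.com-foo.org" the greedy label pattern prefers the longest host ending in a valid TLD. The sketch does nothing for the head, though: in "address--name@example.com--and" the local-part pattern still absorbs "address--". It also does not cover URLs, where a path may legitimately end in a hyphen.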