robinst / autolink-java

Java library to extract links (URLs, email addresses) from plain text; fast, small and smart
MIT License

URL having consecutive "https://https://" are parsed as it is #26

Closed vinay1984 closed 6 years ago

vinay1984 commented 6 years ago

Hi,

URLs with a repeated scheme, such as "https://https://", are linked as-is. Can we exclude the extra "https://"? The following unit test assertions fail for these URLs:

assertLinked("https://https://abc.com/", "|https://abc.com/|");
assertLinked("http://http://abc.com/", "|http://abc.com/|");
assertLinked("ftp://ftp://abc.com/", "|ftp://abc.com/|");

Thanks, Vin

robinst commented 6 years ago

> can we exclude "https://"

No, this library doesn't modify the text; all it does is recognize links.

Note that GitHub's autolinker also recognizes the full text as a link.

If your input has weird URLs like that, you can add some code that takes the links from autolink-java and replaces "https://https://" with "https://". But I don't think it's a common enough problem to be part of this library.
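A minimal sketch of that kind of post-processing, assuming the link text has already been taken from the spans that autolink-java reports. The class name `RepeatedSchemeCleaner` and the method `collapseRepeatedScheme` are hypothetical helpers for illustration, not part of the library, and the sketch only collapses a scheme that is repeated verbatim at the start of the link:

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of autolink-java.
final class RepeatedSchemeCleaner {

    // A scheme followed by "://" at the start of the string, immediately
    // repeated one or more times, e.g. "https://https://" or "ftp://ftp://ftp://".
    // The backreference \1 requires the repeats to match the first scheme exactly.
    private static final Pattern REPEATED_SCHEME =
            Pattern.compile("^([a-zA-Z][a-zA-Z0-9+.-]*://)(?:\\1)+");

    // Returns the link with a verbatim-repeated leading scheme collapsed to one copy.
    // Links without a repeated scheme are returned unchanged.
    static String collapseRepeatedScheme(String link) {
        return REPEATED_SCHEME.matcher(link).replaceFirst("$1");
    }

    public static void main(String[] args) {
        System.out.println(collapseRepeatedScheme("https://https://abc.com/")); // https://abc.com/
        System.out.println(collapseRepeatedScheme("ftp://ftp://abc.com/"));     // ftp://abc.com/
        System.out.println(collapseRepeatedScheme("https://abc.com/"));         // https://abc.com/
    }
}
```

Mixed schemes such as "http://https://abc.com/" are left untouched here, since it is not obvious which scheme the author intended.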