robinst / autolink-java

Java library to extract links (URLs, email addresses) from plain text; fast, small and smart
MIT License
207 stars 40 forks

add non-breaking space to stop symbols list #14

Closed otopba closed 7 years ago

otopba commented 7 years ago

The non-breaking space character is not allowed in URLs.

robinst commented 7 years ago

Hey, thanks for the PR!

Do you have a reference for why U+00A0 NO-BREAK SPACE is not allowed in URLs? I looked at RFC 3987 and it explicitly allows the character (search for %xA0). Having said that, I think it makes sense to exclude whitespace characters in this library. Maybe we should list some others as well.
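
To make the RFC 3987 point concrete, here is a minimal sketch that checks a code point against the `ucschar` production from the RFC (the one containing `%xA0`). The class and method names are hypothetical and are not part of autolink-java; the ranges are transcribed from RFC 3987, section 2.2. It also prints how the JDK classifies U+00A0, since `Character.isWhitespace` deliberately excludes non-breaking spaces while `Character.isSpaceChar` includes them.

```java
// Hypothetical illustration, not code from autolink-java.
final class UcscharCheck {

    // ucschar ranges from RFC 3987, section 2.2, as inclusive {low, high} pairs.
    private static final int[][] UCSCHAR_RANGES = {
        {0xA0, 0xD7FF}, {0xF900, 0xFDCF}, {0xFDF0, 0xFFEF},
        {0x10000, 0x1FFFD}, {0x20000, 0x2FFFD}, {0x30000, 0x3FFFD},
        {0x40000, 0x4FFFD}, {0x50000, 0x5FFFD}, {0x60000, 0x6FFFD},
        {0x70000, 0x7FFFD}, {0x80000, 0x8FFFD}, {0x90000, 0x9FFFD},
        {0xA0000, 0xAFFFD}, {0xB0000, 0xBFFFD}, {0xC0000, 0xCFFFD},
        {0xD0000, 0xDFFFD}, {0xE1000, 0xEFFFD}
    };

    // Returns true if the code point falls inside any ucschar range.
    static boolean isUcschar(int codePoint) {
        for (int[] range : UCSCHAR_RANGES) {
            if (codePoint >= range[0] && codePoint <= range[1]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int nbsp = 0x00A0; // NO-BREAK SPACE
        System.out.println("ucschar:      " + isUcschar(nbsp));
        System.out.println("isWhitespace: " + Character.isWhitespace(nbsp));
        System.out.println("isSpaceChar:  " + Character.isSpaceChar(nbsp));
    }
}
```

So the grammar permits U+00A0 inside an IRI, but for extracting links from plain text it is still reasonable to treat it as a boundary.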

otopba commented 7 years ago

Hi, @robinst !

A URL cannot contain any kind of space character; this is not allowed. RFC 3987 does indicate that this character is allowed in ucschar, but I do not know what ucschar is =/

Also, this link can be useful: wikipedia

robinst commented 7 years ago

Thanks, merged. I decided I'll add some other Unicode whitespace characters as well, see the list here: https://en.wikipedia.org/wiki/Whitespace_character#Unicode
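
For readers who want to see the idea, here is a rough sketch of stopping a link at the first Unicode whitespace character. This is not the code that was merged into autolink-java: the class and method names are hypothetical, and the check combines `Character.isWhitespace` and `Character.isSpaceChar` as an approximation rather than reproducing the exact Wikipedia list.

```java
// Hypothetical illustration, not the library's implementation.
final class WhitespaceStop {

    // Treats both regular and non-breaking Unicode spaces as link boundaries.
    // isWhitespace alone would miss U+00A0, U+2007 and U+202F.
    static boolean isStopWhitespace(int codePoint) {
        return Character.isWhitespace(codePoint) || Character.isSpaceChar(codePoint);
    }

    // Returns the index just past the link that begins at 'start',
    // i.e. the index of the first stop character or the end of the text.
    static int findEnd(String text, int start) {
        int i = start;
        while (i < text.length()) {
            int codePoint = text.codePointAt(i);
            if (isStopWhitespace(codePoint)) {
                break;
            }
            i += Character.charCount(codePoint);
        }
        return i;
    }

    public static void main(String[] args) {
        String text = "see http://example.com/a\u00A0b for details";
        int start = text.indexOf("http");
        // prints "http://example.com/a"
        System.out.println(text.substring(start, findEnd(text, start)));
    }
}
```

Without the non-breaking space check, the scan above would run on to `http://example.com/a\u00A0b`.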

robinst commented 7 years ago

Released in 0.7.0: https://github.com/robinst/autolink-java/blob/master/CHANGELOG.md#070---2017-08-31

otopba commented 7 years ago

@robinst Thank you!