robinst / autolink-java

Java library to extract links (URLs, email addresses) from plain text; fast, small and smart
MIT License

Stop URL on < or > #7

Closed: tsl0922 closed this issue 8 years ago

tsl0922 commented 8 years ago

Code to reproduce:

import org.nibor.autolink.*;

String input = "wow <p>http://test.com</p> such linked";
LinkExtractor linkExtractor = LinkExtractor.builder().build();
Iterable<LinkSpan> links = linkExtractor.extractLinks(input);
// Wrap each extracted span in an <a> element, using the span as both href and link text
String result = Autolink.renderLinks(input, links, (link, text, sb) -> {
    sb.append("<a href=\"");
    sb.append(text, link.getBeginIndex(), link.getEndIndex());
    sb.append("\">");
    sb.append(text, link.getBeginIndex(), link.getEndIndex());
    sb.append("</a>");
});
System.out.println(result);

Expected:

wow <p><a href="http://test.com">http://test.com</a></p> such linked

Actual:

wow <p><a href="http://test.com</p>">http://test.com</p></a> such linked
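The actual output follows directly from how the rendering callback works: it copies the same span, `text[begin, end)`, into both the `href` attribute and the link text. If the extracted span runs past `test.com` and swallows `</p>`, both copies carry it. The following JDK-only sketch replays the lambda from the report with hand-picked spans so the two outputs can be compared. `RenderSketch` and `renderOneLink` are hypothetical names used for illustration; they are not part of autolink-java, and the span indices are chosen by hand rather than produced by the library.

```java
// Hypothetical, JDK-only replay of the rendering lambda from the report.
// It does not call autolink-java; the span is passed in by hand.
public final class RenderSketch {

    // Wraps the single span [begin, end) of input in an <a> element,
    // appending the span twice (href and link text) like the lambda does.
    static String renderOneLink(String input, int begin, int end) {
        StringBuilder sb = new StringBuilder();
        sb.append(input, 0, begin);             // text before the link
        sb.append("<a href=\"");
        sb.append(input, begin, end);           // span used as the URL
        sb.append("\">");
        sb.append(input, begin, end);           // span used as the link text
        sb.append("</a>");
        sb.append(input, end, input.length());  // text after the link
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "wow <p>http://test.com</p> such linked";
        int begin = input.indexOf("http://");                 // 7
        int correctEnd = begin + "http://test.com".length();  // 22
        int buggyEnd = correctEnd + "</p>".length();          // span also covers "</p>"

        // Span that swallowed "</p>": reproduces the "actual" output
        System.out.println(renderOneLink(input, begin, buggyEnd));
        // Span that stops before "<": reproduces the "expected" output
        System.out.println(renderOneLink(input, begin, correctEnd));
    }
}
```

So the renderer itself behaves consistently; the only difference between the two results is where the extracted span ends.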

This seems to be a bug. Do we support something like the skip_tags feature of rinku?

robinst commented 8 years ago

We don't try to parse HTML; this library is about detecting links in plain text.

Having said that, unescaped < or > is not actually valid in a URL (or IRI) according to RFC 3987 section 2.2, so we shouldn't accept it.

I'll repurpose this issue to track that.
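The rule from RFC 3987 can be sketched as a link-end scan that stops at the first character which may not appear unescaped in an IRI. This is a simplified illustration under stated assumptions, not autolink-java's actual fix: `LinkEndSketch`, `endsLink` and `findLinkEnd` are hypothetical names, the terminator set covers only whitespace, `<`, `>` and `"` (RFC 3987 section 2.2 also excludes other ASCII characters such as `{`, `}`, `|`, `\`, `^` and the backtick), and a real extractor additionally has to deal with trailing punctuation and unbalanced brackets.

```java
// Hypothetical sketch of stopping a URL at characters that are not
// allowed unescaped in an IRI. Not autolink-java's implementation.
public final class LinkEndSketch {

    // Simplified terminator set: whitespace, '<', '>' and '"'.
    // RFC 3987 section 2.2 excludes further ASCII characters not handled here.
    static boolean endsLink(char c) {
        return Character.isWhitespace(c) || c == '<' || c == '>' || c == '"';
    }

    // Returns the exclusive end index of a link starting at begin:
    // the first terminator character, or the end of the text.
    static int findLinkEnd(CharSequence text, int begin) {
        int i = begin;
        while (i < text.length() && !endsLink(text.charAt(i))) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        String input = "wow <p>http://test.com</p> such linked";
        int begin = input.indexOf("http://");
        int end = findLinkEnd(input, begin);
        System.out.println(input.substring(begin, end)); // http://test.com
    }
}
```

Applied to the input from the report, the scan stops at the `<` of `</p>`, so the span is `http://test.com` and the rendering matches the expected output.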

robinst commented 8 years ago

Fixed now, will release soon.

robinst commented 8 years ago

Released in 0.5.0.