robinst / linkify

Rust library to find links such as URLs and email addresses in plain text, handling surrounding punctuation correctly
https://robinst.github.io/linkify/
Apache License 2.0

non-breaking space is included as part of e-mail links #66

Open hamamo opened 1 year ago

hamamo commented 1 year ago

Apparently "\u{a0}" (U+00A0, non-breaking space) is treated as part of e-mail links, as shown by this failing test case:

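use linkify::LinkFinder;

#[test]
fn test_link_finder() {
    // The address has a non-breaking space (U+00A0) on both sides.
    let text =
        "this is a mail address:\u{a0}test@example.com\u{a0}surrounded by non-breaking spaces";
    let mut links = LinkFinder::new().links(text);
    // Expected: only the address itself, without the adjacent U+00A0 characters.
    assert_eq!(links.next().unwrap().as_str(), "test@example.com");
}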

As non-breaking spaces are common in e-mail bodies, this leads to misidentification of links.
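
Until this is handled in the library, one possible workaround is to split the input at every U+00A0 before running link detection, so that no candidate link can contain a non-breaking space. The sketch below uses only the standard library; `split_on_nbsp` is a hypothetical helper name and not part of linkify.

```rust
/// Hypothetical helper (not part of linkify): splits `text` at every
/// U+00A0 and returns each non-empty piece together with its byte
/// offset in the original string.
fn split_on_nbsp(text: &str) -> Vec<(usize, &str)> {
    let mut pieces = Vec::new();
    let mut start = 0;
    for (idx, ch) in text.char_indices() {
        if ch == '\u{a0}' {
            // Keep the piece before the non-breaking space, if any.
            if idx > start {
                pieces.push((start, &text[start..idx]));
            }
            // Resume after the non-breaking space (2 bytes in UTF-8).
            start = idx + ch.len_utf8();
        }
    }
    // Keep the trailing piece after the last non-breaking space.
    if start < text.len() {
        pieces.push((start, &text[start..]));
    }
    pieces
}
```

Each piece could then be passed to `LinkFinder::new().links(piece)`, adding the piece's offset to the link's `start()` and `end()` to recover positions in the original text. The trade-off is that a non-breaking space always ends a link, which seems to match the behaviour the test case above expects.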