mvdan / xurls

Extract urls from text
BSD 3-Clause "New" or "Revised" License

Added fix for non ASCII TLDs #33

Closed: bynov closed this 5 years ago

bynov commented 5 years ago

This is a simple fix, so we can start from here.

Benchmarks:

benchmark                    old ns/op     new ns/op     delta
BenchmarkStrictEmpty-4       13.7          14.0          +2.19%
BenchmarkStrictSingle-4      25797         28526         +10.58%
BenchmarkStrictMany-4        81277         82469         +1.47%
BenchmarkRelaxedEmpty-4      9566          15405         +61.04%
BenchmarkRelaxedSingle-4     61348         100386        +63.63%
BenchmarkRelaxedMany-4       164582        277470        +68.59%

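The numbers show that the change makes matching more expensive. The strict benchmarks slow down by up to about 10%, but every relaxed benchmark is roughly 60 to 70% slower, so we need to think about it.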
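The layout of the table resembles the output of Go's `benchcmp` tool. Each delta appears to be the plain relative change, (new - old) / old, and the numbers are consistent with that. The short Go sketch below recomputes the column from the table as a check. The `result` type and `formatDelta` helper are hypothetical and are not part of xurls.

```go
package main

import "fmt"

// result is one row of the benchmark comparison: ns/op before and after the change.
type result struct {
	name          string
	before, after float64
}

// formatDelta returns the relative change from before to after as a signed
// percentage with two decimals, e.g. "+61.04%".
func formatDelta(before, after float64) string {
	return fmt.Sprintf("%+.2f%%", (after-before)/before*100)
}

func main() {
	// Numbers copied from the benchmark table above (ns/op).
	results := []result{
		{"BenchmarkStrictEmpty-4", 13.7, 14.0},
		{"BenchmarkStrictSingle-4", 25797, 28526},
		{"BenchmarkStrictMany-4", 81277, 82469},
		{"BenchmarkRelaxedEmpty-4", 9566, 15405},
		{"BenchmarkRelaxedSingle-4", 61348, 100386},
		{"BenchmarkRelaxedMany-4", 164582, 277470},
	}
	// Print each benchmark name with its recomputed delta.
	for _, r := range results {
		fmt.Printf("%-28s %s\n", r.name, formatDelta(r.before, r.after))
	}
}
```

Running it reproduces the delta column. The comparison makes the asymmetry easy to see: the strict benchmarks change by between about 1.5% and 10.6%, while the relaxed ones slow down by 61% to 69%.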

Regarding #32

mvdan commented 5 years ago

I'd also like to add that this is a complex issue to solve, so please understand that this isn't a simple merge.

mvdan commented 5 years ago

I ended up going for a simpler fix: removing the feature entirely in 15684420e2b89d5320d909e8f05655c101ab18bc. See the commit message for details. Thanks again!