kermitt2 / grobid

A machine learning software for extracting information from scholarly documents
https://grobid.readthedocs.io
Apache License 2.0
3.59k stars, 459 forks

Update the URL regexes to match URLs starting with a plain www. #1185

Closed lfoppiano closed 1 month ago

lfoppiano commented 1 month ago

We added a new regex that provides an alternative match for URLs (I did not dare to modify the body of the existing one).
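
As an illustration only, the idea of adding an alternative regex next to an existing one, instead of editing the existing one, might look roughly like the sketch below. It is written in Python for brevity and does not reproduce the patterns in the GROBID code base: `SCHEME_URL`, `WWW_URL` and `find_urls` are hypothetical names, and both patterns are simplified assumptions.

```python
import re

# Hypothetical stand-in for an existing pattern that requires an explicit scheme.
SCHEME_URL = re.compile(r"\b(?:https?|ftp)://[^\s<>\"']+", re.IGNORECASE)

# Hypothetical alternative pattern for URLs that start with a bare "www.".
# The lookbehind skips a "www." that is already part of a scheme-based URL
# or of a longer host name.
WWW_URL = re.compile(r"(?<![/\w.])www\.[^\s<>\"']+", re.IGNORECASE)


def find_urls(text):
    """Return URL-like strings matched by either pattern, in order of appearance."""
    spans = []
    for pattern in (SCHEME_URL, WWW_URL):
        for match in pattern.finditer(text):
            # Drop trailing punctuation that usually belongs to the sentence.
            spans.append((match.start(), match.group().rstrip(".,;:)")))
    return [url for _, url in sorted(spans)]


if __name__ == "__main__":
    sample = "See www.example.org/data and https://grobid.readthedocs.io."
    print(find_urls(sample))
```

Keeping the second pattern separate leaves the behaviour of the existing one untouched, so any regression should be limited to the new `www.` case.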

coveralls commented 1 month ago

Coverage Status

coverage: 40.755% (+0.003%) from 40.752% when pulling 40a1742285ba909c8cc36fdc34cc987cdfda68e0 on update-regexes-urls into d2f0cdca8d0259c9579c04d04fa67171a69a8f38 on master.

kermitt2 commented 1 month ago

It looks good to me!