lipoja / URLExtract

URLExtract is a Python class for collecting (extracting) URLs from a given text, based on locating TLDs.
MIT License

pypidb issues #68

Open jayvdb opened 4 years ago

jayvdb commented 4 years ago

Continuing from #63, these are the known issues (the list will grow).

As a general rule, the higher-priority issues are the ones where urlextract doesn't extract valuable URLs, or extracts truncated URLs. Returning extra junk around URLs, or extra URLs, is problematic, but I can trim or remove the junk. I can't fix data I don't have.
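To illustrate why over-extraction is the lesser problem, here is a minimal sketch of the kind of post-processing that can clean up junk on the caller's side. The helper names (`trim_junk`, `dedupe`) and the sample strings are hypothetical and not part of urlextract. The input list stands in for whatever an extractor returned.

```python
# Hypothetical helpers, not part of urlextract: clean up over-extracted URLs.
LEADING_JUNK = "<([{'\"`"
TRAILING_JUNK = ">)]}'\"`.,;:!?"


def trim_junk(candidate: str) -> str:
    """Strip wrapper punctuation from both ends of an extracted URL."""
    return candidate.lstrip(LEADING_JUNK).rstrip(TRAILING_JUNK)


def dedupe(urls):
    """Drop duplicate URLs while keeping first-seen order."""
    return list(dict.fromkeys(urls))


# Assumed sample of noisy extractor output (made up for illustration).
raw = [
    "<https://example.com/project>,",
    "(https://example.com/project)",
    "https://example.org/docs.",
]

cleaned = dedupe(trim_junk(u) for u in raw)
print(cleaned)
```

This is deliberately crude: stripping a trailing `)` would damage a URL that legitimately ends in a parenthesis. The point is only that extra characters can be removed after the fact, whereas a truncated or missed URL cannot be reconstructed.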

Others, I think, are harder and may not be in urlextract's scope: