lipoja / URLExtract

URLExtract is a Python class for collecting (extracting) URLs from a given text based on locating TLDs.
MIT License
241 stars, 61 forks

fix-multiple-protocols-in-url #120

Closed amoldavsky closed 1 year ago

amoldavsky commented 2 years ago

Fixes #82

The extraction is incorrect for URLs with multiple protocols:

Link:https://www.google.com
job:https://2.ua/YHfw38

lipoja commented 2 years ago

Hi @amoldavsky, thank you for your contribution! I appreciate the time you spent on it. Could we discuss my point of view on this issue?

I would prefer not to limit schemes to a list defined by us. Maybe we could use the list of schemes from IANA? Since this is a nice feature and I think we should include it in the code, we might think about a way to disable it for users who are relying on the current way of extracting URLs.
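
To make the idea concrete, here is a minimal standalone sketch of how such an opt-out could look. It is not URLExtract's actual code or API: the helper `trim_to_known_scheme`, the `enabled` flag, and the short `KNOWN_SCHEMES` tuple (standing in for a list loaded from IANA) are all hypothetical names chosen for illustration.

```python
import re

# Hypothetical stand-in for a scheme list loaded from the IANA registry.
KNOWN_SCHEMES = ("http", "https", "ftp")


def trim_to_known_scheme(candidate, schemes=KNOWN_SCHEMES, enabled=True):
    """Drop any text that precedes the first known "scheme://" in candidate.

    With enabled=False (or an empty scheme list) the candidate is returned
    unchanged, which keeps the current behaviour for users who rely on it.
    """
    if not enabled or not schemes:
        return candidate

    # Try longer schemes first so "https" is preferred over "http".
    alternatives = "|".join(
        re.escape(s) for s in sorted(schemes, key=len, reverse=True)
    )
    # The lookbehind ensures the scheme does not start in the middle of a
    # longer scheme-like token (for example "xhttps://").
    pattern = re.compile(
        r"(?<![A-Za-z0-9+.-])(?:" + alternatives + r")://", re.IGNORECASE
    )

    match = pattern.search(candidate)
    return candidate[match.start():] if match else candidate


# Inputs taken from the PR description above.
print(trim_to_known_scheme("Link:https://www.google.com"))
print(trim_to_known_scheme("job:https://2.ua/YHfw38"))
print(trim_to_known_scheme("job:https://2.ua/YHfw38", enabled=False))
```

Under these assumptions the first two calls return the URL starting at `https://`, while the third leaves the input untouched. Where such a switch would live in the library (constructor argument, property, or method parameter) is an open design question.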

lipoja commented 2 years ago

@amoldavsky Hello, may I ask whether you are working on this PR? Or are you super busy with life, as I am? :)

lipoja commented 1 year ago

Thank you for your work! Closing in favor of #146.