comma extracted at the end if url ends with comma

amoldavsky commented 2 years ago

This should not be the case:

>>> from urlextract import URLExtract
>>> extractor = URLExtract()
>>> extractor.find_urls("https://www.formpl.us/form/1653896001, work independently from home")

['https://www.formpl.us/form/1653896001,']

controldev commented 2 years ago

The same happens with a trailing dot ('.'), which comes up fairly often, for example when a sentence ends with a link.

lipoja commented 2 years ago

@amoldavsky @controldev Hello, and thank you for reporting this issue. I agree that this is not ideal. I would like to ask for your help in the form of a discussion, because I do not see an easy general solution to this problem. My suggestion would be postprocessing.

The user (in this case, you) knows what kind of text is being processed, and can therefore clean up the URLs by removing the extra comma where one is expected. In Python this can be done with a simple .rstrip(',').
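A minimal sketch of that postprocessing step, using Python's built-in str.rstrip. The helper name strip_trailing_punctuation and its chars parameter are made up for illustration and are not part of URLExtract; the input list stands in for the output of find_urls from the report above.

```python
def strip_trailing_punctuation(urls, chars=",."):
    """Remove any trailing characters listed in `chars` from each URL.

    Hypothetical helper, not part of URLExtract. It assumes the caller
    knows that URLs in their text never legitimately end with `chars`.
    """
    return [url.rstrip(chars) for url in urls]


# Stand-in for the result of extractor.find_urls(...) in the report above.
found = ["https://www.formpl.us/form/1653896001,"]
cleaned = strip_trailing_punctuation(found)
print(cleaned)
```

Note that rstrip removes every trailing character in the given set, so a URL that really does end with a dot or comma would be shortened as well. Pass a narrower chars value if only one of them should be removed.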

Looking at this issue in general, I cannot simply remove every dot or comma at the end of a URL, because it might be part of the URL.

However, I am open to discussion. Maybe you have a solution in mind that we can agree on and implement.

lipoja commented 5 months ago

Closing this issue since there has been no further discussion and a simple solution is recommended to the user.

lipoja / URLExtract

comma extracted at the end if url ends with comma #123