Extracting Markdown Text, doesn't process escaped \\ correctly

lipoja / URLExtract

URLExtract is a Python class for collecting (extracting) URLs from a given text by locating TLDs.

MIT License

241 stars, 61 forks

Extracting Markdown Text, doesn't process escaped \\ correctly #152

kevintxu closed this issue 5 months ago

kevintxu commented 1 year ago

Code

import urlextract

# The string contains literal backslash-n sequences (escaped newlines), not real line breaks.
test_str = "[markdown link name](https://google.com/)\\n\\n![Img](https://a.image-site.com/f/1/970x630/aaa/some_image_970x630px.jpg)"
extractor = urlextract.URLExtract()
print(extractor.find_urls(test_str))

Actual output

['https://google.com/)\\n\\n![Img](https://a.image-site.com/f/1/970x630/aaa/some_image_970x630px.jpg']

Expected output

['https://google.com/', 'https://a.image-site.com/f/1/970x630/aaa/some_image_970x630px.jpg']

lipoja commented 5 months ago

This should be fixed in the next release.
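
For anyone who hits this before upgrading, one possible workaround is to pull the link targets out of the Markdown syntax directly instead of relying on URL boundary detection. The following is a minimal sketch using only the standard library. The helper name `markdown_link_targets` and the regular expression are hypothetical and are not part of URLExtract. It assumes inline links and images of the form `[text](url)` or `![alt](url)`, stops at the first closing parenthesis (so URLs that themselves contain parentheses are truncated), and skips links that carry a title after the URL.

```python
import re

# Hypothetical helper, not part of URLExtract: captures the target of inline
# Markdown links and images, i.e. [text](url) and ![alt](url).
# Assumption: the URL contains no whitespace and no ")" character.
MD_LINK_TARGET = re.compile(r"!?\[[^\]]*\]\((\S+?)\)")


def markdown_link_targets(text):
    """Return the URL targets of inline Markdown links/images, in order."""
    return MD_LINK_TARGET.findall(text)


# Same input as in the report: literal backslash-n sequences between the links.
test_str = "[markdown link name](https://google.com/)\\n\\n![Img](https://a.image-site.com/f/1/970x630/aaa/some_image_970x630px.jpg)"
print(markdown_link_targets(test_str))
```

Because the pattern anchors on the surrounding `](` and `)`, the escaped `\\n\\n` between the two links never becomes part of a match. Plain URLs outside Markdown link syntax are not found by this sketch, so it complements `find_urls` rather than replacing it.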