lipoja / URLExtract

URLExtract is a Python class for collecting (extracting) URLs from a given text by locating top-level domains (TLDs).
MIT License
242 stars, 61 forks

Indexes of found URLs #71

Closed: javad94 closed this issue 4 years ago

javad94 commented 4 years ago

Any plans to include the indexes of found URLs? Something like:

input: "Let's have URL example.com"
output: [('example.com', (15, 26))]
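To make the requested format concrete, here is a minimal sketch that produces (url, (start, end)) tuples with Python's re module. The pattern and the name find_urls_with_indices are hypothetical: a toy regex with three hard-coded TLDs, not URLExtract's TLD-list-based extraction. The end index is exclusive, so text[start:end] returns the URL, which matches the (15, 26) in the example above.

```python
import re

# Hypothetical, deliberately simplified URL pattern: optional http(s) scheme,
# dot-separated labels, one of a few hard-coded TLDs, optional path.
# URLExtract locates TLDs from a full TLD list; this is NOT its algorithm.
URL_PATTERN = re.compile(
    r"(?:https?://)?[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.(?:com|org|net)\b(?:/\S*)?"
)


def find_urls_with_indices(text):
    """Return [(url, (start, end))] where text[start:end] == url (end exclusive)."""
    return [(m.group(0), m.span()) for m in URL_PATTERN.finditer(text)]


print(find_urls_with_indices("Let's have URL example.com"))
# [('example.com', (15, 26))]
```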

lipoja commented 4 years ago

Hi @javad94, I had never thought about it. Do you think it would be useful for others as well? What is your use case? And do you mean just for find_urls / gen_urls, or for the command-line output as well?

javad94 commented 4 years ago

Hi @lipoja,

Do you think it would be useful for others as well?

Yes, maybe. Better to have it than not.

What is your use case?

My use case involves replacing URLs in a string. Take, for example, this string:

Let's have URL http://example.com and http://example.com/testpage and again http://example.com

I want to replace each URL with an HTML a tag pointing to that URL. But when I use the call below, it also replaces part of the second URL:

text.replace('http://example.com', '<a href="http://example.com">link</a>')

output:

Let's have URL <a href="http://example.com">link</a> and <a href="http://example.com">link</a>/testpage and again <a href="http://example.com">link</a>

I want to use those indexes for precise replacement.
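A minimal sketch of index-based replacement, assuming an extractor that returns non-overlapping (url, (start, end)) spans. The helper replace_at_indices is hypothetical, and a naive http(s) regex stands in for the extractor. Rebuilding the string from slices only touches each reported span, so the shorter URL is never rewritten inside http://example.com/testpage.

```python
import re


def replace_at_indices(text, matches, render):
    """Rebuild text, substituting render(url) for each (url, (start, end)) span.

    Hypothetical helper. Assumes the spans do not overlap.
    """
    parts = []
    last = 0
    for url, (start, end) in sorted(matches, key=lambda m: m[1][0]):
        parts.append(text[last:start])  # untouched text before this URL
        parts.append(render(url))       # replacement for this exact span only
        last = end
    parts.append(text[last:])           # untouched tail after the last URL
    return "".join(parts)


text = ("Let's have URL http://example.com and "
        "http://example.com/testpage and again http://example.com")
# Stand-in for an extractor that reports indices: a naive http(s) regex.
matches = [(m.group(0), m.span()) for m in re.finditer(r"https?://\S+", text)]
result = replace_at_indices(text, matches, lambda u: f'<a href="{u}">link</a>')
print(result)
```

Unlike str.replace, which rewrites every occurrence of a substring wherever it appears, this only substitutes at the positions the extractor reported, so each URL is wrapped as a whole.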

And do you mean just for find_urls / gen_urls or for the command line output as well?

Yes, I mean only the find_urls / gen_urls functions. I don't use the library's command-line feature at all.

BenoitTS commented 4 years ago

+1. That would be very useful for filtering URLs out of an input string.
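Filtering works the same way. In this hypothetical remove_at_indices helper (again with a naive http(s) regex standing in for the extractor), spans are deleted right to left so that removing one URL does not shift the offsets of the URLs before it.

```python
import re


def remove_at_indices(text, matches):
    """Delete every (url, (start, end)) span from text (hypothetical helper).

    Spans are processed right to left so that cutting one span does not
    shift the start/end offsets of spans that come before it.
    """
    for _url, (start, end) in sorted(matches, key=lambda m: m[1][0], reverse=True):
        text = text[:start] + text[end:]
    return text


text = "Docs at http://example.com and http://example.com/testpage here"
matches = [(m.group(0), m.span()) for m in re.finditer(r"https?://\S+", text)]
cleaned = remove_at_indices(text, matches)
print(repr(cleaned))
```

Whitespace around the removed URLs is left as-is, which is why the result contains double spaces; normalizing it is a separate step.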

lipoja commented 4 years ago

Thanks @BenoitTS for the implementation. It will be part of the next release.

javad94 commented 4 years ago

Thanks @BenoitTS, really appreciate it.