Indices of found URLs - Githubissues

lipoja / URLExtract

URLExtract is a Python class for collecting (extracting) URLs from a given text based on locating TLDs.

MIT License

242 stars, 61 forks

Indices of found URLs #74

Closed by BenoitTS 4 years ago

BenoitTS commented 4 years ago

Added a new parameter, get_indices, to the URL search so that it returns each URL together with its beginning and ending indices. Solves #71.

from urlextract import URLExtract

extractor = URLExtract()
example_text = "Text with URLs. Let's have URL janlipovsky.cz as an example."

# With get_indices=True each result is a (url, (start, end)) tuple.
for url in extractor.gen_urls(example_text, get_indices=True):
    print(url)  # prints: ('janlipovsky.cz', (31, 45))

By default, get_indices=False, so existing code that expects plain URL strings keeps working unchanged.
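
As a rough illustration of what the indices make possible, here is a minimal sketch that masks URLs in place using the (url, (start, end)) tuples. It does not call URLExtract itself: the hits list is hard-coded from the output shown above. mask_urls and its placeholder argument are hypothetical helpers, not part of the library. The sketch assumes the end index is exclusive, which matches the example (45 - 31 equals the length of 'janlipovsky.cz').

```python
from typing import Iterable, Tuple

# Shape of one result when get_indices=True: (url, (start, end)).
UrlHit = Tuple[str, Tuple[int, int]]


def mask_urls(text: str, hits: Iterable[UrlHit], placeholder: str = "<URL>") -> str:
    """Hypothetical helper: replace each (start, end) span with a placeholder.

    Assumes `end` is exclusive, as in Python slicing.
    """
    # Process the rightmost span first so earlier offsets stay valid
    # even when the placeholder length differs from the URL length.
    for _url, (start, end) in sorted(hits, key=lambda h: h[1][0], reverse=True):
        text = text[:start] + placeholder + text[end:]
    return text


example_text = "Text with URLs. Let's have URL janlipovsky.cz as an example."
hits = [("janlipovsky.cz", (31, 45))]  # tuple taken from the output above

masked = mask_urls(example_text, hits)
print(masked)  # Text with URLs. Let's have URL <URL> as an example.
```

In real use, hits would presumably be the result of list(extractor.gen_urls(example_text, get_indices=True)). Replacing from right to left avoids having to recompute offsets after each substitution.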

lipoja commented 4 years ago

Thank you for this pull request! I really appreciate it!