from urlextract import URLExtract

# Create the extractor and treat parentheses as additional stop characters.
url_extract_obj = URLExtract()
right_stop = url_extract_obj.get_stop_chars_right() | {')'}
left_stop = url_extract_obj.get_stop_chars_left() | {'('}
url_extract_obj.set_stop_chars_right(right_stop)
url_extract_obj.set_stop_chars_left(left_stop)

s = "(test_string OR url: https://www.russkiymir.ru/) OR (url: https://russkiymir.ru/en/ OR url: https://www.russkiymir.ru/cn/ OR url: https://www.russkiymir.ru/de/ OR url: 4pt.su)"

# With get_indices=True each result is a (url, (start, end)) tuple.
urls = url_extract_obj.find_urls(s, get_indices=True)
print(urls)
indices = [url[1] for url in urls]
print(indices)
print("")

# Slice the input with the reported indices to see what they actually cover.
for index_tuple in indices:
    print(s[index_tuple[0]:index_tuple[1]])
Hi, the find_urls() method returns incorrect URL indices for the following input: (test_string OR url: https://www.russkiymir.ru/) OR (url: https://russkiymir.ru/en/ OR url: https://www.russkiymir.ru/cn/ OR url: https://www.russkiymir.ru/de/ OR url: 4pt.su). It also fails to extract one of the URLs (https://www.russkiymir.ru/de/).
To reproduce, run the code at the top of this report.
Output:
I'm running this with Python 3.6 on Ubuntu 18.04.6.
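
In case it helps with triage, here is a small check I would expect to pass: every reported span should slice back to exactly the URL it was returned with. This is only a sketch. The helper name find_index_mismatches is hypothetical (not part of urlextract), it assumes results shaped like (url, (start, end)) as find_urls(..., get_indices=True) returns them, and the sample data below is hand-made for illustration, not real urlextract output.

```python
def find_index_mismatches(text, results):
    """Return the entries whose reported span does not slice back to the URL.

    `results` is assumed to be a list of (url, (start, end)) tuples.
    """
    mismatches = []
    for url, (start, end) in results:
        sliced = text[start:end]
        if sliced != url:
            mismatches.append((url, (start, end), sliced))
    return mismatches


# Hand-made sample, not actual library output.
sample = "(url: https://example.com/) OR (url: example.org)"
url = "https://example.com/"
start = sample.index(url)

good = [(url, (start, start + len(url)))]          # span matches the URL
bad = [(url, (start - 1, start + len(url) - 1))]   # span shifted left by one

print(find_index_mismatches(sample, good))
print(find_index_mismatches(sample, bad))
```

Passing the list returned by find_urls(s, get_indices=True) together with s to this helper should list the entries whose indices are off.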