Hi, when I use URLExtract() inside a function to parse a dataframe column, it runs really slowly.
Here is my code:
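    from urlextract import URLExtract

    def extract_urls(lst):
        # A new extractor is built on every call, i.e. once per row
        extractor = URLExtract()
        count = 0
        for text in lst:
            urls_found = extractor.find_urls(text)
            # MY_URL is a constant defined elsewhere in my code
            if len(urls_found) > 0 and MY_URL in urls_found:
                count += len(urls_found)
        return count

    df['col2'] = df['col1'].apply(extract_urls)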
It takes a long time because of the time spent loading the TLDs and on the FileLocks.
Maybe you could make this object a singleton?
Another idea is to load the TLDs just once by making the TLDs object a singleton.
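To illustrate what I mean by loading only once, here is a minimal sketch of the reuse idea. It does not use the library's code: ExpensiveExtractor is a hypothetical stand-in for URLExtract so the example runs on its own, and get_extractor, count_urls and the example.com-style URLs are names I made up for illustration. The assumption is that construction is the costly step, so the instance is cached and shared across rows.

```python
from functools import lru_cache


class ExpensiveExtractor:
    """Hypothetical stand-in for URLExtract; construction is the costly step."""

    instances_created = 0

    def __init__(self):
        # Assumption: this is where the real object would load its TLD list.
        ExpensiveExtractor.instances_created += 1

    def find_urls(self, text):
        # Deliberately naive: whitespace-separated tokens starting with "http".
        return [tok for tok in text.split() if tok.startswith("http")]


@lru_cache(maxsize=1)
def get_extractor():
    # Built on the first call; later calls return the same cached instance.
    return ExpensiveExtractor()


def count_urls(lst, my_url):
    extractor = get_extractor()  # no per-row construction
    count = 0
    for text in lst:
        urls_found = extractor.find_urls(text)
        if urls_found and my_url in urls_found:
            count += len(urls_found)
    return count


# Each row holds a list of strings, like the cells of df['col1'].
rows = [
    ["see http://a.example and http://b.example", "nothing here"],
    ["http://a.example only"],
]
counts = [count_urls(row, "http://a.example") for row in rows]
```

With a cached accessor like this, the constructor runs once no matter how many rows are processed. On the user side the same effect can probably be had by creating the extractor once outside the function and reusing it, but a shared instance inside the library would make that unnecessary.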