lipoja / URLExtract

URLExtract is a Python class for collecting (extracting) URLs from a given text based on locating TLDs.
MIT License

Bugfix for issue #41 #70

Closed by ghost 4 years ago

ghost commented 4 years ago
  File "myproject.py", line 76, in extract
    urls = extractor.find_urls(clearMalformed(text), only_unique=True)
  File "/home/xyele/.local/lib/python3.8/site-packages/urlextract/urlextract_core.py", line 682, in find_urls
    urls = OrderedDict.fromkeys(urls) if only_unique else urls
  File "/home/xyele/.local/lib/python3.8/site-packages/urlextract/urlextract_core.py", line 645, in gen_urls
    tmp_url = self._complete_url(text, offset + tld_pos, tld)
  File "/home/xyele/.local/lib/python3.8/site-packages/urlextract/urlextract_core.py", line 413, in _complete_url
    if not self._is_domain_valid(complete_url, tld):
  File "/home/xyele/.local/lib/python3.8/site-packages/urlextract/urlextract_core.py", line 530, in _is_domain_valid
    host_parts = host.split('.')
AttributeError: 'IPv4Address' object has no attribute 'split'

So I changed that line to host_parts = str(host).split('.'), which fixed the error for me. Note: I didn't review all of the code, so there might be a better solution.
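For anyone hitting the same traceback, here is a minimal sketch of the error and the workaround, using only the standard ipaddress module. It does not use urlextract itself: split_host is a hypothetical helper name made up for illustration, and the sample address is arbitrary.

```python
import ipaddress


def split_host(host):
    """Hypothetical helper (not part of urlextract).

    Converts the host to a string before splitting, so it works for both
    plain strings and ipaddress objects such as IPv4Address.
    """
    return str(host).split('.')


# ip_address() returns an IPv4Address object here, not a str.
host = ipaddress.ip_address("192.168.0.1")

# Calling .split() directly on the object raises AttributeError,
# which is the failure shown in the traceback.
error_message = None
try:
    host.split('.')
except AttributeError as exc:
    error_message = str(exc)

# Wrapping the host in str() first avoids the error.
parts = split_host(host)
print(error_message)
print(parts)
```

The same helper still accepts ordinary hostname strings, since str() on a string returns it unchanged.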

ghost commented 4 years ago

Oh, sorry! I realized my version of urlextract was outdated. So I updated it and it is already fixed. Thanks.