Bugfix for issue #41

  File "myproject.py", line 76, in extract
    urls = extractor.find_urls(clearMalformed(text), only_unique=True)
  File "/home/xyele/.local/lib/python3.8/site-packages/urlextract/urlextract_core.py", line 682, in find_urls
    urls = OrderedDict.fromkeys(urls) if only_unique else urls
  File "/home/xyele/.local/lib/python3.8/site-packages/urlextract/urlextract_core.py", line 645, in gen_urls
    tmp_url = self._complete_url(text, offset + tld_pos, tld)
  File "/home/xyele/.local/lib/python3.8/site-packages/urlextract/urlextract_core.py", line 413, in _complete_url
    if not self._is_domain_valid(complete_url, tld):
  File "/home/xyele/.local/lib/python3.8/site-packages/urlextract/urlextract_core.py", line 530, in _is_domain_valid
    host_parts = host.split('.')
AttributeError: 'IPv4Address' object has no attribute 'split'

So I changed it to host_parts = str(host).split('.'), which fixed the issue. Note: I didn't review all of the code, so there might be a better solution.
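For reference, here is a minimal sketch that reproduces the error outside of URLExtract and shows the effect of the str() conversion. It assumes host is an ipaddress.IPv4Address, as the traceback suggests; split_host is a hypothetical helper written only for this illustration and is not part of the library.

```python
import ipaddress


def split_host(host):
    """Hypothetical helper (not part of URLExtract).

    Converts the host to a string before splitting, so it accepts
    both plain strings and ipaddress objects.
    """
    return str(host).split('.')


# Assumption: the host reaching _is_domain_valid is an IPv4Address object.
host = ipaddress.IPv4Address("192.168.0.1")

try:
    host.split('.')  # IPv4Address has no split(), so this raises
except AttributeError as exc:
    print("AttributeError:", exc)

print(split_host(host))           # ['192', '168', '0', '1']
print(split_host("example.com"))  # ['example', 'com']
```

This only covers the IPv4 case from the traceback. An IPv6 address contains no dots in its usual string form, so it would come back as a single part and may need separate handling.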

lipoja / URLExtract

Bugfix for issue #41 #70