Hi, since 1.7.0 (apparently due to #135), some URLs raise an AttributeError because the authority is None:
On 1.6.0:
In [1]: text = '[[ "$(giturl)" =~ ^https://gitlab.com ]] echo "found" || echo "didnt'
In [2]: import urlextract
In [3]: u = urlextract.URLExtract()
In [4]: list(u.gen_urls(text))
Out[4]: []
(To be clear, the problem is not that the URL isn't found, only that an error is thrown.)
On 1.7.0:
In [1]: text = '[[ "$(giturl)" =~ ^https://gitlab.com ]] echo "found" || echo "didnt'
In [2]: import urlextract
...:
In [3]: u = urlextract.URLExtract()
...:
In [4]: list(u.gen_urls(text))
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In [4], line 1
----> 1 list(u.gen_urls(text))
File ~/.local/lib/python3.10/site-packages/urlextract/urlextract_core.py:792, in URLExtract.gen_urls(self, text, check_dns, get_indices, with_schema_only)
790 validated = self._validate_tld_match(text, tld, offset + tld_pos)
791 if tld_pos != -1 and validated:
--> 792 tmp_url = self._complete_url(
793 text,
794 offset + tld_pos,
795 tld,
796 check_dns=check_dns,
797 with_schema_only=with_schema_only,
798 )
800 if tmp_url:
801 # do not search for TLD in already extracted URL
802 tld_pos_url = self._get_tld_pos(tmp_url, tld)
File ~/.local/lib/python3.10/site-packages/urlextract/urlextract_core.py:494, in URLExtract._complete_url(self, text, tld_pos, tld, check_dns, with_schema_only)
492 if complete_url.startswith(("-", ".", "~", "_")):
493 complete_url = complete_url[1:]
--> 494 if not self._is_domain_valid(
495 complete_url, tld, check_dns=check_dns, with_schema_only=with_schema_only
496 ):
497 return ""
499 return complete_url
File ~/.local/lib/python3.10/site-packages/urlextract/urlextract_core.py:581, in URLExtract._is_domain_valid(self, url, tld, check_dns, with_schema_only)
577 url_parts = uritools.urisplit(url)
578 # <scheme>://<authority>/<path>?<query>#<fragment>
579
580 # authority can't start with @
--> 581 if url_parts.authority.startswith('@'):
582 return False
584 # if URI contains user info and schema was automatically added
585 # the url is probably an email
AttributeError: 'NoneType' object has no attribute 'startswith'
I believe _is_domain_valid needs to check whether url_parts.authority is None before checking whether it starts with @.
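Something along these lines is what I have in mind. This is only a sketch of the guard, not a patch against the library: authority_starts_with_at is a hypothetical helper name I made up for illustration, and it takes the authority value directly (as it would come from url_parts.authority) so that it runs on its own.

```python
from typing import Optional


def authority_starts_with_at(authority: Optional[str]) -> bool:
    """Return True only when an authority is present and begins with '@'.

    Hypothetical helper, not part of urlextract. A missing authority
    (None) is treated as "does not start with '@'" instead of raising
    AttributeError.
    """
    return authority is not None and authority.startswith("@")


# The case from the traceback: no authority at all.
print(authority_starts_with_at(None))
# An authority that really does start with '@'.
print(authority_starts_with_at("@example.com"))
```

Whether a None authority should then make _is_domain_valid return False outright, or just skip this particular check, is a separate decision I'd leave to the maintainers.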