I am trying to get the registered domain for a large set of URLs. While most resolve fine, I have come across some strange cases where the registered domain is incorrectly identified as ''.
For example:
tldextract.extract('www.sienna.it')
resolves correctly to
subdomain='www', domain='sienna', suffix='it'
while
tldextract.extract('www.siena.it')
incorrectly identifies the domain as 'www'
subdomain='', domain='www', suffix='siena.it'
I am using tldextract version 2.2.1
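One possible explanation for the difference, sketched here with the standard library only: if the suffix list that tldextract consults contains 'siena.it' as its own entry (an assumption on my part, not something I have verified against the list it downloads), then longest-suffix matching would give exactly the output above. The rule set ASSUMED_SUFFIXES and the helper split_host below are hypothetical names for illustration and are not part of tldextract.

```python
from typing import NamedTuple, Set

# Hypothetical, hand-written rule set for illustration only. The real
# suffix list is far larger and also has wildcard and exception rules.
ASSUMED_SUFFIXES: Set[str] = {"it", "siena.it"}


class Parts(NamedTuple):
    subdomain: str
    domain: str
    suffix: str


def split_host(host: str, suffixes: Set[str] = ASSUMED_SUFFIXES) -> Parts:
    """Split a hostname using the longest matching suffix in `suffixes`."""
    labels = host.lower().strip(".").split(".")
    # Candidates are tried from longest to shortest, so the first hit
    # is the longest matching suffix.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            # The label just left of the suffix is the domain; if the
            # whole host is itself a suffix, the domain is empty.
            domain = labels[i - 1] if i > 0 else ""
            subdomain = ".".join(labels[: i - 1]) if i > 1 else ""
            return Parts(subdomain, domain, candidate)
    # No known suffix: treat the last label as the domain.
    return Parts(".".join(labels[:-1]), labels[-1], "")


print(split_host("www.sienna.it"))
print(split_host("www.siena.it"))
print(split_host("siena.it"))
```

Under that assumption, 'www.siena.it' splits into an empty subdomain, domain 'www' and suffix 'siena.it', and the bare host 'siena.it' ends up with an empty domain, which is the '' result I am seeing.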
Other URLs for which the registered domain is incorrectly identified as '' include:
http://kh.ua/ http://wi.us/ http://mil.ru/ http://henselin.ner/ http://df.gov.br/ http://pa.gov.br/ http://tas.gov.au/ http://k12.co.us/ http://fe.it/ http://bialowieza.pl/ http://harald-schirmer.d/ http://malbork.pl/ http://www.helios-kliniken/indersdorf.de http://hokkaido.jp/ http://olsztyn.pl/ http://nysa.pl/ http://canned.mehttps/canned.me http://rs.gov.br/ http://es.gov.br/ http://pe.kr/ http://in.us/ http://skoczow.pl/ http://pro.vn/ http://gov.uk/ http://in.ua/ http://vic.edu.au/ http://pb.gov.br/ http://nsw.edu.au/ http://gov.uk/ http://wa.edu.au/ http://ga.us/ http://adv.br/ http://to.gov.br/ http://boleslawiec.pl/ http://www.marienhospital-letmathe/ http://nj.us/ http://lukow.pl/ http://www.bonifatius-apotheke-schnurrer/ http://mi.it/ http://gob.sv/ http://gob.cl/ http://miyazaki.jp/ http://or.kr/ http://gob.ni/ http://govt.nz/ http://harald-schirmer.d/ http://ok.us/ http://bialystok.pl/ http://pp.se/ http://rj.gov.br/ http://turystyka.pl/ http://ct.it/ http://yamagata.jp/ http://govt.nz/ http://auto.pl/ http://liguria.it/ http://km.ua/ http://go.th/ http://pa.us/ http://arq.br/ http://il.us/ http://www.marienhospital-letmathe/ http://go.cr/ http://nom.co/ http://piemonte.it/ http://napoli.it/ http://malbork.pl/ http://wa.edu.au/ http://ilawa.pl/ http://kutno.pl/ http://gen.nz/ http://info.pl/ http://il.us/ http://or.us/ http://cn.it/ http://med.br/ http://gunma.jp/ http://hyogo.jp/ http://jell.e/ http://eco.br/ http://gov.pl/ http://desa.id/ http://vic.gov.au/ http://or.id/ http://sch.id/ http://or.at/ http://or.th/ http://siena.it/ http://verona.it/ http://info.pl/ http://iwate.jp/ http://brescia.it/ http://mn.us/ http://nt.ca/ http://jgora.pl/ http://xhamsterlive.ocom/ http://ck.ua/ http://bydgoszcz.pl/ http://pv.it/ http://pa.it/ http://wroclaw.pl/ http://opoczno.pl/ http://eng.br/ http://school.nz/ http://cieszyn.pl/ http://teusink.ey/ http://https/:On-air.tv http://malbork.pl/ http://on.ca/ http://siena.it/ http://cn.it/ http://uw.gov.pl/ 
http://beskidy.pl/ http://www.gespag.ata/ http://szkola.pl/