joepie91 / python-whois

A Python module for retrieving and parsing WHOIS data
Do What The F*ck You Want To Public License

Parse error on UTF8 domain-reports #118

Open ppKrauss opened 8 years ago

ppKrauss commented 8 years ago

Example: running pwhois -j terra.com.br (or pwhois -r terra.com.br) fails with:

Traceback (most recent call last):
  File "/usr/local/bin/pwhois", line 23, in <module>
    data, server_list = pythonwhois.net.get_whois_raw(args.domain[0], with_server_list=True)
  File "/usr/local/lib/python2.7/dist-packages/pythonwhois/net.py", line 44, in get_whois_raw
    response = whois_request(request_domain, target_server)
  File "/usr/local/lib/python2.7/dist-packages/pythonwhois/net.py", line 94, in whois_request
    return buff.decode("utf-8")
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe7 in position 1278: invalid continuation byte
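The failing line is buff.decode("utf-8") in net.py, which assumes every WHOIS server replies in UTF-8. One possible workaround, sketched here in Python 3 and not taken from pythonwhois itself, is to try strict UTF-8 first and fall back to ISO-8859-1 (Latin-1). The helper name decode_whois_response is hypothetical, and treating any non-UTF-8 reply as Latin-1 is an assumption that fits this server but may not fit others.

```python
def decode_whois_response(raw: bytes) -> str:
    """Decode a raw WHOIS reply, preferring UTF-8.

    Hypothetical helper, not part of pythonwhois. Assumes that a reply
    which is not valid UTF-8 is ISO-8859-1 (Latin-1).
    """
    try:
        # Strict UTF-8 first, matching what net.py currently does.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte 0x00-0xFF to a code point, so this never raises.
        return raw.decode("iso-8859-1")


print(decode_whois_response(b"person:      Cobran\xe7a Terra Networks"))
# person:      Cobrança Terra Networks
```

Latin-1 is a convenient fallback because it assigns a character to every byte value, so the second decode cannot raise. The trade-off is that a reply in some other single-byte encoding (for example cp1252) would decode without error but possibly with wrong characters.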

PS: this is strange, because the example is a "very well-behaved" domain report:

% Copyright (c) Nic.br
%  The use of the data below is only permitted as described in
%  full by the terms of use at http://registro.br/termo/en.html ,
%  being prohibited its distribution, commercialization or
%  reproduction, in particular, to use it for advertising or
%  any similar purpose.
%  2016-04-30 08:58:04 (BRT -03:00)

domain:      terra.com.br
owner:       Terra Networks Brasil S.A.
ownerid:     091.088.328/0001-67
responsible: Hostmaster Terra Networks
country:     BR
owner-c:     MPL4
admin-c:     MPL4
tech-c:      ALG3
billing-c:   CTN25
nserver:     a.dns.terra.com  
nsstat:      20160430 AA
nslastaa:    20160430
nserver:     b.dns.terra.com.br 200.215.193.1 2001:12c0:0:2151:200:154:46:20
nsstat:      20160430 AA
nslastaa:    20160430
nserver:     c.dns.terra.com  
nsstat:      20160430 AA
nslastaa:    20160430
nserver:     d.dns.terra.com.br 200.215.194.1 2001:12c0:0:2151:200:154:46:21
nsstat:      20160430 AA
nslastaa:    20160430
created:     19981130 #129987
expires:     20171130
changed:     20120203
status:      published

nic-hdl-br:  ALG3
person:      Hostmaster Terra Networks
e-mail:      domain@terra.com.br
created:     19971226
changed:     20160420

nic-hdl-br:  CTN25
person:      Cobrança Terra Networks
e-mail:      idcobranca@terra.com.br
created:     20041103
changed:     20070124

nic-hdl-br:  MPL4
person:      Hostmaster Terra Networks
e-mail:      domain@terra.com.br
created:     19980122
changed:     20141027

% Security and mail abuse issues should also be addressed to
% cert.br, http://www.cert.br/ , respectivelly to cert@cert.br
% and mail-abuse@cert.br
%
% whois.registro.br accepts only direct match queries. Types
% of queries are: domain (.br), registrant (tax ID), ticket,
% provider, contact handle (ID), CIDR block, IP and ASN.
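The byte named in the traceback, 0xe7, is "ç" in ISO-8859-1 (Latin-1), and the only non-ASCII text in the reply above is "Cobrança" in contact CTN25. That suggests whois.registro.br answered in Latin-1 rather than UTF-8, which would explain the error even though the record is otherwise well formed. The Python 3 check below reproduces the same error from that one word; first_utf8_error is a hypothetical helper, and the Latin-1 encoding of the server reply is an assumption inferred from the traceback, not something the server states.

```python
def first_utf8_error(raw: bytes):
    """Return (offset, offending byte, reason) for the first UTF-8 error, or None."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, raw[exc.start], exc.reason
    return None


# Assumed server encoding: "ç" becomes the single byte 0xe7 in Latin-1.
sample = "Cobrança".encode("iso-8859-1")
print(first_utf8_error(sample))
# (6, 231, 'invalid continuation byte')   231 == 0xe7
```

In UTF-8, 0xe7 starts a three-byte sequence, but the byte after it here is "a" (0x61), which is not a continuation byte; that is exactly the "invalid continuation byte" reported at position 1278 of the full reply.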