joepie91 / python-whois

A python module for retrieving and parsing WHOIS data
Do What The F*ck You Want To Public License

raw parser loops forever for specific input #107

Open ustern1 opened 8 years ago

ustern1 commented 8 years ago

I use parser.parse_raw_whois to parse existing WHOIS information from VirusTotal.

However, for the domain wddj.jp the function loops forever (see the attached raw data): whois_wddj.jp.txt

I use it as follows: pythonwhois.parse.parse_raw_whois([open("whois_wddj.jp.txt","rb").read()])

When using pythonwhois.get_whois directly this works as expected, and I believe the difference is the presence of the "Registrar" line, which is missing from the original.

In any case, I think that the parser should never get stuck in an infinite loop for any reason...

ustern1 commented 8 years ago

OK, further narrowing this problem - it seems to lie with one of the registrar regexps, no. 13 to be exact: 'Contact Information:\n[Name]\s(?P.)\n[Email]\s(?P.)\n[Web Page]\s(?P.)\n[Postal code]\s(?P.)\n[Postal Address]\s(?P.)\n(?:\s+(?P.)\n)?(?:\s+(?P.)\n)?[Phone]\s(?P.)\n[Fax]\s(?P.)\n'

This pattern will loop forever when used in re.search with the following input: u'Domain Information:\n[Domain Name] WDDJ.JP\n\n[Name Server] nsas1.firstserver.ne.jp\n[Name Server] nsas2.firstserver.ne.jp\n[Signing Key] \n\n[Created on] 2010/09/17\n[Expires on] 2016/09/30\n[Status] Active\n[Last Updated] 2015/10/01 01:05:13 (JST)\n\nContact Information:\n[Name] Do-reg Whois Guard Service,Firstserver Inc.\n[Email] whoisguard@do-reg.jp\n[Web Page] \n[Postal code] 160-0004\n[Postal Address] 4-29 Yotsuya, Shinjuku-ku,\n Tokyo 160-0004, JAPAN\n[Phone] 03-5919-8283\n[Fax] 03-5919-8311'

(note that it does work with re.match...)
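One likely reason for the difference: re.match only tries the pattern at position 0, and this input starts with "Domain Information:", so it returns None immediately without ever reaching the costly part of the pattern. re.search scans forward to "Contact Information:" and only then starts the real matching work. A minimal sketch of that behaviour, using a shortened stand-in pattern and a trimmed input (the group name `name` is an assumption for illustration, not necessarily what the library uses):

```python
import re

# Trimmed version of the input quoted above.
text = (
    "Domain Information:\n"
    "[Domain Name] WDDJ.JP\n"
    "\n"
    "Contact Information:\n"
    "[Name] Do-reg Whois Guard Service,Firstserver Inc.\n"
)

# Hypothetical, shortened stand-in for the registrar pattern.
pattern = r"Contact Information:\n\[Name\]\s*(?P<name>.*)\n"

# re.match is anchored at the start of the string, so it fails at once here.
anchored = re.match(pattern, text)

# re.search tries every start offset and finds the contact block.
scanned = re.search(pattern, text)

print(anchored)
print(scanned.group("name"))
```

So "works with re.match" here probably means "fails fast without matching", not "parses the contact block correctly".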

ustern1 commented 8 years ago

More info: If I remove the last '\n' in the regexp it works as expected...
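This fits the quoted input, which has no newline after the [Fax] line: a pattern that requires a final '\n' can never match it, so the engine has to exhaust every alternative before giving up. A small sketch of the effect on just the tail of the input, again with assumed group names (`phone`, `fax`) and a shortened pattern; accepting either a newline or the end of the string is one possible fix, not necessarily the one the library should adopt:

```python
import re

# Last two lines of the quoted input; note there is no trailing newline.
tail = "[Phone] 03-5919-8283\n[Fax] 03-5919-8311"

# Shortened stand-in for the end of the registrar pattern (requires a final "\n").
strict = r"\[Phone\]\s*(?P<phone>.*)\n\[Fax\]\s*(?P<fax>.*)\n"

# Same pattern, but the last line may end with a newline or with end of input.
relaxed = r"\[Phone\]\s*(?P<phone>.*)\n\[Fax\]\s*(?P<fax>.*)(?:\n|\Z)"

strict_result = re.search(strict, tail)
relaxed_result = re.search(relaxed, tail)

print(strict_result)
print(relaxed_result.group("phone"), relaxed_result.group("fax"))
```

On this short tail the strict pattern fails quickly; the slowdown in the full pattern presumably comes from the many more ways the earlier groups can be retried before the failure is final.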