Catastrophic backtracking on '.fr' domains

colemujadzic commented 4 years ago

Hello!

While I understand this project may no longer be maintained (the latest commit is over six years old, etc.), this issue has the potential to negatively affect production applications, so I figured I'd create this to bring attention to it and warn others it might impact.

Description:

At some point in the course of parsing a WHOIS record (provided by the AFNIC WHOIS server) associated with a French domain (using the '.fr' TLD), it appears the library attempts to match the entire record string against this regular expression:

/nic-hdl:\s*(?P<handle>.+)\ntype:\s*(?P<type>.+)\ncontact:\s*(?P<name>.+)\n(?:.+\n)*?(?:address:\s*(?P<street1>.+)\n)?(?:address:\s*(?P<street2>.+)\n)?(?:address:\s*(?P<street3>.+)\n)?(?:phone:\s*(?P<phone>.+)\n)?(?:fax-no:\s*(?P<fax>.+)\n)?(?:.+\n)*?(?:e-mail:\s*(?P<email>.+)\n)?(?:.+\n)*?changed:\s*(?P<changedate>[0-9]{2}\/[0-9]{2}\/[0-9]{4}).*/

I think it's this one: https://github.com/joepie91/python-whois/blob/7b0ddf755b3d706860d5d8cb80c598fd854a48ca/pythonwhois/parse.py#L376

This evaluation results in catastrophic backtracking and never recovers, causing the application to hang and CPU usage to increase dramatically. Offhand, records provided by AFNIC seem to have multiple repeated fields like 'ADDRESS' and 'TROUBLE', so it's possible that's where the evaluation is getting tripped up.

Reproduction Steps:

Clone the repository, or install the package via pip. I used virtualenv and created a sample environment to test this in. I'm also using Python 2.7.10.
Use the included pwhois script (or a provided method like pythonwhois.get_whois(domain)) to run the lookup against a .fr domain like 'afnic.fr', e.g. pwhois afnic.fr. The process should hang and CPU usage should rapidly increase. I can provide a proof of concept via an online regular expression evaluator if that would be helpful!
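Since the lookup never returns, anyone reproducing this (or calling the tool from an application) may want to run it in a child process with a time limit. A minimal sketch, assuming Python 3 and that the pwhois script is on the PATH; `run_with_timeout` is a hypothetical helper, not part of pythonwhois:

```python
import subprocess


def run_with_timeout(cmd, timeout_seconds):
    """Run cmd in a child process; return its stdout, or None if it times out."""
    try:
        completed = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so the runaway
        # regex evaluation stops consuming CPU.
        return None
    return completed.stdout.decode("utf-8", errors="replace")


# Example (assumes pwhois is installed and on the PATH):
#     output = run_with_timeout(["pwhois", "afnic.fr"], 10)
#     if output is None:
#         print("lookup timed out, probably stuck in the regex")
```

This only contains the hang; it does not fix the pattern itself.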

If this description is at all unclear or if you would like me to provide additional information, just let me know!

Thanks!

AlexG101010 commented 2 years ago

Hi! Were you able to solve this problem?

AlexG101010 commented 2 years ago

@joepie91, could you possibly help us with this?

Thanks!

geohci commented 2 years ago

Just commenting that I ran into this error as well. I wasn't able to come up with a great general solution, but in my case I only cared about the country field, so I took out the address regex (the duplicated part that was causing the catastrophic backtracking). That did not affect the rest of the regex because the (?:.+\n)*? component was then able to capture the address lines. For me this wasn't limited to line 376, though; it also affected two other lines: https://github.com/joepie91/python-whois/blob/7b0ddf755b3d706860d5d8cb80c598fd854a48ca/pythonwhois/parse.py#L375-L377
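For a more general way around the problem, one option is to skip the multi-line regex and read the record line by line, so that repeated fields such as address never give the regex engine anything to backtrack over. A minimal sketch, assuming the field names seen in the pattern above (nic-hdl, contact, address, ...) and that contact blocks are separated by blank lines; `parse_contact_blocks` is a hypothetical helper, not part of pythonwhois or pythonwhois-alt:

```python
import re

# Matches one "key: value" line. The pattern never spans more than one line,
# so the work per line is bounded by the length of that line.
FIELD_RE = re.compile(r"^(?P<key>[a-z-]+):\s*(?P<value>.*)$", re.IGNORECASE)


def parse_contact_blocks(record):
    """Return one dict per contact block; every field maps to a list of values."""
    contacts = []
    current = None
    for line in record.splitlines():
        stripped = line.strip()
        if not stripped:
            current = None  # assumption: a blank line ends the current block
            continue
        match = FIELD_RE.match(stripped)
        if match is None:
            continue  # comments or free text
        key = match.group("key").lower()
        value = match.group("value").strip()
        if key == "nic-hdl":
            # A nic-hdl line starts a new contact block.
            current = {"nic-hdl": [value]}
            contacts.append(current)
        elif current is not None:
            # Repeated fields (e.g. several address lines) accumulate in order.
            current.setdefault(key, []).append(value)
    return contacts
```

Every value is kept as a list, so a caller who only needs the country can read `contact.get("country")` without caring how many address lines came before it.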

Augustin-FL commented 2 years ago

Hi,

@kilgoretrout1985 made a fix for this issue: he merged the three regexes into a new one and fixed the part causing the hang: https://github.com/kilgoretrout1985/pythonwhois-alt/blob/cb948cb1c658d4f8d8fefaa41e7c4a3cc776a037/pythonwhois/parse.py#L376-L390

hardik-crest commented 2 years ago

We are having trouble getting information for the domain institutdegenech.fr when looking it up by domain name. We observed multiple similar issues in the repo with different domains. On inspecting the library further, it seems to be an issue with the regex used to parse the data. Can you please fix this issue? If not, can you please suggest alternatives that could be used to work around it?

Also, as we see above, a fix has been merged, but does it work with Python 3.9 and above?

Augustin-FL commented 2 years ago

@hardik-crest the PR was merged on a different project, pythonwhois-alt.

I recommend using that package instead of pythonwhois (this repo seems abandoned; no updates since 2014).

hardik-crest commented 2 years ago

Thanks @Augustin-FL

joepie91 / python-whois

Catastrophic backtracking on '.fr' domains #156