Open colemujadzic opened 4 years ago
Hi! Were you able to solve this problem?
possibly, might help us, @joepie91.
Thanks!
Just commenting that I ran into this error as well. I wasn't able to come up with a great general solution but, in my case, I only cared about the country field, so I took out the address regex (the duplicated part that was causing the catastrophic backtracking). That did not affect the rest of the regex, because the (?:.+\n)*? component was then able to capture the address lines. For me this wasn't limited to just line 376, though; it also affected two other lines: https://github.com/joepie91/python-whois/blob/7b0ddf755b3d706860d5d8cb80c598fd854a48ca/pythonwhois/parse.py#L375-L377
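For anyone wanting to try the same workaround, here is a minimal sketch of what it might look like when applied to the regex quoted in the issue description. It is a reconstruction, not the commenter's actual code: the name CONTACT_NO_ADDRESS and the sample record are made up for illustration, and the other two affected lines of parse.py are not covered.

```python
import re

# The contact regex from the issue with the three optional "address:" groups
# removed. Hypothetical reconstruction of the workaround, not the original patch.
CONTACT_NO_ADDRESS = re.compile(
    r"nic-hdl:\s*(?P<handle>.+)\n"
    r"type:\s*(?P<type>.+)\n"
    r"contact:\s*(?P<name>.+)\n"
    r"(?:.+\n)*?"                      # lazily skips lines, including address lines
    r"(?:phone:\s*(?P<phone>.+)\n)?"
    r"(?:fax-no:\s*(?P<fax>.+)\n)?"
    r"(?:.+\n)*?"
    r"(?:e-mail:\s*(?P<email>.+)\n)?"
    r"(?:.+\n)*?"
    r"changed:\s*(?P<changedate>[0-9]{2}\/[0-9]{2}\/[0-9]{4}).*"
)

# Synthetic contact block with made-up values (not real AFNIC output).
SAMPLE = (
    "nic-hdl: AB123-FRNIC\n"
    "type: ORGANIZATION\n"
    "contact: Example Org\n"
    "address: 1 rue Exemple\n"
    "address: 75000 Paris\n"
    "country: FR\n"
    "phone: +33 1 23 45 67 89\n"
    "e-mail: contact@example.fr\n"
    "changed: 01/02/2020 nic@nic.fr\n"
)

match = CONTACT_NO_ADDRESS.search(SAMPLE)
contact = match.groupdict() if match else None
if contact is not None:
    print(contact["handle"], contact["name"], contact["changedate"])
```

The street fields are simply no longer captured, which is fine if you don't need them. Whether this is enough to avoid the hang on every real record has not been verified here.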
Hi,
@kilgoretrout1985 made a fix for this issue: he merged the 3 regexps into a new one and fixed the part causing the infinite loop: https://github.com/kilgoretrout1985/pythonwhois-alt/blob/cb948cb1c658d4f8d8fefaa41e7c4a3cc776a037/pythonwhois/parse.py#L376-L390
We are having trouble getting information for the domain institutdegenech.fr when looking it up by domain name. We have seen multiple similar issues in the repo for different domains. On inspecting the library further, it seems to be an issue with the regex used to parse the data. Can you please fix this issue? If not, can you please suggest alternatives that could be used to work around it?
Also, as we see above, a fix has been merged, but does it work for Python 3.9 and above?
@hardik-crest the PR was merged on a different project, pythonwhois-alt.
I recommend using that package instead of pythonwhois
(this repo seems abandoned... no update since 2014)
Thanks @Augustin-FL
Hello!
While I understand this project may no longer be maintained (the latest commit is over six years old, etc.), because of the potential for this issue to negatively affect production applications, I figured I'd create this to bring attention to it and warn others it might impact.
Description:
At some point in the course of parsing a WHOIS record (provided by the AFNIC WHOIS server) associated with a French domain (using the '.fr' TLD), it appears the library attempts to match the entire record string against this regular expression:
/nic-hdl:\s*(?P<handle>.+)\ntype:\s*(?P<type>.+)\ncontact:\s*(?P<name>.+)\n(?:.+\n)*?(?:address:\s*(?P<street1>.+)\n)?(?:address:\s*(?P<street2>.+)\n)?(?:address:\s*(?P<street3>.+)\n)?(?:phone:\s*(?P<phone>.+)\n)?(?:fax-no:\s*(?P<fax>.+)\n)?(?:.+\n)*?(?:e-mail:\s*(?P<email>.+)\n)?(?:.+\n)*?changed:\s*(?P<changedate>[0-9]{2}\/[0-9]{2}\/[0-9]{4}).*/
I think it's this one: https://github.com/joepie91/python-whois/blob/7b0ddf755b3d706860d5d8cb80c598fd854a48ca/pythonwhois/parse.py#L376
This evaluation results in catastrophic backtracking and never recovers, causing the application to hang and CPU usage to increase dramatically. Offhand, records provided by AFNIC seem to have multiple repeated fields like 'ADDRESS' and 'TROUBLE', so it's possible that's where the evaluation is getting tripped up.
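A rough way to see how the cost of a failed match grows is to time the regex above against synthetic blocks with more and more repeated address lines. This is only a sketch under assumptions: the block contents are made up, the changed: line deliberately has no dd/mm/yyyy date so the match must fail, and it has not been confirmed that this is the same path that hangs on real AFNIC output. The names ORIGINAL, synthetic_block and time_failed_match are hypothetical, and the sizes are kept small on purpose.

```python
import re
import time

# The regex quoted above, split across lines for readability.
ORIGINAL = re.compile(
    r"nic-hdl:\s*(?P<handle>.+)\n"
    r"type:\s*(?P<type>.+)\n"
    r"contact:\s*(?P<name>.+)\n"
    r"(?:.+\n)*?"
    r"(?:address:\s*(?P<street1>.+)\n)?"
    r"(?:address:\s*(?P<street2>.+)\n)?"
    r"(?:address:\s*(?P<street3>.+)\n)?"
    r"(?:phone:\s*(?P<phone>.+)\n)?"
    r"(?:fax-no:\s*(?P<fax>.+)\n)?"
    r"(?:.+\n)*?"
    r"(?:e-mail:\s*(?P<email>.+)\n)?"
    r"(?:.+\n)*?"
    r"changed:\s*(?P<changedate>[0-9]{2}\/[0-9]{2}\/[0-9]{4}).*"
)


def synthetic_block(n_address_lines):
    """Build a made-up contact block that cannot match (no valid changed date)."""
    lines = ["nic-hdl: AB123-FRNIC", "type: ORGANIZATION", "contact: Example Org"]
    lines += ["address: line %d" % i for i in range(n_address_lines)]
    lines.append("changed: unknown")  # no dd/mm/yyyy date, so the regex must fail
    return "\n".join(lines) + "\n"


def time_failed_match(n_address_lines):
    """Return (match_result, elapsed_seconds) for one failed search."""
    text = synthetic_block(n_address_lines)
    start = time.perf_counter()
    result = ORIGINAL.search(text)
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == "__main__":
    # Small sizes only; increase with care, since the time grows quickly.
    for n in (5, 10, 20, 40):
        result, elapsed = time_failed_match(n)
        print("%3d address lines: match=%r, %.6f s" % (n, result, elapsed))
```

The engine has to try every way of splitting the lines between the three lazy (?:.+\n)*? loops and the optional groups before it can give up, so the time should rise much faster than the input size.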
Reproduction Steps:
Clone the repository, or install the package via pip. I used virtualenv and created a sample environment to test this in. I'm also using python 2.7.10.
Use the included pwhois script (or a provided method like pythonwhois.get_whois(domain)) to run the lookup against a .fr domain like 'afnic.fr', e.g. pwhois afnic.fr. The process should hang and CPU usage should rapidly increase.
I can provide a proof of concept via an online regular expression evaluator if that would be helpful!
If this description is at all unclear or if you would like me to provide additional information, just let me know!
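If you need to run the reproduction (or keep using the library) without risking a stuck process, one option is to run the lookup as a child process with a hard timeout. A minimal sketch, assuming Python 3.5+ (for subprocess.run, so not the 2.7.10 setup above) and that the pwhois script is on your PATH; run_with_timeout is a hypothetical helper, not part of pythonwhois.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return its stdout as text, or None if it exceeds timeout_s seconds."""
    try:
        completed = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so the
        # CPU-bound lookup does not keep running in the background.
        return None
    return completed.stdout


# Example (assumes the pwhois script is installed and on PATH):
# output = run_with_timeout(["pwhois", "afnic.fr"], timeout_s=10)
# if output is None:
#     print("lookup timed out, probably stuck in the regex")
```

This doesn't fix the regex; it only limits how long a hung lookup can hold a worker.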
Thanks!