Scanning a domain where homepage has an error fails scan sanity checks

jason-hoerner commented 3 years ago

Describe the bug

I began troubleshooting, but the TYPO3 CMS is new to me. I found your project when I needed to scan a website that runs TYPO3. In my case, the homepage is an error message, not a typical homepage. Details are below, but I am withholding the site URL for certain reasons. Let me know how I can best help work out where this scenario should be accounted for, whether by running a test on my target site or by giving you the site privately.

To Reproduce

Steps to reproduce the behavior:

The script prints the error message from line 77 of 'typo3scan.py': "It seems that Typo3 is not used on this domain".
Tracing this back through the code, the "check_root" function searches for specific TYPO3 keywords with re.search() on line 100 of 'domain.py'. Since my target site responds with a 503 error, I assume that is why this check fails?
The footer of the page still contains the text that the "check_404" method searches for, just not the text that "check_root" looks for.

footer text of the page:

TYPO3 is an open source content management system. To maintain the quality of the system and to improve it, please help us by donating.
        TYPO3 CMS. Copyright © 1998-2011 Kasper Skårhøj. Extensions are copyright of their respective owners. Go to http://typo3.com/ for details.
        TYPO3 comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to redistribute it under certain conditions. Obstructing the appearance of this notice is prohibited by law.
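
The observation above can be reproduced offline by running both search strings against the quoted footer. This is only a minimal sketch: the two patterns are assumptions based on the strings this thread says each check looks for, not a copy of what 'domain.py' contains, and the names ROOT_PATTERN, ERROR_PATTERN and footer are hypothetical.

```python
import re

# Footer text quoted from the 503 page in this report.
footer = (
    "TYPO3 is an open source content management system. To maintain the "
    "quality of the system and to improve it, please help us by donating.\n"
    "TYPO3 CMS. Copyright © 1998-2011 Kasper Skårhøj. Extensions are "
    "copyright of their respective owners. Go to http://typo3.com/ for details.\n"
    "TYPO3 comes with ABSOLUTELY NO WARRANTY."
)

# Assumed patterns (hypothetical, taken from the strings named in the thread).
ROOT_PATTERN = re.compile(r"powered by TYPO3")
ERROR_PATTERN = re.compile(r"TYPO3 CMS")

root_match = ROOT_PATTERN.search(footer) is not None
error_match = ERROR_PATTERN.search(footer) is not None

print("root-page pattern matched:", root_match)
print("error-page pattern matched:", error_match)
```

With these assumed patterns, only the error-page string is found in the footer, which matches what is described above.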

whoot commented 3 years ago

If I understood your issue correctly, the tool just says "It seems that Typo3 is not used on this domain", but you know that Typo3 is used, because it is in the footer of the 503 page, right?

I really appreciate that you tried to debug it yourself. The tool checks in various ways if Typo3 is used on the domain.

check_root -> will search for some typical Typo3 strings (powered by TYPO3) on the root page of the URL
check_404 -> requests a random directory to provoke an error message and checks for "TYPO3 CMS" in the response. The HTTP status code doesn't matter, because only the HTML text is checked.

If both checks fail, the tool assumes that Typo3 is not used on the domain. It's hard to debug this without any information about the web page and what the responses look like, but I would recommend simply checking what response you get from the web application in both methods (just add print(response['html'])) and seeing where and why it fails to recognize the Typo3 strings.
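
As a rough illustration of that debugging step, the sketch below reports per check whether its string was found and prints the start of the HTML when it was not. It is not the Typo3Scan implementation: the patterns are assumptions based on the strings mentioned above, and explain_detection, is_typo3 and the shape of the responses dict (with 'status_code' and 'html' keys) are hypothetical.

```python
import re

# Assumed patterns, one per check (hypothetical, not copied from domain.py).
CHECKS = {
    "check_root": r"powered by TYPO3",
    "check_404": r"TYPO3 CMS",
}


def explain_detection(responses):
    """Return {check name: matched?} and print details for failed checks.

    `responses` maps a check name to a dict with 'status_code' and 'html'.
    The status code is only printed; matching uses the HTML text alone.
    """
    report = {}
    for name, pattern in CHECKS.items():
        response = responses[name]
        matched = re.search(pattern, response["html"]) is not None
        report[name] = matched
        print(f"{name}: status={response['status_code']} matched={matched}")
        if not matched:
            # Show the beginning of the body to see what the server returned.
            print(response["html"][:200])
    return report


def is_typo3(responses):
    """Treat the site as TYPO3 if at least one check matched."""
    return any(explain_detection(responses).values())


# Example input: a 503 root page without the root string, and an error
# page whose footer contains "TYPO3 CMS".
sample = {
    "check_root": {"status_code": 503, "html": "<h1>Service Unavailable</h1>"},
    "check_404": {"status_code": 404, "html": "<p>TYPO3 CMS. Copyright</p>"},
}
detected = is_typo3(sample)
```

Feeding it the real responses from both requests should show which of the two checks fails and what HTML it actually received.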

whoot commented 3 years ago

@jason-hoerner Did you manage to fix the issue? And if so, was it a bug in Typo3Scan?

whoot commented 3 years ago

Closed

whoot / Typo3Scan, issue #18