urbanadventurer / WhatWeb

Next generation web scanner
https://www.morningstarsecurity.com/research/whatweb
GNU General Public License v2.0

Updates to typo3 plugin and minor tweak to Telerik plugin name #364

Closed: definity closed this pull request 3 years ago

definity commented 3 years ago

Corrected the Telerik plugin name to use a hyphen instead of an underscore; hyphens seem to be more common.

Big improvements to TYPO3 detection.

I think the passive detection could be improved, since other plugins already pick up TYPO3 pretty well (MetaGenerator[TYPO3 CMS], PoweredBy[TYPO3]), but I didn't have time to figure out how to check those.
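The MetaGenerator[TYPO3 CMS] result suggests such pages carry a generator meta tag, so one option for a passive check is a regex on that tag. This is a minimal sketch in plain Ruby, not WhatWeb's plugin API; the constant and method names are hypothetical, and it assumes the tag is written with `name` before `content`, as in `<meta name="generator" content="TYPO3 CMS">`.

```ruby
# Hypothetical passive check for a TYPO3 generator meta tag.
# Assumes attribute order name="generator" then content="TYPO3...".
TYPO3_META_GENERATOR = /<meta\s+name=["']generator["']\s+content=["']TYPO3/i

def typo3_meta_generator?(body)
  body.match?(TYPO3_META_GENERATOR)
end

puts typo3_meta_generator?('<meta name="generator" content="TYPO3 CMS" />')     # true
puts typo3_meta_generator?('<meta name="generator" content="WordPress 6.0" />') # false
```

A tag with `content` before `name` would slip past this pattern, so a real check would need to handle either order.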

The new HTML comment match could be improved with a regex; I've seen the <!-- on its own line, followed by the text.

urbanadventurer commented 3 years ago

It looks great and thanks for all the hashes for the JavaScript versions!

I have some suggestions for improving the "powered by" HTML comment detection.

I noticed you removed the HTML comment prefix, with the comment "removed html comment from text match since it could be split up by newlines".

The first version was { :text=>'<!-- This website is powered by TYPO3', :certainty=>75 },

and the second version was { :text=>'This website is powered by TYPO3', :certainty=>75 },
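To see why the first version misses a comment split across lines, here is a small Ruby check. It uses `String#include?` as a stand-in for a `:text` match, which is an assumption: WhatWeb's own matching code may differ in details such as case handling. The sample body mirrors the layout of the TYPO3 page quoted later in this thread, abridged to one comment line.

```ruby
# "<!--" sits on its own line; the text follows after a newline and indentation.
body = "<meta charset=\"utf-8\">\n<!--\n        This website is powered by TYPO3 - inspiring people to share!\n-->\n"

first_version  = '<!-- This website is powered by TYPO3'
second_version = 'This website is powered by TYPO3'

# A plain substring search needs the exact characters, so the single space in
# first_version cannot line up with the newline plus indentation in the page.
puts body.include?(first_version)   # false
puts body.include?(second_version)  # true
```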

My first suggestion is that the :certainty=>75 attribute isn't necessary and can be removed.

If the newlines are in predictable places, you can use a regular expression like the one below. \W+ matches one or more non-word characters, which includes spaces and newlines (\s+ would match whitespace only). With a regular expression you can also keep the HTML comment prefix, which makes a false positive less likely.

/<!--\W+This website is powered by TYPO3/
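A quick Ruby comparison of `\W+` against the whitespace-only `\s+` shows what each accepts between `<!--` and the text. The sample strings are made up for illustration. Both patterns handle a single space and a newline with indentation; `\W+` additionally tolerates punctuation, so it is slightly looser.

```ruby
pattern_w = /<!--\W+This website is powered by TYPO3/
pattern_s = /<!--\s+This website is powered by TYPO3/

# Illustrative inputs: what might sit between "<!--" and the text.
samples = {
  "single space"       => "<!-- This website is powered by TYPO3",
  "newline and indent" => "<!--\n        This website is powered by TYPO3",
  "punctuation"        => "<!-- - This website is powered by TYPO3",
}

samples.each do |label, text|
  puts "#{label}: \\W+ #{text.match?(pattern_w)}, \\s+ #{text.match?(pattern_s)}"
end
```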

The regex version would be this: {:name=>"Powered by HTML comment", :regexp=>/<!--\W+This website is powered by TYPO3/ },

I haven't tested this on any TYPO3 websites, so I don't know whether it would work or whether that's where the line break occurs.

definity commented 3 years ago

Thanks for the feedback! I'll make that change and commit, hopefully today.

The TYPO3 site where I found the line break looked like this:

<html lang="de">
<head>

<meta charset="utf-8">
<!--
        This website is powered by TYPO3 - inspiring people to share!
        TYPO3 is a free open source Content Management Framework initially created by Kasper Skaarhoj and licensed under GNU/GPL.
        TYPO3 is copyright 1998-2018 of Kasper Skaarhoj. Extensions are copyright of their respective owners.
        Information and contribution at https://typo3.org/
-->

urbanadventurer commented 3 years ago

Cool. That would work with the regex I provided.
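Running the suggested matcher against the head of the TYPO3 page quoted above supports that. This is a plain Ruby check of the regex only, not a run through WhatWeb itself; the variable names are illustrative.

```ruby
# Head of the TYPO3 page from this thread, including the multi-line comment.
sample = <<~HTML
  <html lang="de">
  <head>

  <meta charset="utf-8">
  <!--
          This website is powered by TYPO3 - inspiring people to share!
          TYPO3 is a free open source Content Management Framework initially created by Kasper Skaarhoj and licensed under GNU/GPL.
          TYPO3 is copyright 1998-2018 of Kasper Skaarhoj. Extensions are copyright of their respective owners.
          Information and contribution at https://typo3.org/
  -->
HTML

# The matcher suggested earlier, written as a plain Ruby hash.
matcher = { :name=>"Powered by HTML comment", :regexp=>/<!--\W+This website is powered by TYPO3/ }

# \W+ absorbs the newline and indentation between "<!--" and the text.
puts sample.match?(matcher[:regexp])  # true
```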

definity commented 3 years ago

Also, that multi-line TYPO3 comment happened to come from a site running version 8.7, and none of its files had an MD5 hit. That led me to realize that I can download older versions such as v8.x. I'll make a few more commits to this pull request to cover older versions.
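The "no MD5 hit" situation can be pictured as a digest-to-version lookup that simply has no entry for a release nobody has fingerprinted yet. This is a minimal sketch of that idea in plain Ruby, not WhatWeb's matching code; the table, its contents, and `guess_version` are hypothetical, and the strings are placeholders rather than real TYPO3 files or hashes.

```ruby
require "digest/md5"

# Hypothetical table mapping an MD5 digest of a static file to the release it
# ships in. The strings are placeholders, not real TYPO3 files or hashes.
PLACEHOLDER_FILES = {
  "8.7" => "/* placeholder: JS file as shipped in an 8.7 release */",
  "9.5" => "/* placeholder: JS file as shipped in a 9.5 release */",
}.freeze

VERSION_BY_MD5 = PLACEHOLDER_FILES.map { |version, contents| [Digest::MD5.hexdigest(contents), version] }.to_h

# Returns the matching version string, or nil when the digest is not in the
# table (the "no MD5 hit" case for a release that has not been hashed yet).
def guess_version(file_contents)
  VERSION_BY_MD5[Digest::MD5.hexdigest(file_contents)]
end

p guess_version("/* placeholder: JS file as shipped in an 8.7 release */")  # "8.7"
p guess_version("/* a file from a release missing from the table */")       # nil
```

Downloading older releases and hashing their files is what fills in the missing rows of such a table.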

definity commented 3 years ago

Okay, I'm done for now. Let me know if I need to fix anything else.

urbanadventurer commented 3 years ago

Thanks @definity that looks great! 🔥