Closed definity closed 3 years ago
It looks great and thanks for all the hashes for the JavaScript versions!
I have some suggestions for improving the HTML "powered by" detection.
I noticed you removed the HTML comment prefix with the comment "removed html comment from text match since it could be split up by newlines".
The first version was
{ :text=>'<!-- This website is powered by TYPO3', :certainty=>75 },
and the second version was
{ :text=>'This website is powered by TYPO3', :certainty=>75 },
My first suggestion is that the :certainty=>75 attribute isn't necessary and can be removed.
If the newlines are in predictable places, you can use a regular expression like this. The \W+ matches one or more non-word characters, which covers spaces and newlines (as well as punctuation). With a regular expression you can keep the HTML comment prefix too, which makes a false positive detection less likely.
/<!--\W+This website is powered by TYPO3/
The regex version would be this:
{:name=>"Powered by HTML comment", :regexp=>/<!--\W+This website is powered by TYPO3/ },
I haven't tested this on any TYPO3 websites, so I don't know whether it works or whether that is where the line break occurs.
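As a rough way to check the idea outside the plugin, here is a minimal Ruby sketch comparing the original substring match with the suggested regexp. The sample strings and helper names (text_match?, regexp_match?) are made up for illustration and are not part of the plugin or taken from a real site.

```ruby
# Regexp from the suggestion above: comment opener, then one or more
# non-word characters (space, tab, newline, ...), then the TYPO3 banner text.
POWERED_BY = /<!--\W+This website is powered by TYPO3/

# Hypothetical sample bodies (assumptions, not captured from real sites).
SINGLE_LINE = "<!-- This website is powered by TYPO3 - inspiring people to share! -->"
MULTI_LINE  = "<!--\n\tThis website is powered by TYPO3 - inspiring people to share!\n-->"
NO_COMMENT  = "<p>This website is powered by TYPO3</p>"

# Plain substring check, equivalent to the first :text version.
def text_match?(body)
  body.include?('<!-- This website is powered by TYPO3')
end

# Regexp check that tolerates a line break after the comment opener.
def regexp_match?(body)
  !!(body =~ POWERED_BY)
end

[SINGLE_LINE, MULTI_LINE, NO_COMMENT].each do |body|
  puts "text=#{text_match?(body)} regexp=#{regexp_match?(body)}"
end
```

The substring check only matches when the banner follows the comment opener on the same line, while the regexp also matches when a newline separates them. Neither matches the banner text outside an HTML comment.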
Thanks for the feedback! I'll make that change and commit, hopefully today.
The TYPO3 site where I found the line break looked like this
<html lang="de">
<head>
<meta charset="utf-8">
<!--
This website is powered by TYPO3 - inspiring people to share!
TYPO3 is a free open source Content Management Framework initially created by Kasper Skaarhoj and licensed under GNU/GPL.
TYPO3 is copyright 1998-2018 of Kasper Skaarhoj. Extensions are copyright of their respective owners.
Information and contribution at https://typo3.org/
-->
Cool. That would work with the regex I provided.
Also, that multi-line comment about TYPO3 happened to come from a site running version 8.7, which didn't get an MD5 hit for any of the files. That led me to realize that I can download older versions like v8.x. I'll make a few more commits to this pull request to cover older versions.
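For anyone following along, the MD5 approach boils down to hashing a fetched file and looking the digest up in a table of known versions. A minimal sketch in plain Ruby, assuming made-up file contents and hypothetical names (KNOWN_FILES, version_for); it is not the plugin's actual code or hash list:

```ruby
require 'digest/md5'

# Hypothetical file bodies standing in for a JavaScript file shipped with
# each release. Real entries would come from downloaded TYPO3 packages.
KNOWN_FILES = {
  '8.7' => "/* placeholder contents for the 8.7 release */",
  '8.6' => "/* placeholder contents for the 8.6 release */"
}.freeze

# Build a digest => version lookup table from the known file bodies.
MD5_TO_VERSION = KNOWN_FILES.each_with_object({}) { |(version, body), table|
  table[Digest::MD5.hexdigest(body)] = version
}.freeze

# Return the version whose file hashes to the same MD5, or nil when there
# is no hit (for example, a release that was never added to the table).
def version_for(body)
  MD5_TO_VERSION[Digest::MD5.hexdigest(body)]
end

puts version_for(KNOWN_FILES['8.7']).inspect
puts version_for("/* file from an unlisted release */").inspect
```

A site running a release that is missing from the table produces no hit, which is why adding hashes from older downloads widens coverage.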
Okay, I'm done for now. Let me know if you need to fix anything else.
Thanks @definity that looks great! 🔥
Corrected the Telerik plugin name to use a hyphen instead of an underscore, since hyphens seem to be more common.
Big improvements to typo3 detection
I think the passive detection could be improved, as there are other plugins that pick up TYPO3 pretty well, but I didn't have time to figure out how to check those.
MetaGenerator[TYPO3 CMS], PoweredBy[TYPO3]
The new HTML comment could be improved with a regex... I've seen the
<!--
on its own line followed by the text...