springload / madewithwagtail

A showcase of sites and apps made with Wagtail CMS, the easy to use, open source Django content management system
http://madewithwagtail.org
MIT License

Use consistent heuristics to tell if a site is made with Wagtail #62

Closed thibaudcolas closed 3 years ago

thibaudcolas commented 7 years ago

I would like to build something like http://isthissitebuiltwithdrupal.com/ or http://whatcms.org/, both to make it easier for us to validate sites, and also because it's cool.

A note: if someone wants to tackle this, it is related to Made with Wagtail but I don't think it should be built within it. MWW could potentially integrate with that thing though.

At the moment we use https://wappalyzer.com/, which detects Django / Python, but not with a great degree of confidence IIRC.

Here are the heuristics I could think of, in no particular order:

zerolab commented 7 years ago

Drupal sets an X-Generator header with the major version (e.g. Drupal 8 (https://drupal.org)), and its default Expires header is Expires: Sun, 19 Nov 1978 05:00:00 GMT, which is the creator's birthday. Wagtail could do something similar.
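To make the two Drupal signals concrete, here is a minimal sketch of a check over a dict of response headers. The function name `detect_drupal` and the tuple return shape are illustrative choices, not part of any existing tool, and the header values are the ones quoted above.

```python
import re

# Expires value Drupal sends by default, as quoted above.
DRUPAL_DEFAULT_EXPIRES = "Sun, 19 Nov 1978 05:00:00 GMT"


def detect_drupal(headers):
    """Return (is_drupal, major_version_or_None) from a dict of response headers."""
    # HTTP header names are case-insensitive, so normalise them first.
    normalised = {name.lower(): value.strip() for name, value in headers.items()}

    # Strong signal: X-Generator starts with "Drupal <major version>".
    generator = normalised.get("x-generator", "")
    match = re.match(r"Drupal\s+(\d+)", generator)
    if match:
        return True, match.group(1)

    # Weaker signal: the distinctive default Expires value, which carries no version.
    if normalised.get("expires") == DRUPAL_DEFAULT_EXPIRES:
        return True, None

    return False, None
```

For example, `detect_drupal({"X-Generator": "Drupal 8 (https://drupal.org)"})` reports a match with major version "8". Both headers can be removed or overridden by a site owner, so a negative result proves nothing.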

thibaudcolas commented 7 years ago

👍 Good approach, but I'm very keen to see how far this can go without changes in Wagtail itself first.

Another hint from Wagtail Slack:

loicteixeira commented 7 years ago

Although it's not a default, we could ping the API at /api/v1/pages.

I'm not a big fan of the generator header or meta, especially not with the version number, since it gives attackers too much knowledge of the architecture and of potential open vulnerabilities.

jjanssen commented 7 years ago

Load /admin/login and check for "wagtail" in the response (either in the page title, or wagtailadmin JS file – that's a 100% tell I think).

This is not always the case. We use admin for serving the Django admin and cms for serving Wagtail.
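A rough sketch of that probe, trying more than one admin prefix since the path is configurable. The candidate list, the trailing slashes, and the `fetch` callable are all assumptions made for illustration; a site can mount the Wagtail admin anywhere, so a miss here is not conclusive.

```python
from urllib.parse import urljoin

# Candidate login paths: "admin" is the common choice, "cms" is the
# alternative mentioned above. Both are guesses; any prefix is possible.
CANDIDATE_LOGIN_PATHS = ["/admin/login/", "/cms/login/"]


def looks_like_wagtail_login(html):
    """True if a login page body mentions Wagtail (page title or wagtailadmin assets)."""
    return "wagtail" in html.lower()


def find_wagtail_admin(base_url, fetch):
    """Return the first candidate login URL whose body mentions Wagtail, else None.

    `fetch` is a caller-supplied function that takes a URL and returns the
    response body as text, or None if the request failed.
    """
    for path in CANDIDATE_LOGIN_PATHS:
        url = urljoin(base_url, path)
        body = fetch(url)
        if body and looks_like_wagtail_login(body):
            return url
    return None
```

Keeping the HTTP call behind `fetch` means the matching logic can be exercised without network access; a real `fetch` could wrap `urllib.request.urlopen` with a timeout and error handling.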

thibaudcolas commented 7 years ago

I didn't realise Wappalyzer was open-source! Here is what they make available to detect technologies: https://github.com/AliasIO/Wappalyzer/wiki/Specification

Here is Django for example:

"Django": {
    "cats": [
        "18"
    ],
    "env": "^__admin_media_prefix__",
    "html": "(?:powered by <a[^>]+>Django ?([\\d.]+)?|<input[^>]*name=[\"']csrfmiddlewaretoken[\"'][^>]*>)\\;version:\\1",
    "icon": "Django.png",
    "implies": "Python",
    "website": "http://djangoproject.com"
},
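To show how such a definition might be applied, here is a small Python sketch that evaluates the "html" pattern above against a page. It assumes the part after `\;` is a list of tags such as `version:\1`, where `\1` refers to the first capture group, and that matching is case-insensitive. The `match_pattern` helper is hypothetical and is not Wappalyzer's own code.

```python
import re

# The "html" pattern from the Django definition above, after JSON unescaping.
DJANGO_HTML_PATTERN = (
    r"""(?:powered by <a[^>]+>Django ?([\d.]+)?|"""
    r"""<input[^>]*name=["']csrfmiddlewaretoken["'][^>]*>)\;version:\1"""
)


def match_pattern(pattern, text):
    """Apply a Wappalyzer-style pattern; return (matched, version_or_None)."""
    # Everything before the first "\;" is the regex, the rest are tags.
    regex, *tags = pattern.split(r"\;")
    match = re.search(regex, text, re.IGNORECASE)
    if not match:
        return False, None

    version = None
    for tag in tags:
        key, _, value = tag.partition(":")
        if key == "version":
            # Substitute back-references such as \1 with captured groups;
            # an optional group that did not take part yields an empty string.
            version = re.sub(
                r"\\(\d)",
                lambda ref: match.group(int(ref.group(1))) or "",
                value,
            )
    return True, version or None
```

A page containing a `csrfmiddlewaretoken` input matches without a version, while a "powered by Django 1.11" link matches and yields "1.11".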

And here is a survey of what other CMSes are doing:

generator is actually standardised (at https://www.w3.org/TR/html5/document-metadata.html#standard-metadata-names): "The value must be a free-form string that identifies one of the software packages used to generate the document."
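For reference, reading that tag from a page takes only the standard library. The sketch below collects every generator value it finds; the class and function names are made up for this example.

```python
from html.parser import HTMLParser


class GeneratorFinder(HTMLParser):
    """Collect the content of every <meta name="generator"> tag."""

    def __init__(self):
        super().__init__()
        self.generators = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # The parser lowercases attribute names but not values,
        # so compare the name value case-insensitively.
        if (attributes.get("name") or "").lower() == "generator":
            content = attributes.get("content")
            if content:
                self.generators.append(content)


def find_generators(html):
    """Return the list of generator strings declared in `html`."""
    finder = GeneratorFinder()
    finder.feed(html)
    return finder.generators
```

Since the value is free-form, any version parsing on top of this would have to be done per CMS.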

Considering how much info is available anyway, leaving it out for the sake of security just feels like obscurity. If someone (malicious or not) does want to do more rigorous fingerprinting, that won't make a big difference. That said, some obscurity does sound better than less obscurity 😅.

I'm not sure what to think of it. Perhaps make it optional?

vladox commented 4 years ago

@thibaudcolas @loicteixeira I tried to add a new definition for Wagtail in Wappalyzer but without being able to access the API or the admin URLs there's no way to detect whether it's a Wagtail site.
The only way I see is to do a PR to the Wagtail base to add a generator meta.

loicteixeira commented 4 years ago

If the conclusion is that the "best" way to discriminate a Wagtail app is to add a generator, I reckon that's a discussion for the Wagtail project itself and an issue should be created there.

vladox commented 4 years ago

Absolutely I'll try that route, thanks for your feedback!

mojeto commented 4 years ago

I understand that a Wagtail-specific HTTP header would make detection easier, but at what cost? These types of headers can leak sensitive information to a potential attacker. Django refused to add an x-powered header years ago for that same security reason (https://code.djangoproject.com/ticket/14431). See https://pentest-tools.com/blog/essential-http-security-headers/#info-leak for more examples. Having no Wagtail-specific header makes MWW's work a bit harder, but the actual website server is a bit safer.

loicteixeira commented 4 years ago

As stated in my original comment, I'm not really in favour of a generator tag of any sort either.

However, I reckon this is a discussion to have on the Wagtail repo which is why I encouraged the discussion to move over there.

That being said, @mojeto is still part of the team in charge of Made with Wagtail (while @thibaudcolas and I aren't anymore). If they would rather continue without a generator tag, and given that the need for such a tag has only been voiced for MWW use (afaik, nobody else has ever mentioned it), then maybe it's not even worth opening that discussion.

thibaudcolas commented 3 years ago

I’ve finally taken the time to experiment with this and implemented detection based on image renditions. If anyone wants to try it out, it’s available from https://detect-wagtail.netlify.app/, and the implementation details are discussed further in https://thib.me/detecting-wagtail-in-the-wild. This will be released in Wappalyzer soon: https://github.com/AliasIO/wappalyzer/pull/3546.
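As a rough illustration of the idea, and not the pattern used in the blog post or the Wappalyzer PR, the sketch below looks for image URLs whose file names carry a resize filter the way Wagtail's default rendition naming does (for example photo.width-800.jpg or photo.fill-300x200.jpg). The /images/ folder, the filter list, and the file extensions are assumptions; sites with custom storage or rendition naming will not match.

```python
import re

# Resize filters as they appear in rendition file names (assumed convention),
# optionally followed by further filter segments and then the file extension.
RENDITION_NAME = re.compile(
    r"\.(?:width|height|max|min|fill|scale)-\d+(?:x\d+)?(?:-c\d+)?"
    r"(?:\.[\w-]+)*\.(?:jpe?g|png|gif|webp)$",
    re.IGNORECASE,
)
# Deliberately simple extraction of src attributes from <img> tags.
IMG_SRC = re.compile(r"""<img\b[^>]*?\bsrc=["']([^"']+)["']""", re.IGNORECASE)


def find_rendition_urls(html):
    """Return image URLs in `html` whose file name looks like a Wagtail rendition."""
    urls = []
    for src in IMG_SRC.findall(html):
        # Ignore any query string or fragment when inspecting the file name.
        path = src.split("?", 1)[0].split("#", 1)[0]
        if "/images/" in path and RENDITION_NAME.search(path):
            urls.append(src)
    return urls
```

A non-empty result is only a hint: other tools could produce similar file names, so it is best combined with other signals.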

Having spent the time implementing this I find the question of whether or not to have a generator header / meta tag a bit moot – there are so many ways to detect Wagtail anyway, so I don’t think it’s worth adding that to Wagtail, but equally the absence of such a tag shouldn’t be construed as having any security benefit whatsoever.