urbanadventurer / WhatWeb

Next generation web scanner
https://www.morningstarsecurity.com/research/whatweb
GNU General Public License v2.0

Detection based on Version Information in JavaScript Files #325

Open Phylu opened 3 years ago

Phylu commented 3 years ago

Within WhatWeb plugins, there are several ways to detect frameworks and their versions: regexes matched against the page source, or the presence of certain files. In addition to that, I would like to do the following:

JavaScript files (which might be named main.js or vendor.js, for example) often contain comments like the following:

     * http://jquery.com/
     *
     * Includes Sizzle.js
     * http://sizzlejs.com/
     *
     * Copyright jQuery Foundation and other contributors
     * Released under the MIT license
     * http://jquery.org/license
     *
     * Date: 2016-01-08T20:02Z
     */

Is there a way to implement something like this within a plugin? Or could it be done for all existing plugins, so that their regexes are also applied "recursively" to the JS files a page includes?
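One way to picture the "recursive" idea is to collect a page's `<script src>` URLs and run the same version patterns over each fetched file. This Ruby sketch does not use WhatWeb's actual plugin API: `script_urls`, `JS_PATTERNS`, `detect_js_libraries` and the `fetch` callable are hypothetical names, and it stays one level deep, looking only at scripts the HTML includes directly.

```ruby
require "uri"

# Assumed, simplified way to find <script src="..."> tags; a real HTML
# parser would be more robust (this regex also matches e.g. data-src).
SCRIPT_SRC_RE = /<script\b[^>]*\bsrc\s*=\s*["']([^"']+)["']/i

# Hypothetical pattern table: library name => regex whose first capture
# group is the version. The jQuery entry assumes banners such as
# "jQuery JavaScript Library v1.2.3" or "jQuery v1.2.3".
JS_PATTERNS = {
  "jQuery" => /jQuery (?:JavaScript Library )?v(\d+(?:\.\d+)+)/
}.freeze

# Return the absolute URLs of all scripts referenced directly by the HTML.
def script_urls(html, base_url)
  html.scan(SCRIPT_SRC_RE).flatten.map { |src| URI.join(base_url, src).to_s }.uniq
end

# Fetch each directly included script (one level only, no JS parsing) and
# apply every pattern to it. `fetch` is any callable that takes a URL and
# returns the body as a String, or nil if the request failed.
def detect_js_libraries(html, base_url, fetch)
  results = {}
  script_urls(html, base_url).each do |url|
    body = fetch.call(url)
    next if body.nil?

    JS_PATTERNS.each do |name, regex|
      m = regex.match(body)
      (results[name] ||= []) << { version: m[1], url: url } if m
    end
  end
  results
end
```

Passing `fetch` in as a parameter keeps the sketch runnable offline (a Hash lookup works) and leaves the choice of HTTP client to the scanner.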

urbanadventurer commented 3 years ago

Surprisingly, I was just thinking about how to add JavaScript library detection to WhatWeb. I'll just dump my thoughts here, so we can kick off a discussion.

We will need:

Things that make JavaScript unique:

Thoughts:

Some questions to consider:

I guess step one is to start collecting JS Library patterns. Ideally we could have patterns that would survive the minify process.
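On patterns that survive minification: popular minifiers (UglifyJS and Terser, for example) keep, by default, comments that start with `/*!` or contain `@license` or `@preserve`. Assuming a site's build follows that convention, a plugin could first pull out just those comments and match version patterns against them. `preserved_comments` is a hypothetical helper, and scanning comments with a regex is a simplification that can be fooled by `/*` inside string literals.

```ruby
# Any block comment, matched lazily so each match ends at the first "*/".
BLOCK_COMMENT_RE = %r{/\*.*?\*/}m

# Hypothetical helper: keep only the block comments that minifiers commonly
# preserve, i.e. those opening with "/*!" or mentioning @license/@preserve.
def preserved_comments(js)
  js.scan(BLOCK_COMMENT_RE).select do |comment|
    comment.start_with?("/*!") || comment.match?(/@(?:license|preserve)\b/)
  end
end
```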

Phylu commented 3 years ago

My thoughts here:

  • Should WhatWeb scan only same-site JS or also remote JS URLs? I suggest fetching both in order to check for:

      • Version numbers in the URL path

      • Version numbers in GET parameters

      • Version numbers in the JS files themselves

  • Should WhatWeb parse JS to discover URLs for other loaded or imported JS files?

I suggest not doing this, at least in the beginning. Of course there are techniques like Google Tag Manager that load scripts dynamically, but as a first step, checking only the files that are included directly (such as the minified JS files in a vendor folder) may be sufficient, and it is probably much easier and faster to implement and maintain.

  • A headless browser like headless Chrome or Firefox would work to parse and discover JS URLs, but is it too resource heavy?

We have some experience with this, and I totally agree about the resource issue. It would also add large third-party dependencies to WhatWeb.
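The three places suggested for finding version numbers in fetched scripts (URL path, GET parameters, file contents) can each be checked with a small regex. This is a hypothetical sketch, not WhatWeb code: `script_versions` and `VERSION_RE` are illustrative names, and treating any run of dot-separated numbers as a version will also match things like IP addresses or dotted dates, so a real plugin would want library-specific patterns.

```ruby
require "uri"

# A "version" here is two or more dot-separated numbers, e.g. 1.12.0.
VERSION_RE = /\d+(?:\.\d+)+/

# Hypothetical helper: return a Hash with whichever of :path, :query and
# :body yielded a version string for the given script URL (and body).
def script_versions(url, body = nil)
  uri = URI.parse(url)
  found = {}

  # 1. URL path, e.g. /ajax/libs/jquery/1.12.0/jquery.min.js
  path_version = uri.path.to_s[VERSION_RE]
  found[:path] = path_version if path_version

  # 2. GET parameters, e.g. jquery.js?ver=1.12.4 (any value that is a bare version)
  if uri.query
    param_version = URI.decode_www_form(uri.query).map(&:last)
                       .find { |value| value.match?(/\A#{VERSION_RE}\z/) }
    found[:query] = param_version if param_version
  end

  # 3. File contents: a version prefixed with "v", as in "jQuery v1.12.0"
  if body
    body_version = body[/\bv(#{VERSION_RE})/, 1]
    found[:body] = body_version if body_version
  end

  found
end
```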

I guess step one is to start collecting JS Library patterns. Ideally we could have patterns that would survive the minify process.

I would probably start with patterns that capture version numbers, as they are a good way to get information about the libraries in use, independent of their names.

Possible license string and pattern (I will keep an eye out for more):

* @license Angular v8.0.2\n     --> /@license ([a-zA-Z]+) v?(\d+)\.(\d+)(?:\.(\d+))?/
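A quick check of a license pattern against the Angular sample line, written as a Ruby regex. A pattern built from `[1-9]` cannot match the `0` in `8.0.2`, and a quantifier placed outside a capture group (`([1-9])*`) only keeps the last repetition, so this sketch uses `(\d+)` for each version component. `LICENSE_RE`, `license_version` and `NAIVE_RE` are hypothetical names; `[A-Za-z]+` will still miss library names that contain dots or hyphens.

```ruby
# Group 1: library name; groups 2-4: major, minor and optional patch version.
LICENSE_RE = /@license ([A-Za-z]+) v?(\d+)\.(\d+)(?:\.(\d+))?/

# Hypothetical helper: return { name:, version: } for the first @license
# banner in a JS source string, or nil if there is none.
def license_version(js)
  m = LICENSE_RE.match(js)
  return nil unless m

  name, major, minor, patch = m.captures
  { name: name, version: [major, minor, patch].compact.join(".") }
end

# For comparison: a [1-9]-based pattern stops early on "v8.0.2". It matches
# only "@license Angular v8." and never captures the 0 or the 2.
NAIVE_RE = /@license ([a-zA-Z]*) v?([1-9])*\.?([1-9])\.?([1-9])?/
```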