projectdiscovery / wappalyzergo

A high performance go implementation of Wappalyzer Technology Detection Library
MIT License

Add injectable javascript agent for technologies detection #36

Open Mzack9999 opened 1 year ago

Mzack9999 commented 1 year ago

An injectable JavaScript version of the library should be prepared and kept up to date. It should be injected via a headless browser into existing browser contexts, where it can collect enriched information from within the JS engine for the specific domain open in a browser tab. This is similar to the behavior of the official Wappalyzer extension: https://chrome.google.com/webstore/detail/wappalyzer-technology-pro

Bisstocuz commented 1 year ago

Maybe we could define a new function:

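func FingerprintURL(url string) map[string]struct{} {
    // 1. Request the URL and collect the HTML, CSS, and JS it returns.
    // 2. Run detection on the collected content and return the result.
    return nil // placeholder until the steps above are implemented
}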

A headless library may not be necessary; we could also use an HTML parser such as goquery.

Another question: why use map[string]struct{} instead of []string as the return value?
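The thread does not answer this question, so the following is only one plausible reading: in Go, a map[string]struct{} is a common way to represent a set, which deduplicates names for free and makes merging results from several sources straightforward. A minimal sketch, where the helper names `merge` and `toSortedSlice` are hypothetical and not part of wappalyzergo:

```go
package main

import "sort"

// merge adds every name in src to dst. A name present in both sets is
// stored once, because map keys are unique.
func merge(dst, src map[string]struct{}) {
	for name := range src {
		dst[name] = struct{}{}
	}
}

// toSortedSlice converts a set of names into a sorted []string, for callers
// that prefer a slice with a stable order.
func toSortedSlice(set map[string]struct{}) []string {
	names := make([]string, 0, len(set))
	for name := range set {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}
```

A caller who wants a []string can convert at the edge with a helper like `toSortedSlice`, while code that combines results from headers, HTML, and scripts gets deduplication without extra bookkeeping.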

Gby56 commented 2 months ago

I'm actually confused right now. I thought I could feed both HTML and JS to the Fingerprint function and it would apply the regexes directly. I used a headless browser to load a web page and automatically collect all the other loaded assets, and I wanted to fingerprint everything that gets loaded. However, it looks like wappalyzergo doesn't run its regexes against the raw JavaScript files I give it; it only does that for JS included inside HTML?

I think the problem lies in the checkBody() function: it always expects HTML and tries to tokenize it, so if I give it a pure JS file, it won't work.
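If the behavior is as described above (JS is only examined when it appears inside HTML), one possible workaround is to embed each raw JS asset in a minimal HTML document before fingerprinting it. This is an untested sketch: the helper name `wrapInScript` is hypothetical, and whether wappalyzergo actually matches its patterns against inline script text is an assumption based on the comment above, not something verified here.

```go
package main

import "bytes"

// wrapInScript embeds raw JavaScript in a minimal HTML document so that a
// pipeline which tokenizes HTML sees the code as an inline <script> element.
// Caveat: if js itself contains the literal text "</script>", an HTML
// tokenizer will end the element early at that point.
func wrapInScript(js []byte) []byte {
	var buf bytes.Buffer
	buf.WriteString("<html><head><script>")
	buf.Write(js)
	buf.WriteString("</script></head></html>")
	return buf.Bytes()
}
```

The wrapped bytes could then be passed as the body in place of the raw file. A cleaner long-term fix would be for the library to accept non-HTML bodies directly, which is what this comment is asking about.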