Implement extraction of asset version from response body

projectdiscovery / wappalyzergo

A high performance go implementation of Wappalyzer Technology Detection Library

MIT License


Implement extraction of asset version from response body #79

Open elvin-tajirzada opened 6 months ago

GeorginaReeder commented 6 months ago

Thanks for your contribution @elvin-tajirzada , we appreciate it!

We also have a Discord server that you're welcome to join. It's a great place to connect with fellow contributors and stay updated with the latest developments!

elvin-tajirzada commented 5 months ago

I need the URL to extract the version from the scripts. Let me give an example. Assume that jQuery is used. Right now the jQuery version is not reported, because the version string lives inside jQuery's own script file. (Script: ). I need the page URL to reach the /bootstrap/js/jquery.js endpoint. Unfortunately, the URL is not included in the body of every site.
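To illustrate why the page URL is needed, here is a minimal sketch, not the code from this PR. It resolves a script path such as /bootstrap/js/jquery.js against an assumed page URL and then pulls a version out of the script's contents. The function names `resolveAssetURL` and `extractVersion`, the example.com URL, and the regular expression are all hypothetical and are not taken from wappalyzergo or its fingerprint data.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// jqueryVersionRe is an illustrative pattern (an assumption), not a pattern
// from the Wappalyzer fingerprint database. It matches the banner comment
// that jQuery builds typically carry at the top of the file.
var jqueryVersionRe = regexp.MustCompile(`jQuery (?:JavaScript Library )?v(\d+\.\d+(?:\.\d+)?)`)

// resolveAssetURL (hypothetical helper) turns a script src found in the HTML
// into an absolute URL, using the URL of the page as the base.
func resolveAssetURL(pageURL, src string) (string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(src)
	if err != nil {
		return "", err
	}
	return base.ResolveReference(ref).String(), nil
}

// extractVersion (hypothetical helper) returns the first version captured
// from the script body, or an empty string when nothing matches.
func extractVersion(scriptBody string) string {
	m := jqueryVersionRe.FindStringSubmatch(scriptBody)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	abs, err := resolveAssetURL("https://example.com/shop/index.html", "/bootstrap/js/jquery.js")
	if err != nil {
		fmt.Println("resolve error:", err)
		return
	}
	fmt.Println("script URL:", abs)

	// In practice the body would come from an HTTP GET on the resolved URL.
	body := "/*! jQuery v3.6.0 | (c) OpenJS Foundation and other contributors */"
	fmt.Println("version:", extractVersion(body))
}
```

Without the page URL there is no base to resolve a relative or root-relative `src` against, which is the limitation described above.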

Gby56 commented 3 months ago

Hi! I think I'm currently doing something similar to properly analyze a full web page. I use a headless browser to get the list of all loaded assets, then I download all of them and analyze them to detect whether a JavaScript bundle contains React, jQuery, and so on. The problem is that, by default, wappalyzer seems to only tokenize HTML and doesn't try to run regexes against JS files. I think you fixed that here?
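As a rough sketch of the asset-scanning approach described here (not the actual code used by either commenter), the following runs a small set of regular expressions over already-downloaded script bodies and reports which technologies matched. The `jsSignatures` table, its patterns, and the `detectInAssets` function are illustrative assumptions; real fingerprints would come from the Wappalyzer data, and the headless-browser download step is left out.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// jsSignatures maps a technology name to a pattern searched for in script
// bodies. These patterns are illustrative assumptions, not real fingerprints.
var jsSignatures = map[string]*regexp.Regexp{
	"jQuery": regexp.MustCompile(`jQuery (?:JavaScript Library )?v\d+\.\d+`),
	"React":  regexp.MustCompile(`__REACT_DEVTOOLS_GLOBAL_HOOK__`),
}

// detectInAssets (hypothetical) takes a map of asset URL to asset body and
// returns the sorted, de-duplicated names of technologies whose pattern
// matched at least one asset.
func detectInAssets(assets map[string]string) []string {
	found := map[string]struct{}{}
	for _, body := range assets {
		for name, re := range jsSignatures {
			if re.MatchString(body) {
				found[name] = struct{}{}
			}
		}
	}
	names := make([]string, 0, len(found))
	for name := range found {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	// Asset bodies would normally be fetched after a headless browser
	// reports which scripts the page loaded.
	assets := map[string]string{
		"https://example.com/js/vendor.js": "/*! jQuery v3.6.0 */ ...",
		"https://example.com/js/app.js":    "if (window.__REACT_DEVTOOLS_GLOBAL_HOOK__) { }",
	}
	fmt.Println(detectInAssets(assets))
}
```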

Or does it only extract the version, and not detect the actual technology from the content?

elvin-tajirzada commented 3 months ago

Yes, it is just extracting the version. It doesn't extract the actual technology from the content.

Gby56 commented 3 months ago

OK, I see. My idea might fit in a different PR then: adding a new fingerprinting function that indicates whether the content to analyze is HTML or a JS file, so that we skip the HTML tokenizer and related steps for JS.
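A minimal sketch of what such a split could look like, assuming the content kind is derived from the `Content-Type` header. Everything here is hypothetical: `ContentKind`, `kindFromContentType`, `analysisSteps`, and the step names do not exist in wappalyzergo and only illustrate skipping the HTML tokenizer for JS files.

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// ContentKind (hypothetical) tells the fingerprinter what it is looking at.
type ContentKind int

const (
	KindHTML ContentKind = iota
	KindJS
	KindOther
)

// kindFromContentType classifies a Content-Type header value.
// Unparseable or unknown values fall back to KindOther.
func kindFromContentType(header string) ContentKind {
	mediaType, _, err := mime.ParseMediaType(header)
	if err != nil {
		return KindOther
	}
	switch {
	case mediaType == "text/html", mediaType == "application/xhtml+xml":
		return KindHTML
	case strings.HasSuffix(mediaType, "javascript"):
		// Covers text/javascript and application/javascript.
		return KindJS
	default:
		return KindOther
	}
}

// analysisSteps returns the names of the steps a fingerprinter might run for
// each kind. The step names are placeholders, not real wappalyzergo functions.
func analysisSteps(kind ContentKind) []string {
	switch kind {
	case KindHTML:
		return []string{"tokenize-html", "match-script-src", "match-meta"}
	case KindJS:
		// No HTML tokenizer here: only regexes over the raw script body.
		return []string{"match-js-regex"}
	default:
		return nil
	}
}

func main() {
	for _, ct := range []string{"text/html; charset=utf-8", "application/javascript", "image/png"} {
		fmt.Println(ct, "->", analysisSteps(kindFromContentType(ct)))
	}
}
```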

elvin-tajirzada commented 3 months ago

Yes. Right now my approach is what we use in our project, but this idea could be implemented as well.