projectdiscovery / wappalyzergo

A high-performance Go implementation of the Wappalyzer technology detection library
MIT License

How does updating fingerprints work? #2

Closed bugbaba closed 2 years ago

bugbaba commented 3 years ago

Hi Team,

As always, another major and interesting open-source project from you all :fire:

One of the first things I noticed is that https://github.com/rverton/webanalyze downloads the fingerprints from the wappalyzer repo, saves them locally, and also has an update flag that can be used to refresh them.

But it looks like this project hard-codes them in https://github.com/projectdiscovery/wappalyzergo/blob/master/fingerprints_data.go, which is odd considering you have to update the code every time there is an update to the wappalyzer fingerprints, and those updates are quite frequent.

Also, the original file https://github.com/AliasIO/wappalyzer/blob/master/src/technologies.json is 22181 lines long, whereas https://github.com/projectdiscovery/wappalyzergo/blob/master/fingerprints_data.go is only 10861 lines even after beautifying it. I would love to understand the reason for this difference. Are you dropping fields from the list that are not useful?

I did see https://github.com/projectdiscovery/wappalyzergo/blob/master/cmd/update-fingerprints/main.go, but how does this work if I am using the library in my code like below? Does it try to update the fingerprints every time wappalyzer.New() is called? Can we manually invoke the update step just once, using something like wappalyzer.update(), before reusing the same wappalyzerClient?

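    for _, url := range urls {
        resp, err := http.DefaultClient.Get(url)
        if err != nil {
            log.Fatal(err)
        }
        data, _ := ioutil.ReadAll(resp.Body) // Ignoring error for example
        resp.Body.Close()                    // Release the connection before the next request

        // New() is called on every iteration here, which is what the question is about
        wappalyzerClient, err := wappalyzer.New()
        if err != nil {
            log.Fatal(err)
        }
        fingerprints := wappalyzerClient.Fingerprint(resp.Header, data)
        fmt.Printf("%v\n", fingerprints)
    }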

-- Regards, @bugbaba

bugbaba commented 3 years ago

Just saw this commit https://github.com/projectdiscovery/wappalyzergo/pull/1/commits/bc4b77f03a54386798b19f546a8a4f227c3db501: you will be running update-fingerprints -fingerprints fingerprints_data.go weekly, but again this doesn't seem like the best option. What's the problem with using the fingerprints in their original format like webanalyze does?

-- Regards, @bugbaba

Ice3man543 commented 3 years ago

@bugbaba regarding the smaller number of records: we intentionally omit certain fields from the wappalyzer schema that are not required in our implementation, and that accounts for a large share of the dropped lines. The update-fingerprints script does this, validates all the regexes, and creates a simpler normalized schema that is much smaller and only contains the things that are of interest to us.
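To illustrate the kind of normalization described above, here is a minimal sketch. It is not the actual update-fingerprints code: the `rawTech` and `slimTech` types, the field selection, and the `normalize` helper are hypothetical names, and the upstream schema is assumed to look roughly like this. It drops metadata fields and keeps only patterns that compile as Go regexes. Real wappalyzer patterns can also carry extra tags after the regex, which this sketch does not handle.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// rawTech is a hypothetical subset of an upstream technology entry.
type rawTech struct {
	Website string            `json:"website"` // metadata, not needed for matching
	Icon    string            `json:"icon"`    // metadata, not needed for matching
	Headers map[string]string `json:"headers"`
	HTML    []string          `json:"html"`
}

// slimTech keeps only the fields used for detection.
type slimTech struct {
	Headers map[string]string `json:"headers,omitempty"`
	HTML    []string          `json:"html,omitempty"`
}

// normalize drops metadata fields and any pattern that fails to compile.
func normalize(raw map[string]rawTech) map[string]slimTech {
	out := make(map[string]slimTech, len(raw))
	for name, t := range raw {
		slim := slimTech{}
		for header, pattern := range t.Headers {
			if _, err := regexp.Compile(pattern); err != nil {
				continue // skip invalid regex
			}
			if slim.Headers == nil {
				slim.Headers = make(map[string]string)
			}
			slim.Headers[header] = pattern
		}
		for _, pattern := range t.HTML {
			if _, err := regexp.Compile(pattern); err == nil {
				slim.HTML = append(slim.HTML, pattern)
			}
		}
		out[name] = slim
	}
	return out
}

func main() {
	// A made-up entry; "(unclosed" is deliberately an invalid regex.
	input := `{"Example": {"website": "https://example.com", "icon": "e.png",
		"headers": {"X-Powered-By": "^Example"},
		"html": ["powered by example", "(unclosed"]}}`

	var raw map[string]rawTech
	if err := json.Unmarshal([]byte(input), &raw); err != nil {
		panic(err)
	}
	out, err := json.MarshalIndent(normalize(raw), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

The normalized output has no `website` or `icon` keys and no invalid pattern, which is the sort of trimming that would make the generated file noticeably shorter than the upstream one.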

Regarding the update problem: yes, we are still thinking about the best way to do this. So far we haven't needed very frequent wappalyzer dataset updates, hence this implementation. However, we could potentially add a function that pulls the dataset from GitHub periodically, maybe every 24 hours or so. Tagging this issue as a question to keep track of it. Thanks for bringing this to our attention.