projectdiscovery / katana

A next-generation crawling and spidering framework.
MIT License

Add an option to collect unique words and generate a word list #649

Closed ptyspawnbinbash closed 11 months ago

ptyspawnbinbash commented 11 months ago

Please describe your feature request:

Add a feature to generate a word list from the unique words gathered during the crawling process, inspired by the functionality found at https://github.com/digininja/CeWL.

Describe the use case of this feature:

Generate target specific word lists.

CeWL has its limitations on modern sites; pairing Katana with a headless browser might work well.
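To make the request concrete, here is a minimal Python sketch of CeWL-style extraction from a single crawled page. It is not how CeWL or Katana work internally. The names `TextExtractor` and `unique_words`, the three-character minimum word length, and the choice to skip `<script>` and `<style>` contents are all assumptions made for illustration.

```python
import re
from html.parser import HTMLParser

# Assumption: a "word" starts with a letter and is at least 3 characters long.
WORD_RE = re.compile(r"[A-Za-z][A-Za-z0-9_-]{2,}")


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes, skipping <script>/<style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self.chunks.append(data)


def unique_words(html):
    """Return the sorted, lowercased set of words found in the page text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    return sorted({w.lower() for w in WORD_RE.findall(text)})
```

Feeding it the rendered HTML from a headless crawl, rather than the raw response, is what would address the modern-site limitation mentioned above.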

Mzack9999 commented 11 months ago

I think this should be either a plugin or a separate post-processing tool that could process generic data (e.g. JSON output). For example, it could generate wordlists from httpx/katana runs across multiple domains or from existing data.
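A rough Python sketch of such a post-processing step, assuming the input is JSON Lines (one JSON object per line). To stay generic it makes no assumption about the record schema and simply walks every string value it finds. The names `iter_strings` and `wordlist_from_jsonl`, the word pattern, and the `min_count` parameter are hypothetical and not part of katana or httpx.

```python
import json
import re
from collections import Counter

# Assumption: a "word" starts with a letter and is at least 3 characters long.
WORD_RE = re.compile(r"[A-Za-z][A-Za-z0-9]{2,}")


def iter_strings(value):
    """Yield every string value nested anywhere inside a decoded JSON value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)


def wordlist_from_jsonl(lines, min_count=1):
    """Build a wordlist from JSON Lines input.

    Words are lowercased, counted across all records, and returned
    most frequent first (ties broken alphabetically).
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        for text in iter_strings(record):
            counts.update(w.lower() for w in WORD_RE.findall(text))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, count in ranked if count >= min_count]
```

Because the function accepts any iterable of lines, it could be called with `sys.stdin` or an open file, so the same script would work on output from different tools or on previously saved data.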

dogancanbakir commented 11 months ago

It makes sense! Or maybe something similar to what we do in the httpx KnowledgeBase: https://github.com/projectdiscovery/httpx/blob/b413784008fe89c1eb19e57fe495c4183a152e93/runner/types.go#L87. What do you think?

dogancanbakir commented 11 months ago

@ptyspawnbinbash, thanks for opening this issue! After reviewing the feature request and the discussion above, we think a dedicated post-processing tool for generating word lists would be the most effective approach. Such a tool would process the raw data collected by crawlers such as katana or httpx and generate targeted word lists, as @Mzack9999 recommended. So, closing this.