jaeles-project / gospider

Gospider - Fast web spider written in Go
MIT License
option to get just urls #10

Open marcelo321 opened 4 years ago

marcelo321 commented 4 years ago

It would be nice to have an option that prints just the URLs, without the [words] and everything else, for example a flag for that.

Bedrovelsen commented 4 years ago

One solution would be to add a URL-matching regex such as this one to the codebase, together with a URLs-only flag in the options:

(([a-zA-Z][a-zA-Z0-9+.-]*://)|mailto:|data:)[a-zA-Z0-9.&/?:@+_=#%;,-]*

As a temporary workaround until then, you can pipe the command's stdout to grep with a regex that matches just the URLs:

gospider -s https://jaeles-project.github.io/ -t 10 -d 1 -c 10 | grep -o -E '(([a-zA-Z][a-zA-Z0-9+.-]*://)|mailto:|data:)[a-zA-Z0-9.&/?:@+_=#%;,-]*' | sort -u | tee justurls.txt
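To see what the filter keeps without running a crawl, here is a minimal sketch that feeds a few made-up lines through the same grep step. The sample lines are only an assumption about what gospider prints, and the pattern is a variant with the hyphen moved to the end of each bracket expression so that it is read as a literal character rather than a range.

```shell
# URL pattern: scheme://, mailto: or data:, followed by common URL characters.
# '-' sits last in each bracket expression so it is taken literally.
URL_RE='(([a-zA-Z][a-zA-Z0-9+.-]*://)|mailto:|data:)[a-zA-Z0-9.&/?:@+_=#%;,-]*'

# Keep only the matching parts of each line, then de-duplicate.
extract_urls() {
  grep -o -E "$URL_RE" | sort -u
}

# Hypothetical sample lines standing in for gospider output.
sample_output() {
  cat <<'EOF'
[url] - [code-200] - https://example.com/
[javascript] - https://example.com/static/app.js
[url] - [code-200] - https://example.com/
[linkfinder] - /api/v1/users
EOF
}

sample_output | extract_urls
```

Relative paths such as /api/v1/users carry no scheme, so this pattern skips them.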

Alternatively, you can save the output with the -o flag and then grep the file in the output directory (or files, if you pass multiple input sites, such as the live subdomains in this example):

gospider -S <(assetfinder -subs-only hackerone.com | sort -u | httprobe -prefer-https) -t 10 -d 1 -c 10 -o outdir && grep -r -h -o -E '(([a-zA-Z][a-zA-Z0-9+.-]*://)|mailto:|data:)[a-zA-Z0-9.&/?:@+_=#%;,-]*' outdir | sort -u | tee rawstdout.txt

Feel free to replace

-S <(assetfinder -subs-only hackerone.com | sort -u | httprobe -prefer-https)

with

-S alivedomains.txt

if you already have a list of domains in a file such as alivedomains.txt.
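One detail worth checking when grepping the output directory recursively: grep prefixes each match with the file it came from, so the same URL found in two files survives sort -u as two different lines. Adding -h drops the prefix. A minimal sketch with a throwaway directory and made-up file names standing in for the real output directory:

```shell
# Throwaway directory standing in for the -o output directory (file names are hypothetical).
demo_dir="$(mktemp -d)"
trap 'rm -rf "$demo_dir"' EXIT

printf '[url] - [code-200] - https://a.example/\n' > "$demo_dir/a_example"
printf '[url] - [code-200] - https://a.example/\n[url] - [code-200] - https://b.example/x\n' > "$demo_dir/b_example"

URL_RE='(([a-zA-Z][a-zA-Z0-9+.-]*://)|mailto:|data:)[a-zA-Z0-9.&/?:@+_=#%;,-]*'

# Without -h each match is printed as "path:match", so cross-file duplicates remain.
with_names() {
  grep -r -o -E "$URL_RE" "$demo_dir" | sort -u
}

# With -h only the matched URLs are printed, so sort -u can de-duplicate them.
urls_only() {
  grep -r -h -o -E "$URL_RE" "$demo_dir" | sort -u
}

urls_only
```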

nav7neeet commented 3 years ago

I think the "-q, --quiet Suppress all the output and only show URL" flag was meant to show only the URLs, but it's not working in the latest version, v1.1.2.

yuraloginoff commented 3 years ago

Same here. -q has no effect.

arthur4ires commented 2 years ago

The -q option is still not fully effective: it continues to print unnecessary information to the screen.