hueristiq / xurlfind3r

A command-line interface (CLI) based passive URL discovery utility. It is designed to efficiently identify known URLs of given domains by tapping into a multitude of curated online passive sources.
https://github.com/hueristiq/xurlfind3r
MIT License
535 stars 63 forks

[BUG] #13

Closed: dirtybull closed this issue 1 year ago

dirtybull commented 2 years ago

Description: Hi, thanks for the amazing tool. While experimenting with 'commoncrawl' as the source, I found that it sometimes prints URLs and sometimes doesn't. I modified the code to print the errors alongside the results and found that commoncrawl returned a large number of 503 responses and timeouts.

Steps To Reproduce: ./sigurlfind3r -d tesla.com --include-subs -uS commoncrawl

Screenshots: Screenshot_sigurlfind3r_1, Screenshot_sigurlfind3r_2

Additional context: I tested on my home network as well as on a VPS located in Los Angeles. After I changed the code to avoid parallel processing and increased the timeout from 10s to 60s, it worked as expected. I haven't found any officially documented rate limit for commoncrawl, but reliability is the top concern in my opinion, especially for automation. Just for your reference, and thanks again for your work :)
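For readers who want to picture the workaround described above, here is a minimal Python sketch of the same idea: issue the requests one at a time, use a 60-second timeout, and collect errors instead of dropping them. It is not the tool's actual code and not the patch from this report; the names `http_get` and `fetch_sequentially` are hypothetical and exist only for illustration.

```python
import urllib.request
from typing import Callable, Iterable, List, Tuple


def http_get(url: str, timeout: float) -> bytes:
    """Fetch one URL. Raises on HTTP errors (such as 503) and on timeouts."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_sequentially(
    urls: Iterable[str],
    fetch: Callable[[str, float], bytes] = http_get,  # hypothetical hook, handy for testing
    timeout: float = 60.0,  # the report raised the timeout from 10s to 60s
) -> Tuple[List[Tuple[str, bytes]], List[Tuple[str, str]]]:
    """Fetch URLs one at a time and record failures instead of hiding them."""
    results: List[Tuple[str, bytes]] = []
    errors: List[Tuple[str, str]] = []
    for url in urls:  # plain loop: no parallel requests against the source
        try:
            results.append((url, fetch(url, timeout)))
        except OSError as exc:
            # urllib's HTTPError/URLError and socket timeouts all derive from OSError.
            errors.append((url, str(exc)))
    return results, errors
```

Returning the error list next to the results mirrors the reporter's debugging step of printing errors alongside the output, so that 503 responses and timeouts stay visible to whatever automation calls the function.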

enenumxela commented 2 years ago

Thanks @dirtybull, I will look into it.