opsdisk / pagodo

pagodo (Passive Google Dork) - Automate Google Hacking Database scraping and searching
GNU General Public License v3.0

ghdb_scraper.py no longer retrieves dork #21

Astroida closed this issue 5 years ago

Astroida commented 5 years ago

The GHDB scraper no longer works - presumably this is because the exploit-db website has been updated.

Here's the output I am getting:

[] Initiation timestamp: 20181205_104042
[] Spawing thread #0
[] Spawing thread #1
[] Spawing thread #2
[+] Retrieving dork 6: Penetration Testing with Kali Linux (PWK)
[+] Retrieving dork 7: Penetration Testing with Kali Linux (PWK)
[+] Retrieving dork 9: Penetration Testing with Kali Linux (PWK)
[+] Retrieving dork 10: Penetration Testing with Kali Linux (PWK)
[+] Retrieving dork 5: Penetration Testing with Kali Linux (PWK)
[+] Retrieving dork 8: Penetration Testing with Kali Linux (PWK)
[+] Retrieving dork 12: Penetration Testing with Kali Linux (PWK)
[+] Retrieving dork 13: Penetration Testing with Kali Linux (PWK)
[+] Retrieving dork 15: Penetration Testing with Kali Linux (PWK)

opsdisk commented 5 years ago

Hi @Astroida - thank you for alerting me to that! The site definitely looks different. I'll take a look at updating the code. For the time being, you can change the URL in ghdb_scraper.py (https://github.com/opsdisk/pagodo/blob/master/ghdb_scraper.py#L34) to point to https://old.exploit-db.com, which is the old site that still works with ghdb_scraper.py.

Astroida commented 5 years ago

Hi @opsdisk, thanks for the temp fix! Wasn't aware that the old link still exists.

opsdisk commented 5 years ago

@Astroida Try taking this branch for a spin: https://github.com/opsdisk/pagodo/tree/issue-21

My initial testing shows they start blocking after 500-1000 requests, even with 1 thread, so I may have to add some logic to back off / randomize the request rate like I do in https://github.com/opsdisk/metagoofil/blob/master/metagoofil.py
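For anyone wondering what "back off / randomize the request rate" could look like, here is a minimal sketch. It is not the code from metagoofil.py or the issue-21 branch; the function name `fetch_with_backoff` and all of its parameters are hypothetical. It assumes the caller supplies a `fetch` callable that raises an exception when a request is blocked or fails, and it retries with an exponentially growing, randomized delay.

```python
import random
import time


def fetch_with_backoff(fetch, url, max_retries=5, base_delay=1.0,
                       max_delay=60.0, sleep=time.sleep, rng=random.uniform):
    """Call fetch(url), retrying with randomized exponential backoff.

    `fetch` is any callable that returns a response, or raises an
    exception when the request is blocked or fails (an assumption made
    for this sketch). `sleep` and `rng` are injectable so the delay
    logic can be exercised without actually waiting.
    """
    for attempt in range(max_retries + 1):
        try:
            return fetch(url)
        except Exception:
            # Out of retries: let the caller see the last error.
            if attempt == max_retries:
                raise
            # The upper bound doubles on every failure, capped at max_delay.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            # Sleep for a random duration in [0, ceiling] so the
            # request rate does not follow a fixed pattern.
            sleep(rng(0, ceiling))
```

In a scraper, `fetch` might wrap an HTTP GET that raises on a non-200 status. Picking the delay at random below a growing ceiling keeps retries from landing at predictable intervals; whether that is enough to avoid the blocking described above would need testing against the site.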

opsdisk commented 5 years ago

Never mind testing this. I figured out how to pull all Google dorks with 1 HTTP GET request. It may take a day or two to push the code.

opsdisk commented 5 years ago

Pull the latest from master branch. Pushed some fresh updates.