opsdisk / pagodo

pagodo (Passive Google Dork) - Automate Google Hacking Database scraping and searching
GNU General Public License v3.0
2.66k stars, 485 forks

GHDB scraper produces inaccurate output #96

Open marz-hunter opened 5 months ago

marz-hunter commented 5 months ago

I found that the output from the GHDB scraper is not accurate. For example, an entry's title is given as "Google Dork", but when you open the entry, the actual dork is site:".edu" intitle:"index of"|".db". The tool saves "Google Dork" as its output instead of site:".edu" intitle:"index of"|".db".

https://github.com/opsdisk/pagodo/assets/58464282/3518c55f-6156-4638-8861-38edd673bb18

marz-hunter commented 5 months ago

I think crawling a URL like https://www.exploit-db.com/ghdb/8389 would be more accurate, although it would take a little longer.

marz-hunter commented 5 months ago

Crawling the entry page and then taking this part will be more accurate.
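A minimal sketch of the idea in the comments above, assuming the scraper has already produced (id, dork) pairs. The function names, the placeholder set, and the sample data are hypothetical and are not taken from pagodo; only the https://www.exploit-db.com/ghdb/<id> URL pattern comes from this thread. It flags entries whose saved text looks like a generic title, so that only those would need the slower per-entry crawl.

```python
# Hypothetical helper, not part of pagodo: flag scraped entries whose saved
# "dork" is really a placeholder title, and build the entry URL to re-crawl.

# URL pattern as quoted in this thread.
GHDB_ENTRY_URL = "https://www.exploit-db.com/ghdb/{ghdb_id}"

# Assumed set of placeholder titles; extend it as more are found.
PLACEHOLDER_TITLES = {"google dork"}


def is_placeholder(dork: str) -> bool:
    """Return True if the saved text looks like a generic title, not a query."""
    return dork.strip().lower() in PLACEHOLDER_TITLES


def urls_to_recrawl(entries: list[tuple[int, str]]) -> list[str]:
    """Return the entry-page URL for every suspicious (id, dork) pair."""
    return [
        GHDB_ENTRY_URL.format(ghdb_id=ghdb_id)
        for ghdb_id, dork in entries
        if is_placeholder(dork)
    ]


if __name__ == "__main__":
    # Illustrative data only; the id/title pairing is an assumption.
    sample = [
        (8389, "Google Dork"),
        (2, "inurl:admin"),
    ]
    for url in urls_to_recrawl(sample):
        print(url)
```

Re-crawling only the flagged entries keeps the extra requests to a minimum, which addresses the concern that crawling every entry page would take longer.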

opsdisk commented 5 months ago

Hi @marz-hunter - thanks for opening an issue. In the past, I've also noticed the data isn't structured as precisely as it should be by exploit-db.com. I vaguely recall thinking exploit-db.com needs to clean up the data instead of wanting to handle edge cases or invest any more time in the script. Give me a week or two to dig deeper into it though.

In the meantime, you could reach out to them and see if they could clean up the ones you found. Their email can be found here: https://www.exploit-db.com/submit


opsdisk commented 5 months ago

Just found this as well that could be used https://gitlab.com/exploit-database/exploitdb/-/blob/main/ghdb.xml

Mind checking it for the same discrepancies you found?

Edit: You may be able to submit a PR against it for any you find as well. I'm guessing that is what powers https://www.exploit-db.com/google-hacking-database

marz-hunter commented 5 months ago

Yeah, this one looks pretty good: https://gitlab.com/exploit-database/exploitdb/-/blob/main/ghdb.xml. Updates seem to take a while, though.
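For reference, a minimal sketch of pulling dorks out of an XML file shaped like ghdb.xml. The element names (`entry`, `id`, `query`) are an assumption about the file's layout and should be checked against the real file. The inline sample is made up for illustration and reuses the dork from the first comment.

```python
import xml.etree.ElementTree as ET

# Made-up sample; the element names are an assumed layout for ghdb.xml.
SAMPLE_XML = """<ghdb>
  <entry>
    <id>1</id>
    <shortDescription>Google Dork</shortDescription>
    <query>site:".edu" intitle:"index of"|".db"</query>
  </entry>
  <entry>
    <id>2</id>
    <shortDescription>Entry with no query</shortDescription>
  </entry>
</ghdb>
"""


def extract_dorks(xml_text: str) -> dict[int, str]:
    """Map each entry id to its query text, skipping entries without one."""
    root = ET.fromstring(xml_text)
    dorks: dict[int, str] = {}
    for entry in root.iter("entry"):
        ghdb_id = entry.findtext("id")
        query = entry.findtext("query")
        if ghdb_id is None or not query:
            continue  # incomplete entry; nothing usable to save
        dorks[int(ghdb_id)] = query.strip()
    return dorks


if __name__ == "__main__":
    for ghdb_id, dork in extract_dorks(SAMPLE_XML).items():
        print(ghdb_id, dork)
```

Reading the query element directly, rather than a title or description, would avoid saving "Google Dork" in place of the real search string, provided the file really stores the dork in a dedicated field.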

opsdisk commented 4 months ago

I'll keep this issue open for a while to see if they update it. Let me know if you do submit a PR against https://gitlab.com/exploit-database/exploitdb/-/blob/main/ghdb.xml

opsdisk commented 2 months ago

Just wanted to check back on this one, @marz-hunter. Do you have any updates?