opsdisk / pagodo

pagodo (Passive Google Dork) - Automate Google Hacking Database scraping and searching

Unicode error (ghdb-scraper) #37

Closed: bimarifin closed this issue 4 years ago

bimarifin commented 4 years ago

When fetching the GHDB, the script fails with an error, because the data saved to disk needs to be encoded as UTF-8.

[*] Initiation timestamp: 20200613_050537
Traceback (most recent call last):
  File ".\ghdb_scraper.py", line 95, in <module>
    retrieve_google_dorks(**vars(args))
  File ".\ghdb_scraper.py", line 55, in retrieve_google_dorks
    fh.write(f"{extracted_dork}\n")
  File "C:\Users\murray\AppData\Local\Programs\Python\Python37\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u0131' in position 30: character maps to <undefined>

To fix this, add encoding='utf-8' to the open() call on line 49:

with open(google_dork_file, "w", encoding='utf-8') as fh:
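
A minimal sketch of why the explicit encoding matters, assuming a made-up sample dork and a temporary output path (neither comes from ghdb_scraper.py, and write_dorks is a hypothetical helper). When open() is called without an encoding, Python falls back to the platform's preferred encoding, which is cp1252 in the traceback above and usually UTF-8 on Linux and macOS. That difference is a likely reason the error does not reproduce on every machine.

```python
import os
import tempfile

# Hypothetical sample dork containing U+0131 (dotless i),
# the character named in the traceback above.
sample_dork = "inurl:admin intext:\u0131"


def write_dorks(path, dorks, encoding):
    """Write one dork per line using an explicit text encoding."""
    with open(path, "w", encoding=encoding) as fh:
        for dork in dorks:
            fh.write(f"{dork}\n")


path = os.path.join(tempfile.mkdtemp(), "google_dorks.txt")

# cp1252 has no mapping for U+0131, so this reproduces the reported error.
try:
    write_dorks(path, [sample_dork], encoding="cp1252")
    cp1252_failed = False
except UnicodeEncodeError:
    cp1252_failed = True

# UTF-8 can encode every Unicode character, so the same write succeeds.
write_dorks(path, [sample_dork], encoding="utf-8")
with open(path, encoding="utf-8") as fh:
    round_tripped = fh.read()
```

Passing the encoding explicitly makes the output file independent of the locale of the machine running the scraper.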

opsdisk commented 4 years ago

Thanks for reporting this, @bimarifin. What exact version of Python are you using? It looks like 3.7 from your output. I just tried with Python 3.6.7 and 3.7.5 and didn't have any issues, but I'll take a look at adding some defensive measures.

bimarifin commented 4 years ago

[image]

By the way, I have also been making changes to your code. I think people like me don't want to get all the dorks from the GHDB, just a specific category, so I changed it to look like this:

[image]

But I don't know how to commit changes to your repo. Sorry, I'm a newbie with GitHub :(

bimarifin commented 4 years ago

But you can use this:

_categories = {'1': 'Footholds', '2': 'File_Containing_Usernames', '3': 'Sensitives_Directories', '4': 'Web_Server_Detection', '5': 'Vulnerable_Files', '6': 'Vulnerable_Servers', '7': 'Error_Messages', '8': 'File_Containing_Juicy_Info', '9': 'File_Containing_Passwords', '10': 'Sensitive_Online_Shopping_Info', '11': 'Network_or_Vulnerability_Data', '12': 'Pages_Containing_Login_Portals', '13': 'Various_Online_devices', '14': 'Advisories_and_Vulnerabilities'}

and just add a for loop to retrieve_google_dorks:

for key, value in _categories.items():
    url = "https://www.exploit-db.com/google-hacking-database?category={}".format(key)
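
A rough sketch of how the loop above could be narrowed to only the categories a user asks for. The helper name build_category_urls and its selected_ids parameter are hypothetical and are not part of pagodo; the dictionary is shortened to three of the entries posted above, and the URL pattern is taken as posted, without checking how the site responds to it.

```python
# Three of the category IDs and names posted above (shortened for the example).
_categories = {
    "1": "Footholds",
    "2": "File_Containing_Usernames",
    "9": "File_Containing_Passwords",
}

# URL pattern as posted in this thread.
BASE_URL = "https://www.exploit-db.com/google-hacking-database?category={}"


def build_category_urls(categories, selected_ids=None):
    """Return {category name: URL}, limited to selected_ids when given.

    Hypothetical helper for illustration only.
    """
    ids = list(categories) if selected_ids is None else selected_ids
    urls = {}
    for key in ids:
        if key not in categories:
            raise KeyError(f"Unknown category id: {key}")
        urls[categories[key]] = BASE_URL.format(key)
    return urls


# All categories, or only the one the user is interested in.
all_urls = build_category_urls(_categories)
password_urls = build_category_urls(_categories, selected_ids=["9"])
```

Keeping the ID-to-name mapping in one dictionary means the same table can later drive both the URLs and the per-category output file names.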

opsdisk commented 4 years ago

Great idea! That capability would be awesome. Check out this branch and let me know what you think:

git clone https://github.com/opsdisk/pagodo.git
cd pagodo
git branch -a
git checkout issue-37-ghdb_scraper-unicode-error

Pull request is here: https://github.com/opsdisk/pagodo/pull/38

opsdisk commented 4 years ago

Merged into master