opsdisk / metagoofil

Search Google and download specific file types

Add User-Agent option? #2

Closed theabraxas closed 7 years ago

theabraxas commented 8 years ago

Might it be possible to add a user-agent impersonation option for accessing sites? I've run across things like this a few times: https://stackoverflow.com/questions/25936072/python-urllib2-httperror-http-error-503-service-unavailable-on-valid-website. Having a way to set a user agent with a flag would be phenomenal.

opsdisk commented 8 years ago

Been meaning to do that...give me a couple days.

opsdisk commented 8 years ago

Just added a -u switch to randomize the user-agent from the user_agents.txt file...otherwise the default is "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)". Turns out I was making two requests per file (one to check the size, and another to download it), and one of them was going out with the Python urllib user-agent. I went back and forth between randomizing the user-agent, which I think is better for evading defensive network devices, and allowing the user to specify it as a command line switch. I settled on this for now, but won't close the ticket, because I want to figure out the argparse logic to satisfy your request:

no -u = Use default Google bot user-agent
-u = Randomize user-agent
-u "my useragent 2.0" = User-specified user-agent
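One way the three-way -u behavior above could be expressed with argparse is an optional-argument switch (nargs="?") with separate const and default values. This is only a sketch, not necessarily how metagoofil ended up doing it; the names build_parser, RANDOM_USER_AGENT, and the user_agent destination are made up for illustration.

```python
import argparse

# Default taken from the comment above.
DEFAULT_USER_AGENT = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)

# Hypothetical sentinel meaning "pick a random user-agent later".
# An object() is used so it can never collide with a user-supplied string.
RANDOM_USER_AGENT = object()


def build_parser():
    """Build a parser where -u has three states (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="-u switch sketch")
    parser.add_argument(
        "-u",
        dest="user_agent",
        nargs="?",                   # the value after -u is optional
        const=RANDOM_USER_AGENT,     # used when -u is given with no value
        default=DEFAULT_USER_AGENT,  # used when -u is absent
        help="no value = random user-agent, or pass a custom string",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    if args.user_agent is RANDOM_USER_AGENT:
        print("randomize the user-agent per request")
    else:
        print("user-agent:", args.user_agent)
```

One caveat with nargs="?": if the script also took positional arguments, a bare -u placed directly before one would swallow it as the user-agent string.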

theabraxas commented 8 years ago

That's awesome! Thanks man!

opsdisk commented 8 years ago

Just updated the code and documentation...let me know if that works!

Added the -u user agent switch to customize the User-Agent used to retrieve files. If no -u is provided, then the User-Agent for every file download is 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'. If only -u is provided, a random User-Agent from user_agents.txt will be picked for each file request. Lastly, a custom User-Agent can be added by providing a string after the -u...for example -u "My custom user agent 2.0".
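For anyone wanting to see the idea in code, here is a rough sketch of picking the User-Agent once per file and reusing it for both the size check and the download, so neither request falls back to the urllib default. It assumes Python 3's urllib.request and a user_agents.txt with one string per line; load_user_agents, resolve_user_agent, build_requests, and the RANDOM_USER_AGENT sentinel are hypothetical names, not metagoofil's actual functions.

```python
import random
import urllib.request

DEFAULT_USER_AGENT = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)
RANDOM_USER_AGENT = object()  # hypothetical marker for "bare -u"


def load_user_agents(path="user_agents.txt"):
    """Read one user-agent per line, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def resolve_user_agent(option, pool):
    """Turn the parsed -u value into a concrete User-Agent string."""
    if option is RANDOM_USER_AGENT:
        # Fall back to the default if the pool is empty.
        return random.choice(pool) if pool else DEFAULT_USER_AGENT
    return option  # either the default or a user-supplied string


def build_requests(url, option, pool):
    """Build the size-check and download requests with the same User-Agent."""
    headers = {"User-Agent": resolve_user_agent(option, pool)}
    size_check = urllib.request.Request(url, headers=headers, method="HEAD")
    download = urllib.request.Request(url, headers=headers, method="GET")
    return size_check, download
```

Each returned request can then be passed to urllib.request.urlopen(). Calling build_requests once per file gives a fresh random pick for every file when only -u was supplied.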