opsdisk / metagoofil

Search Google and download specific file types

Google similar entries #1

Closed pathetiq closed 8 years ago

pathetiq commented 8 years ago

Hi,

When I run a search, Google sometimes does not show all the files. Instead it shows one or a few, followed by this message:

In order to show you the most relevant results, we have omitted some entries very similar to the 1 already displayed.
If you like, you can repeat the search with the omitted results included.

The program only sees the first item Google returns and stops there. This is a problem because I have examples where 4 to 5 pages of results contain PDFs but only one is shown.

Let me know if you know a way to fix it, or if you need help with the code or testing.

Thanks

opsdisk commented 8 years ago

Hi pathetiq,

Thanks for the heads-up. Do you have an example site and file format where this happens? I can take a look, or if you have a fix, feel free to submit a pull request.

pathetiq commented 8 years ago

Thanks for the fast response,

If you try domain hackfest.ca with filetype pdf, it returns 1 file when there are many more. Let me know if you can reproduce it.

opsdisk commented 8 years ago

Definitely able to reproduce. Let me look into it a bit more.

opsdisk commented 8 years ago

pathetiq,

Added extra_params={'filter': '0'} by default to the google.search function so that it returns unfiltered results. It looks like it now returns all 34 PDFs for your specific case. Let me know if you are still having issues. Thanks for the suggestion!
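For anyone curious what that parameter does at the URL level, here is a minimal sketch using only the standard library. It is not the metagoofil code: the build_search_url helper, its arguments, and the site:/filetype: query shape are assumptions made for illustration. The only part taken from this thread is filter=0, which asks Google not to collapse the "very similar" entries.

```python
from urllib.parse import urlencode


def build_search_url(domain, filetype, start=0, unfiltered=True):
    """Build a Google search URL for files of one type on one domain.

    Hypothetical helper for illustration only; metagoofil itself goes
    through google.search(..., extra_params={'filter': '0'}).
    """
    params = {
        # Assumed query shape: restrict results to a site and a file type.
        "q": f"site:{domain} filetype:{filetype}",
        # Offset of the first result, used when paging through results.
        "start": start,
    }
    if unfiltered:
        # filter=0 requests the results Google would otherwise omit as
        # "very similar to the ones already displayed".
        params["filter"] = "0"
    return "https://www.google.com/search?" + urlencode(params)


# Example from this issue: PDFs on hackfest.ca, with and without filtering.
print(build_search_url("hackfest.ca", "pdf"))
print(build_search_url("hackfest.ca", "pdf", unfiltered=False))
```

The two printed URLs differ only in the trailing filter=0 parameter, which is the same thing the "repeat the search with the omitted results included" link adds.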

Also updated the -n switch default to 100 and improved the url request code.

pathetiq commented 8 years ago

Thanks a lot, it is working perfectly on my side. I will look into adding improvements or sending suggestions.