wkentaro / gdown

Google Drive Public File Downloader when Curl/Wget Fails
MIT License

Download a folder with more than 50 files #127

Closed GeorgeBatch closed 2 years ago

GeorgeBatch commented 2 years ago

I tried downloading a folder from Google Drive with more than 50 files. Is there a way to use gdown to download such a folder?

GeorgeBatch commented 2 years ago

I thought that if I could get the links/ids of all the small folders (<50 files) and individual files, I would be able to iterate through them and download the contents of the full folder. I also thought that recording the download status would be a good idea in case the process is interrupted, so that you don't need to start from scratch.

Here is how you can get the links to files/folders: StackExchange Answer.

I recorded my attempt at using it in the TCGA-lung-download repository. It did not fully work: after some time I started running into the problem described in issue #43 of the gdown library.

Do you have any idea how I can catch the Access Denied ... warnings as Exceptions/Errors, so that I can at least record that the folder/file was not successfully downloaded? That way, one would be able to resume the process after Google grants access again in 24 hours.

wkentaro commented 2 years ago

gdown.download returns None when it fails, whereas it returns the file path if it succeeds. Does this help? https://github.com/wkentaro/gdown/blob/12217e3a9e9b5651119a1bf0e19ff9a94f8779e5/gdown/download.py#L183
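
For example (a minimal sketch; the file id and output name below are placeholders, and I am assuming a gdown version that accepts the id= keyword, otherwise pass the full URL):

```python
import gdown

# Placeholder file id and output name, for illustration only.
file_id = "YOUR_FILE_ID"
output = "example.bin"

result = gdown.download(id=file_id, output=output, quiet=False)
if result is None:
    # Download failed (e.g. access denied / quota exceeded) -- record it and retry later.
    print(f"FAILED: {file_id}")
else:
    print(f"OK: saved to {result}")
```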

GeorgeBatch commented 2 years ago

Thank you! I will try to use this somehow. It does not solve the problem straight away, since I was using gdown.download_folder() to download the sub-folders one by one from within this folder on Google Drive. I guess I could use the return value of the gdown.download_folder() function, but I am not sure whether the file name is still added to it when access is denied by Google.

https://github.com/wkentaro/gdown/blob/12217e3a9e9b5651119a1bf0e19ff9a94f8779e5/gdown/download_folder.py#L359
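
Roughly what I have in mind is something like this (just a sketch, assuming download_folder() returns None or an empty list on failure, that it accepts the id= keyword, and that the folder ids have already been collected, e.g. with the StackExchange script above; folder_ids.txt and download_status.json are made-up bookkeeping files):

```python
import json
import os

import gdown

STATUS_FILE = "download_status.json"  # made-up bookkeeping file

# Load previously recorded statuses so an interrupted run can resume.
status = {}
if os.path.exists(STATUS_FILE):
    with open(STATUS_FILE) as f:
        status = json.load(f)

# folder_ids.txt: one Google Drive folder id per line, collected beforehand.
with open("folder_ids.txt") as f:
    folder_ids = [line.strip() for line in f if line.strip()]

for folder_id in folder_ids:
    if status.get(folder_id) == "done":
        continue  # already downloaded in an earlier run
    files = gdown.download_folder(id=folder_id, quiet=False)
    status[folder_id] = "done" if files else "failed"
    # Write the status file after every folder so progress survives interruptions.
    with open(STATUS_FILE, "w") as f:
        json.dump(status, f, indent=2)
```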

By the way, why do you limit the number of downloadable files within one folder to 50? https://github.com/wkentaro/gdown/blob/12217e3a9e9b5651119a1bf0e19ff9a94f8779e5/gdown/download_folder.py#L27

wkentaro commented 2 years ago

This is why: https://github.com/wkentaro/gdown/pull/90#issuecomment-787446632

GeorgeBatch commented 2 years ago


Makes sense! Thank you!

tgandor commented 2 years ago

Hm, this would be interesting. rclone can somehow do it, but I needed to enable API access for it (there were some hints about doing it anonymously; there was also something about that being slower than getting the metadata from the API).

An alternative would be to script gdown for downloading multiple files; the URLs for a large folder could be gathered using something like this: https://webapps.stackexchange.com/questions/88769/get-share-link-of-multiple-files-in-google-drive-to-put-in-spreadsheet (the answer about using JS to store the URL list in a Google spreadsheet).
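
Roughly what I mean (an untested sketch: export the URL column from that spreadsheet to a plain text file, say urls.txt, then loop over it; I am assuming a gdown version with the fuzzy=True option for parsing share links, otherwise the links need to be converted to uc?id= URLs first):

```python
import gdown

# urls.txt: one share URL per line, exported from the spreadsheet trick above.
with open("urls.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

for url in urls:
    # fuzzy=True lets gdown extract the file id from a regular share link.
    result = gdown.download(url=url, fuzzy=True, quiet=False)
    if result is None:
        print(f"FAILED: {url}")
```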

GeorgeBatch commented 2 years ago

I have not tried rclone, but it looks like it should work. The solution of writing a script did not work: it looks like Google just starts blocking access to files from the same folder. See issue #43 in the gdown library.