wkentaro / gdown

Google Drive Public File Downloader when Curl/Wget Fails
MIT License

Docs files interrupt downloads #341

Open OmarAHex opened 6 months ago

OmarAHex commented 6 months ago

Provide environment information

Python 3.11.5

What OS are you using?

Windows 11

Describe the Bug

When downloading a folder, if a Google Doc is present in the folder, a FileURLRetrievalError is raised and the download stops in its tracks. But this file is actually downloadable via https://docs.google.com/document/d/{id}/export, which is a direct link (or redirects to one anyway). I'm not sure whether this also happens with presentations (https://docs.google.com/presentation/d/{id}/export) and spreadsheets (https://docs.google.com/spreadsheets/d/{id}/export), but I assume it would.
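As a rough illustration of the export link format described above, here is a minimal sketch that builds such a URL from a file id. The names `build_export_url`, `EXPORT_URL_TEMPLATE`, and `EDITOR_KINDS` are hypothetical and not part of gdown; the `spreadsheets` path segment is assumed from the usual Sheets URL form.

```python
# Hypothetical helper, not gdown's API: builds the "/export" link
# described in the report for a Docs, Sheets, or Slides file.
EXPORT_URL_TEMPLATE = "https://docs.google.com/{kind}/d/{file_id}/export"

# Path segments assumed for the three editor types.
EDITOR_KINDS = ("document", "spreadsheets", "presentation")


def build_export_url(kind, file_id):
    """Return the export URL for a file of the given editor kind."""
    if kind not in EDITOR_KINDS:
        raise ValueError(f"unsupported editor kind: {kind!r}")
    return EXPORT_URL_TEMPLATE.format(kind=kind, file_id=file_id)


if __name__ == "__main__":
    # "FILE_ID" is a placeholder, not a real document id.
    print(build_export_url("document", "FILE_ID"))
```

Whether the export link needs extra parameters for a particular output format is not covered here; the sketch only reproduces the URL shape from the report.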

Expected Behavior

No response

To Reproduce

No response

OmarAHex commented 6 months ago

Upon investigation, this bug report was not quite accurate. The core of the error is that gdown fails to download Docs files if the request originates in a non-English region. As I am in the Middle East, my Google Docs documents don't have '- Google Docs' in their title, but rather an Arabic translation of it. I've fixed this in my fork by simply checking for '/document/', '/spreadsheets/', and '/presentation/' in the URL redirected from drive.google.com/open?id instead of checking the title, and it seems to be working fine.
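The URL-based check described above could look roughly like the following. This is a minimal sketch of the idea, not the code from the fork: `editor_kind_from_url` and `EDITOR_PATH_SEGMENTS` are hypothetical names, and matching only the first path segment on the `docs.google.com` host is an assumption about what the redirected URL looks like.

```python
from urllib.parse import urlsplit

# Path segments named in the comment above; the tuple name is hypothetical.
EDITOR_PATH_SEGMENTS = ("document", "spreadsheets", "presentation")


def editor_kind_from_url(url):
    """Return the editor kind found in a redirected URL, or None.

    Works on the URL path, so it does not depend on the localized
    page title (for example a translated "- Google Docs" suffix).
    """
    parts = urlsplit(url)
    # Assumption: editor files redirect to the docs.google.com host.
    if parts.hostname != "docs.google.com":
        return None
    segments = [s for s in parts.path.split("/") if s]
    # Assumption: the editor kind is the first path segment.
    if segments and segments[0] in EDITOR_PATH_SEGMENTS:
        return segments[0]
    return None


if __name__ == "__main__":
    # "FILE_ID" is a placeholder, not a real document id.
    print(editor_kind_from_url("https://docs.google.com/document/d/FILE_ID/edit"))
```

Because the path segment is the same in every region, a check like this avoids the locale dependence of the title-based check.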

wkentaro commented 4 months ago

@OmarAHex Can you give me an example to reproduce?

OmarAHex commented 4 months ago

https://drive.google.com/drive/folders/12cQ4ltgbkBhltqylzSg7g0n7lcTV8zew This is the same folder I gave in the other issue (about filename sanitization), meaning it will crash due to the asterisk issue first (on Windows at least). This error cannot be reproduced in English-speaking countries; you must use a VPN to see it, because the "Google Docs" HTML title is translated in non-English regions.