wkentaro / gdown

Google Drive Public File Downloader when Curl/Wget Fails
MIT License
4.16k stars 345 forks

Permission denied ... Maybe you need to change permission over 'Anyone with the link'? #43

Open alexklwong opened 4 years ago

alexklwong commented 4 years ago

I want to download a 16GB zip file on Google Drive using:

gdown https://drive.google.com/uc?id=1InfIal4y7OBMGNUDeldEmDxtD0MrewY8

I have already set the file's permissions to "Anyone with the link".

But I get this error:

Permission denied: https://drive.google.com/uc?id=1InfIal4y7OBMGNUDeldEmDxtD0MrewY8 Maybe you need to change permission over 'Anyone with the link'?

mrlzla commented 4 years ago

The same error

zenithfang commented 4 years ago

same error

wkentaro commented 4 years ago

Maybe the same issue as https://github.com/wkentaro/gdown/issues/26.

I found this happens with large files that receive many accesses (e.g., public datasets); see #42. However, I have no idea how to fix this.

If they are the same issue, there's no solution at the moment from gdown side.

Rmao99 commented 4 years ago

I have this issue as well with a 15 GB dataset and a 4.3 GB dataset. However, when I copy the 4.3 GB dataset to my own Drive and use the link for the copy, it works. I can't make a copy of the 15 GB one on a free Drive account, though.

For reference, the datasets I'm downloading are: https://drive.google.com/file/d/1NrqOLbMa_RwHbG3KIXJFWLnlND2kiIpj/view https://drive.google.com/file/d/15w0q3Nuye2ieu_aUNdTS_FNvoVzM4RMF/view

I was actually able to download both datasets once from the original links, but after needing to redo the downloads, they no longer work, even after uninstalling and reinstalling gdown.

wkentaro commented 4 years ago

@Rmao99 Thanks for the information. That behavior makes sense if the issue is on Google Drive's side. It seems they restrict command-line access to files with heavy traffic (e.g., public datasets).

wkentaro commented 4 years ago

I've added a new feature to keep cookies: https://github.com/wkentaro/gdown/pull/51. I hope this fixes the issue.

You can try the pre-release version with:

pip install -U --no-cache-dir gdown --pre

By default, cookies are enabled and saved to ~/.cache/gdown/cookies.json. You can disable them with gdown --no-cookies XXX

Jackie-LJQ commented 4 years ago

I have the same issue. I tried the above version, but it did not fix it.

whulixiya commented 4 years ago

I hit the same error when trying to download many big files from my Google Drive. Has anyone fixed it?

carlmalamud commented 4 years ago

I find if I rm ~/.cache/gdown/cookies.json and then restart my script, things start downloading again. Is your cookie expiring perhaps? I'm good for 1,000 or so files before I get the error.

Thanks for gdown. Very helpful!
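That remove-and-restart workaround can be automated; here is a minimal sketch (the helper names are hypothetical, and it assumes the gdown CLI is on PATH and uses the default cookie path): clear the cookie cache whenever a download fails, then retry.

```python
import subprocess
from pathlib import Path

COOKIES = Path.home() / ".cache" / "gdown" / "cookies.json"

def gdown_cli(file_id, output):
    """Invoke the gdown CLI; returns True on success."""
    return subprocess.run(["gdown", "--id", file_id, "-O", output]).returncode == 0

def download_with_retry(file_id, output, retries=2, download=gdown_cli):
    """Try a download; on failure, delete the cookie cache and retry."""
    for _ in range(retries + 1):
        if download(file_id, output):
            return True
        COOKIES.unlink(missing_ok=True)  # a stale or full cookie jar may be the culprit
    return False
```

Injecting the download callable keeps the retry logic testable without actually hitting Drive.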

wkentaro commented 4 years ago

@carlmalamud Thanks for info. Do you have any sharable links to reproduce this?

whulixiya commented 4 years ago

Unfortunately, there is nothing in ~/.cache/gdown/


carlmalamud commented 4 years ago

@wkentaro What is happening is that your cookie file keeps growing until it is full. I believe every time there is a warning, a new line gets added. I've seen my cookie grow from 11 warnings to 36 warnings in about two hours. After about 1,000 files, there is presumably no room for more warnings, so when you present a cookie that doesn't mention a particular download, Google rejects it, and your code concludes it is a permission issue.

Perhaps if your cookie fails, you could remove it, try again, and only if it fails a second time assume it is a permission issue.

Here is a snippet of what my cookie looks like:

[ [ "AUTH_22nmp4ombsvnvmjrl3nb3u3hsrucc95e", "14651752893030988712Z|1591294950000|hospjjjbfh4rapv59tff247rc9fhcbr3" ], [ "NID", "204=x4h9LoiFMtrKRVT3aYaAXNxcePhv5SjsTw9qrioaMhrWD61BLoO_hbjosmcIBEYKSEf-pqz0U5D9SyUKwCiHJJ1Ys_xGBVg8oLqizpJaVYdWUoSIgSZKNL3xpkoGvIQs9lz8hTQD9EgvcwunJ--j5FtM7OPAn_W8mnVaHEVweSs" ], [ "download_warning_13058876669334088843_0B7JhzNLs-FQEMExkMlBpOUJoQ28", "IcVe" ], [ "download_warning_13058876669334088843_0B7JhzNLs-FQEMmVDYVJUckNQZlE", "Ricd" ], [ "download_warning_13058876669334088843_0B7JhzNLs-FQENGlEZDNabFRwVzg", "579P" ], [ "download_warning_13058876669334088843_0B7JhzNLs-FQENk9JWmNMNm81R0k", "o4Bf" ], [ "download_warning_13058876669334088843_0B7JhzNLs-FQENlVqTnJsc2R5dzA", "osl-" ], [ "download_warning_13058876669334088843_0B7JhzNLs-FQEOGpkZzFad25SdFE", "8XbE" ],
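Given that format (a JSON list of [name, value] pairs), the remove-and-retry idea could be narrowed to pruning only the accumulated download_warning entries rather than deleting the whole file. A minimal sketch (the function name is hypothetical):

```python
import json
from pathlib import Path

def prune_download_warnings(cookies_path):
    """Drop accumulated download_warning entries from gdown's cookie jar.

    The file is assumed to be a JSON list of [name, value] pairs, as in the
    snippet above. Returns the number of entries removed.
    """
    path = Path(cookies_path)
    cookies = json.loads(path.read_text())
    kept = [pair for pair in cookies if not pair[0].startswith("download_warning")]
    path.write_text(json.dumps(kept))
    return len(cookies) - len(kept)
```

This keeps the session cookies (NID, AUTH_...) intact, so an existing login session would survive the pruning.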

carlmalamud commented 4 years ago

It looks like the magic number is 110: once there are 110 download_warning entries, the cookie is full.

wkentaro commented 4 years ago

@carlmalamud thanks!

wkentaro commented 4 years ago

Sent a PR here: https://github.com/wkentaro/gdown/pull/55

wkentaro commented 4 years ago

Released as v3.11.1

wkentaro commented 4 years ago

@carlmalamud can you try if the new version eases the issue?

grudloff commented 4 years ago

Same problem here. I upgraded to the latest release and get the following message:

Access denied with the following error:

    Too many users have viewed or downloaded this file recently. Please
    try accessing the file again later. If the file you are trying to
    access is particularly large or is shared with many people, it may
    take up to 24 hours to be able to view or download the file. If you
    still can't access a file after 24 hours, contact your domain
    administrator. 

You may still be able to access the file from the browser:
%%%URLHERE
carlmalamud commented 4 years ago

I was chugging along nicely for a while, but now I am also getting the same error as @grudloff. I was in fact able to pull the file up in a browser and start the download, as indicated above. I looked in the gdown cookie and didn't see any download_warning lines, so evidently we never got a cookie back from them with the additional authorization.

I have now installed 3.11.1, but I was running 3.11.0 before; I simply inserted a rm ~/.cache/gdown/cookies.json every fifty lines. Sorry, it was working, so I didn't want to bother it. :) The upgrade doesn't seem to help with the above error.

carlmalamud commented 4 years ago

I am imagining that the error might have something to do with my being an authenticated user in the browser but not via gdown. I'm not sure if presenting an OAuth token might make a difference here?

wkentaro commented 4 years ago

I'm not sure if presenting an OAuth token might make a difference here?

I don't think so; gdown uses requests internally, without any browser session.

kshitijagrwl commented 4 years ago

Solved by disabling certificate verification, by changing line 111 in download.py to:

res = sess.get(url, stream=True, verify=False)

However, it does display this warning:

    InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
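For reference, the same change can be tried without editing download.py, by disabling verification on a requests session. This is only a sketch (it assumes a session-level override behaves the same as the in-place edit, since gdown issues its requests through a requests session), and disabling certificate verification is insecure, so it should be a last resort:

```python
import urllib3
import requests

# Silence the InsecureRequestWarning that unverified requests emit.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

session = requests.Session()
session.verify = False  # disable TLS certificate verification (insecure!)
```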

wkentaro commented 4 years ago

Solved by disabling certificate verification

In my case, it didn't work.

dtch1997 commented 4 years ago

I encountered this issue downloading just 3 files from Google Drive, about 2.8 GB in total. The first two worked fine, and the permission error popped up for the third file. After that, all downloads stopped working.

ghost commented 4 years ago

@grudloff , Got same error here, did you solve it?

ganler commented 4 years ago

Is there any other platform that allows us to host files publicly?

gsygsy96 commented 4 years ago

The solution is to change the permission.

lsgrep commented 3 years ago

I had the same issue. I think it is related to Google Drive rate-limiting files based on traffic.

jungerm2 commented 3 years ago

I'm having this issue as well, but I do not think it's a permission problem. In my case, there is a "Download Anyway" button, but it needs to be clicked twice: the first time, the URL changes (a confirm=.... code gets added); the second time, the download starts. This seems to be dealt with to some extent within the code:

https://github.com/wkentaro/gdown/blob/e3bf98dc1b698e10bfb84768fb294256cbb6a39f/gdown/download.py#L44-L49

However:

  1. The function get_url_from_gdrive_confirmation parses HTML line by line with regex. This is problematic for many reasons: if the HTML is malformed, a tag may be split across a newline; an incorrect regex match may win because earlier lines are given first dibs; and, fundamentally, HTML cannot be reliably parsed with regex.

  2. The code above actually returns an empty string. This is because the regex substitution is done on the variable url, which is the empty string (that is what it is initialized to, and the function returns right after the assignment).

Because of these two reasons, the URL returned by get_url_from_gdrive_confirmation when a confirmation is needed is either None (the function returns nothing, case 1) or an empty string (case 2). This then throws the error we've been seeing:

https://github.com/wkentaro/gdown/blob/e3bf98dc1b698e10bfb84768fb294256cbb6a39f/gdown/download.py#L146-L153

My recommendation is to ditch the regex and use a dedicated HTML parser like lxml (either directly or through BeautifulSoup4). This would avoid future issues and be much more robust (although, yes, it's an extra dependency). I wouldn't mind contributing to this, but I have little time at the moment. Hopefully this helps!

For reference, the file I've had this issue with has an id of 1VeULcuxDUMMSg6TkgwBpiEaIjSykL1fO. It is large but I've had success with much larger files so the sheer size shouldn't be an issue.
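The parser recommendation above can be sketched even with the stdlib html.parser, as a dependency-free stand-in for lxml/BeautifulSoup. This is only an illustration: the uc-download-link id is an assumption about Drive's confirmation page, and the real fix would live inside get_url_from_gdrive_confirmation.

```python
from html.parser import HTMLParser

class ConfirmLinkParser(HTMLParser):
    """Find the href of the 'Download anyway' anchor, if present."""

    def __init__(self):
        super().__init__()
        self.url = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Assumption: the confirmation page marks the link with this id.
        if tag == "a" and attrs.get("id") == "uc-download-link":
            self.url = attrs.get("href")

def get_url_from_confirmation(html):
    parser = ConfirmLinkParser()
    # A real parser tolerates newlines inside tags and entity-encoded
    # attributes, unlike the line-by-line regex approach.
    parser.feed(html)
    return parser.url
```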

wkentaro commented 3 years ago

Thanks for the detailed analysis. What I'm not sure about is whether this actually has something to do with the parser. If I access https://drive.google.com/uc?id=1VeULcuxDUMMSg6TkgwBpiEaIjSykL1fO while logged in to Google Drive, it actually shows the download button:

image

However, if I access it without logging in, it shows the page below. So I still think this issue is not solvable without a login session. Any thoughts?

image

jungerm2 commented 3 years ago

I'm not seeing that behavior. When logged out, I still get the download button, as you can see here: image

However, when clicking on it, it redirects to an HTTP 403 error... I just checked, and the permissions are that anyone with the link can view. In fact, I applied these permissions to a folder full of files, and gdown worked for all of them except this one. I've also applied these permissions directly to this file. Maybe this is an error on Google's part? That seems unlikely.

jungerm2 commented 3 years ago

I just confirmed it: Google Drive was glitching out. I didn't change anything, and it now works... I did, however, get the following error before it worked (I tried re-downloading it twice):

Access denied with the following error:

        Too many users have viewed or downloaded this file recently. Please
        try accessing the file again later. If the file you are trying to
        access is particularly large or is shared with many people, it may
        take up to 24 hours to be able to view or download the file. If you
        still can't access a file after 24 hours, contact your domain
        administrator.

No one (besides myself and the fine folks on this issue who tried to download my file) has been viewing/downloading this file. However, other files hosted on the same Drive account have been downloaded at full speed a few times over the last couple of days (~200 GB @ ~75 MB/s). Google probably flagged this as suspicious... Is there a way to throttle the download speed?

wkentaro commented 3 years ago

Is gdown --speed 10MB something you're looking for?

jungerm2 commented 3 years ago

Just following up: the issue mostly resolved itself; it was a problem on Google Drive's side. I was able to download everything at full speed and without any restrictions when logged in through PyDrive. It seems these limits only apply when the file is accessed publicly. Thanks for your help!

hypagedev commented 3 years ago

Hello, I used the following code and had no problem with my file; it is 29 GB.

    import gdown
    import time

    url = 'https://drive.google.com/uc?id=XXXXXXXXXXXXXXX'
    output = 'velo.zip'
    time.sleep(100)
    gdown.download(url, output, quiet=False)

BigBarny commented 3 years ago

Following step 4 on the attached wiki: https://www.marstranslation.com/blog/how-to-skip-google-drive-virus-scan-warning-about-large-files

You can right-click -> Inspect the 'Download anyway' link and get the URL that bypasses the check.

In summary, just substitute your file ID into this URL template: https://drive.google.com/u/0/uc?id={FILE_ID}

zeyuyun1 commented 3 years ago

Thanks for the detailed analysis. What I'm not sure is actually this has something to do with the parser. If I access https://drive.google.com/uc?id=1VeULcuxDUMMSg6TkgwBpiEaIjSykL1fO with logging in the Gdrive, it actually shows the download button:

image

However, if I access without logging in, it shows as below. So I still think this issue is not solvable without a login session. Any thoughts?

image

I think I have the same issue you described. The data I am trying to download only shows the download button when you log in. I am pretty sure the permission is that everyone can see the file, but if you don't log in, you won't see the download button (due to high traffic?).

Did you find a solution for this? Is there a way to make gdown authenticate with a Google account?

This is the data file (preprocessed Wikipedia) that reproduced the problem.

alexfilothodoros commented 3 years ago

Hello.

I have the same problem. I am logged in, and everyone can see the file; I can even see the download button. I have read that there is a maximum daily download limit.

theavicaster commented 3 years ago

I'm having the same problem. It worked for a few downloads. I changed access from public to private, with a specific user given access. It still won't work.

SuharshTyagii commented 3 years ago

same problem.

theavicaster commented 3 years ago

I tried to wget the same file and got this as a response:

Too many users have viewed or downloaded this file recently. Please try accessing the file again later. If the file you are trying to access is particularly large or is shared with many people, it may take up to 24 hours to be able to view or download the file. If you still can't access a file after 24 hours, contact your domain administrator.

If you're trying to download a large file, you might be hitting the same limit.

youssefavx commented 3 years ago

Same error here in Colab, downloading a 7 GB file multiple times (since Colab times out, I had to redo it). I got the same message as @theavicaster when I tried to use this function:

import requests

def download_file_from_google_drive(id, destination):
    def get_confirm_token(response):
        # Drive sets a download_warning cookie when it serves the
        # "can't scan for viruses" confirmation page
        for key, value in response.cookies.items():
            if key.startswith('download_warning'):
                return value
        return None

    def save_response_content(response, destination):
        CHUNK_SIZE = 32768
        with open(destination, "wb") as f:
            for chunk in response.iter_content(CHUNK_SIZE):
                if chunk:  # filter out keep-alive chunks
                    f.write(chunk)

    URL = "https://docs.google.com/uc?export=download"
    session = requests.Session()

    response = session.get(URL, params={'id': id}, stream=True)
    token = get_confirm_token(response)

    if token:
        # re-request with the confirmation token to skip the warning page
        params = {'id': id, 'confirm': token}
        response = session.get(URL, params=params, stream=True)

    save_response_content(response, destination)

I got this response:

Google Drive - Quota exceeded

Too many users have viewed or downloaded this file recently. Please try accessing the file again later. If the file you are trying to access is particularly large or is shared with many people, it may take up to 24 hours to be able to view or download the file. If you still can't access a file after 24 hours, contact your domain administrator.

Choapinus commented 3 years ago

Has anyone got a solution? I get the same error :'(

I just confirmed it, google drive was glitching out. I didn't change anything and it now works... I did however get the following error before it worked (I tried re-downloading it twice):

Access denied with the following error:

        Too many users have viewed or downloaded this file recently. Please
        try accessing the file again later. If the file you are trying to
        access is particularly large or is shared with many people, it may
        take up to 24 hours to be able to view or download the file. If you
        still can't access a file after 24 hours, contact your domain
        administrator.

No one (besides myself and the fine folks on this issue that tried to download my file) has been viewing/downloading this file. However, other files that are hosted on the same drive account have been downloaded at full speed a few times over the last couple days (~200GB @ ~75MB/s). Google probably flagged this as suspicious... Is there a way to throttle the download speed?

Darel13712 commented 2 years ago

Still have this issue with 4.0.2 =(

aminrezaee commented 2 years ago

Same error. Any fix?

ewwnage commented 2 years ago

Same error. Any fix?

WORKAROUND:

I added the desired file to my private GDrive and was then able to create a link that worked (make it editable by everyone)

abradley60 commented 2 years ago

Hi All,

I have found a solution that worked for me. I was trying to download a 112 MB file from my Google Drive, which had the appropriate permissions.

image

Following this link gets me to a page like this

image

If I click on the download button I am taken to a security check

image

The file ID at this security check is the one you want to use with gdown. For example, in Colab, the following command:

!gdown --id 1iQhpdvoTyuvhxS9pvj4IwXZ-WE2V0dTv

This worked for me, hopefully it helps someone else.

Xiaojieqiu commented 2 years ago

This last solution is really my favorite, thank you so much!

gitlabspy commented 2 years ago

Same error. Any fix?

WORKAROUND:

I added the desired file to my private GDrive and was then able to create a link that worked (make it editable by everyone)

No, it does not work for me. I even upgraded my account to copy those big files to my private GDrive...

AleDella commented 2 years ago

I was trying to download a 2 GB zip file from my personal Google Drive and got permission denied. I SOLVED the problem by using this command instead:

    sudo wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=FILEID' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=FILEID" -O FILENAME && rm -rf /tmp/cookies.txt

where you must substitute FILEID with your Drive file ID and FILENAME with the name you want to save the file as.

Source: https://linux.tips/tutorials/download-large-google-drive-files-with-wget-in-terminal