The same error
same error
Maybe the same issue as https://github.com/wkentaro/gdown/issues/26.
I found this happens with large files that get many accesses (e.g., a public dataset); see #42. However, I have no idea how to fix it.
If they are the same issue, there's no solution at the moment on gdown's side.
I have this issue as well with a 15 GB dataset and a 4.3 GB dataset. But when I copy the 4.3 GB one to my own Drive and use the link from my copy, it works. I can't make a copy of the 15 GB one on a free Drive account, though.
For reference, the datasets I'm downloading are: https://drive.google.com/file/d/1NrqOLbMa_RwHbG3KIXJFWLnlND2kiIpj/view and https://drive.google.com/file/d/15w0q3Nuye2ieu_aUNdTS_FNvoVzM4RMF/view
I was actually able to download both datasets once from the original links, but after needing to redo the downloads they no longer work, even after uninstalling and reinstalling gdown.
@Rmao99 Thanks for the information. That behavior makes sense if the issue is on Google Drive's side. It seems they restrict command-line access to files with many accesses (e.g., public datasets).
I've added a new feature to keep cookies: https://github.com/wkentaro/gdown/pull/51. I hope this fixes the issue.
You can try the pre-released version:
pip install -U --no-cache-dir gdown --pre
By default, cookies are enabled and saved to ~/.cache/gdown/cookies.json. You can disable them with gdown --no-cookies XXX.
I have the same issue. I tried the above version; it didn't fix it.
I get the same error when I try to download many big files from my Google Drive. Has anyone fixed it?
I find that if I rm ~/.cache/gdown/cookies.json and then restart my script, things start downloading again. Is your cookie expiring, perhaps? I'm good for 1,000 or so files before I get the error.
Thanks for gdown. Very helpful!
@carlmalamud Thanks for the info. Do you have any shareable links to reproduce this?
Unfortunately, there is nothing in ~/.cache/gdown/.
@wkentaro What is happening is that your cookie keeps growing until it is full. I believe every time there is a warning, a new line gets added; I've seen my cookie grow from 11 warnings to 36 warnings in about 2 hours. After about 1,000 files, there is presumably no room for more warnings, so when you present a cookie that doesn't mention a particular download, Google rejects it and your code thinks it is a permission issue.
Perhaps if your cookie fails, you could remove it, try again, and only assume it is a permissions issue if it fails a second time.
Here is a snippet of what my cookie looks like:
[
  ["AUTH_22nmp4ombsvnvmjrl3nb3u3hsrucc95e", "14651752893030988712Z|1591294950000|hospjjjbfh4rapv59tff247rc9fhcbr3"],
  ["NID", "204=x4h9LoiFMtrKRVT3aYaAXNxcePhv5SjsTw9qrioaMhrWD61BLoO_hbjosmcIBEYKSEf-pqz0U5D9SyUKwCiHJJ1Ys_xGBVg8oLqizpJaVYdWUoSIgSZKNL3xpkoGvIQs9lz8hTQD9EgvcwunJ--j5FtM7OPAn_W8mnVaHEVweSs"],
  ["download_warning_13058876669334088843_0B7JhzNLs-FQEMExkMlBpOUJoQ28", "IcVe"],
  ["download_warning_13058876669334088843_0B7JhzNLs-FQEMmVDYVJUckNQZlE", "Ricd"],
  ["download_warning_13058876669334088843_0B7JhzNLs-FQENGlEZDNabFRwVzg", "579P"],
  ["download_warning_13058876669334088843_0B7JhzNLs-FQENk9JWmNMNm81R0k", "o4Bf"],
  ["download_warning_13058876669334088843_0B7JhzNLs-FQENlVqTnJsc2R5dzA", "osl-"],
  ["download_warning_13058876669334088843_0B7JhzNLs-FQEOGpkZzFad25SdFE", "8XbE"],
It looks like the magic number is 110 download_warning entries, and then the cookie is full.
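A minimal sketch of that retry idea, for anyone scripting this (the assumption that gdown.download returns None on failure, and the cookie-path handling, are mine, not part of gdown):

import os
import os.path as osp

import gdown

COOKIES = osp.expanduser("~/.cache/gdown/cookies.json")

def download_with_retry(url, output):
    # First attempt; assumption: this gdown version returns None on failure.
    result = gdown.download(url, output, quiet=False)
    if result is None:
        # The cookie may be full of download_warning entries; drop it and retry.
        if osp.exists(COOKIES):
            os.remove(COOKIES)
        result = gdown.download(url, output, quiet=False)
    return result  # None after a second attempt -> likely a real permission issue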
@carlmalamud thanks!
Sent a PR here: https://github.com/wkentaro/gdown/pull/55
Released as v3.11.1
@carlmalamud Can you check whether the new version eases the issue?
Same problem; I upgraded to the latest release and get the following message:
Access denied with the following error:
Too many users have viewed or downloaded this file recently. Please
try accessing the file again later. If the file you are trying to
access is particularly large or is shared with many people, it may
take up to 24 hours to be able to view or download the file. If you
still can't access a file after 24 hours, contact your domain
administrator.
You may still be able to access the file from the browser:
%%%URLHERE
I was chugging along nicely for a while, but now I am also getting the same error as @grudloff. I was in fact able to pull the file up in a browser and start the download, as indicated above. I looked in the gdown cookie and didn't see any download_warning lines, so evidently we never got the cookie back from them with the additional authorization.
(I have now installed 3.11.1, but was running 3.11.0 before; I simply inserted an rm ~/.cache/gdown/cookies.json every fifty lines. Sorry, it was working, so I didn't want to bother it. :) The upgrade doesn't seem to help with the above error.)
I imagine the error might have something to do with my being an authenticated user in the browser but not via gdown. I'm not sure if presenting an OAuth token might make a difference here?
I'm not sure if presenting an OAuth token might make a difference here?

I don't think so; gdown uses requests internally, without using any browser.
Solved it by disabling certificate verification, changing line 111 in download.py:
res = sess.get(url, stream=True, verify=False)
However, it does display this warning:
InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
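If you go with that workaround, the warning itself can be silenced via urllib3 (note that skipping verification is a real security trade-off; this only hides the message):

import urllib3

# Suppress only the warning urllib3 raises for unverified HTTPS requests.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)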
Solved by disabling certificate verification
In my case, it didn't work.
I encountered this issue downloading just 3 files from Google Drive, about 2.8 GB or so. The first two worked fine, and the permission error popped up for the third file. After that, all downloads stopped working.
@grudloff I got the same error here. Did you solve it?
Is there any other platform that lets us host some files publicly?
The solution is to change the permission.
I had the same issue. I think this is related to Google Drive rate-limiting files based on traffic.
I'm having this issue as well, but I do not think it's a permission problem. In my case, there is a "Download Anyway" button, but it needs to be clicked twice: the first time, the URL changes (a confirm=... code gets added); the second time, the download starts. This seems to be dealt with to some extent within the code.
However:
The function get_url_from_gdrive_confirmation parses the HTML line by line with regexes. This is problematic for a ton of reasons: if the HTML is malformed, there might be a newline in the middle of a tag; a regex match that is not the correct one can win because earlier lines get first dibs; and it is simply not possible to reliably parse HTML with regexes.
The code above actually returns an empty string. This is because the regex substitution is done on the variable url, which will be the empty string (that is what it is initialized to, and the function returns right after it gets assigned).
For these two reasons, the URL that get_url_from_gdrive_confirmation returns when a confirmation is needed can be either None (the function never returns anything, case 1) or an empty string (case 2). This then throws the error we've been seeing:
My recommendation is to ditch the regexes and use a dedicated HTML parser like lxml (either directly or through BeautifulSoup4). This would avoid future issues and be much more robust (although, yes, it's an extra dependency). I wouldn't mind contributing this but have little time at the moment. Hopefully this helps!
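To make that concrete, here is a rough sketch of what a parser-based version could look like. The element id "uc-download-link" is an assumption based on what the confirmation page has looked like, not something gdown guarantees:

from bs4 import BeautifulSoup

def get_confirm_url(html):
    soup = BeautifulSoup(html, "html.parser")
    # Assumption: the "Download anyway" button is an anchor with this id.
    link = soup.find(id="uc-download-link")
    if link is not None and link.get("href"):
        return "https://docs.google.com" + link["href"]
    return None  # no confirmation link found; the page layout may have changed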
For reference, the file I've had this issue with has the id 1VeULcuxDUMMSg6TkgwBpiEaIjSykL1fO. It is large, but I've had success with much larger files, so sheer size shouldn't be the issue.
Thanks for the detailed analysis. What I'm not sure about is whether this actually has something to do with the parser. If I access https://drive.google.com/uc?id=1VeULcuxDUMMSg6TkgwBpiEaIjSykL1fO while logged into Google Drive, it actually shows the download button. However, if I access it without logging in, it shows as below. So I still think this issue is not solvable without a login session. Any thoughts?
I'm not seeing that behavior. When logged out, I still get the download button, as you can see here:
However, when clicking on it, it redirects to an HTTP 403 error... I just checked, and the permissions are that anyone with the link can view. In fact, I applied these permissions to a folder full of files, and gdown worked for all of them except this one. I've also applied the permissions directly to this file. Maybe this is an error on Google's part? That seems unlikely, though.
I just confirmed it: Google Drive was glitching out. I didn't change anything, and it now works... I did, however, get the following error before it worked (I tried re-downloading it twice):
Access denied with the following error:
Too many users have viewed or downloaded this file recently. Please
try accessing the file again later. If the file you are trying to
access is particularly large or is shared with many people, it may
take up to 24 hours to be able to view or download the file. If you
still can't access a file after 24 hours, contact your domain
administrator.
No one (besides myself and the fine folks on this issue who tried to download my file) has been viewing or downloading this file. However, other files hosted on the same Drive account have been downloaded at full speed a few times over the last couple of days (~200 GB @ ~75 MB/s). Google probably flagged this as suspicious... Is there a way to throttle the download speed?
Is gdown --speed 10MB something you're looking for?
Just following up: the issue mostly resolved itself; it was an issue on Google Drive's side. I was able to download everything at full speed and without any restrictions when logged in through PyDrive. It seems these limits only apply when the file is accessed publicly. Thanks for your help!
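For anyone else hitting the limit, a minimal sketch of that authenticated PyDrive route (it assumes a client_secrets.json from a Google API project in the working directory; FILE_ID and the output name are placeholders):

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

# Opens a browser window for the OAuth flow; needs client_secrets.json.
gauth = GoogleAuth()
gauth.LocalWebserverAuth()
drive = GoogleDrive(gauth)

# Fetch the file by id over the authenticated session.
f = drive.CreateFile({"id": "FILE_ID"})
f.GetContentFile("output.bin")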
Hello, I used the following code and had no problem with my file (29 GB):

import gdown
import time

url = 'https://drive.google.com/uc?id=XXXXXXXXXXXXXXX'
output = 'velo.zip'
time.sleep(100)
gdown.download(url, output, quiet=False)
Following step 4 in this article: https://www.marstranslation.com/blog/how-to-skip-google-drive-virus-scan-warning-about-large-files
you can right-click -> Inspect on the 'Download anyway' link and get a URL that bypasses the check.
In summary, just substitute your file ID into this URL template: https://drive.google.com/u/0/uc?id={FILE_ID}
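For example, fed straight into gdown (the id and output name below are placeholders):

import gdown

file_id = "FILE_ID"  # placeholder: substitute your Drive file id
gdown.download(f"https://drive.google.com/u/0/uc?id={file_id}", "output.zip", quiet=False)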
Thanks for the detailed analysis. What I'm not sure about is whether this actually has something to do with the parser. If I access https://drive.google.com/uc?id=1VeULcuxDUMMSg6TkgwBpiEaIjSykL1fO while logged into Google Drive, it actually shows the download button. However, if I access it without logging in, it shows as below. So I still think this issue is not solvable without a login session. Any thoughts?
I think I have the same issue you described. The file I am trying to download only shows the download button when you log in. I am pretty sure the permission is set so everyone can see the file, but if you don't log in, you won't see the download button (due to high traffic?).
Did you find a solution for this? Is there a way to make gdown authenticate with a Google account?
This is the data file (preprocessed Wikipedia) that reproduced the problem.
Hello. I have the same problem. I am logged in, and everyone can see the file; I even see the download button. I have read that there is a maximum daily download limit.
I'm having the same problem. Worked for a few downloads. Changed access from public to private, with user being given access. Still won't work.
same problem.
I tried to wget the same file and got this as the response:
Too many users have viewed or downloaded this file recently. Please try accessing the file again later. If the file you are trying to access is particularly large or is shared with many people, it may take up to 24 hours to be able to view or download the file. If you still can't access a file after 24 hours, contact your domain administrator.
If you're trying to download a large file, you might be facing the same thing.
Same error here in Colab, downloading a 7 GB file multiple times (since Colab times out, I had to do this), and I got the same message as @theavicaster when I tried to use this function:
import requests

def download_file_from_google_drive(id, destination):
    def get_confirm_token(response):
        for key, value in response.cookies.items():
            if key.startswith('download_warning'):
                return value
        return None

    def save_response_content(response, destination):
        CHUNK_SIZE = 32768
        with open(destination, "wb") as f:
            for chunk in response.iter_content(CHUNK_SIZE):
                if chunk:  # filter out keep-alive new chunks
                    f.write(chunk)

    URL = "https://docs.google.com/uc?export=download"
    session = requests.Session()
    response = session.get(URL, params={'id': id}, stream=True)
    token = get_confirm_token(response)
    if token:
        params = {'id': id, 'confirm': token}
        response = session.get(URL, params=params, stream=True)
    save_response_content(response, destination)
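(For reference, the function above is invoked like this; the id and output name are placeholders:)

download_file_from_google_drive("FILE_ID", "dataset.zip")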
I got this response:
Google Drive - Quota exceeded
Too many users have viewed or downloaded this file recently. Please try accessing the file again later. If the file you are trying to access is particularly large or is shared with many people, it may take up to 24 hours to be able to view or download the file. If you still can't access a file after 24 hours, contact your domain administrator.
Does someone have a solution? I got the same error :'(
Still have this issue with 4.0.2 =(
Same error. Any fix?
WORKAROUND:
I added the desired file to my own Google Drive and was then able to create a link that worked (make it editable by everyone).
Hi all,
I have found a solution that worked for me. I was trying to download a 112 MB file from my Google Drive which had the appropriate permissions.
Following the share link takes me to the file's preview page. If I click on the download button there, I am taken to a security check. The link id at this check is the one you want to use with gdown. For example, in Colab, the following command:
!gdown --id 1iQhpdvoTyuvhxS9pvj4IwXZ-WE2V0dTv
This worked for me; hopefully it helps someone else.
This last solution is really my favorite, thank you so much!
Same error. Any fix?
WORKAROUND:
I added the desired file to my own Google Drive and was then able to create a link that worked (make it editable by everyone).
No, it does not work for me. I even upgraded my account to copy those big files to my private Google Drive...
I was trying to download a 2 GB zip file from my personal Google Drive and got permission denied. I SOLVED the problem by using this command instead:
sudo wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=FILEID' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=FILEID" -O FILENAME && rm -rf /tmp/cookies.txt
where you must substitute FILEID (in both places) with your Drive file ID and FILENAME with the name you want to save the file under. The inner wget fetches the confirmation page and saves the session cookies, the sed pulls out the confirm token, and the outer wget replays the download with that token and the saved cookies.
Source: https://linux.tips/tutorials/download-large-google-drive-files-with-wget-in-terminal
I want to download a 16 GB zip file from Google Drive using:
gdown https://drive.google.com/uc?id=1InfIal4y7OBMGNUDeldEmDxtD0MrewY8
I have already set the permissions on the file to "Anyone with the link".
But I get this error:
Permission denied: https://drive.google.com/uc?id=1InfIal4y7OBMGNUDeldEmDxtD0MrewY8
Maybe you need to change permission over 'Anyone with the link'?