ratt-ru / packratt

BSD 3-Clause "New" or "Revised" License

GoogleDrive API v3 usage limit #19

Open kwazzi-jack opened 3 years ago

kwazzi-jack commented 3 years ago

I came across the following error while roughly testing whether packratt could pull all my measurement sets from my Google Drive folder. There were 48 in total, so I ran 48 separate packratt get calls in a python3 script:

packratt.get(f'/MSC_DATA/MS/KAT7_{NTIME}_7_{NCHAN}.tar.gz', 'datasets/ms/')

where NTIME and NCHAN identify the specific MS. Nearing the end of the downloads, packratt stopped working because Google returned a 403 error for the download URL:

Traceback (most recent call last):
  File "/home/brian/Code/brianWelman/final/lib/python3.6/site-packages/packratt/downloads.py", line 91, in requests_partial_download
    response.raise_for_status()
  File "/home/brian/Code/brianWelman/final/lib/python3.6/site-packages/requests/models.py", line 943, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://doc-0c-74-docs.googleusercontent.com/docs/securesc/ha0ro937gcuc7l7deffksulhg5h7mbp1/n46bea6cf1caaduraqda42ptdlcbmo96/1613487750000/00827075692661388493/*/1rwrycA2AQ6jPZU15JZYHikIfGdqRs3i7?e=download

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 80, in <module>
    packratt.get(f'/MSC_DATA/MS/KAT7_{t}_7_{c}.tar.gz', 'datasets/ms/')
  File "/home/brian/Code/brianWelman/final/lib/python3.6/site-packages/packratt/interface.py", line 58, in get
    sha256_hash = downloaders(entry['type'], key, entry)
  File "/home/brian/Code/brianWelman/final/lib/python3.6/site-packages/packratt/dispatch.py", line 19, in __call__
    return fn(*args, **kwargs)
  File "/home/brian/Code/brianWelman/final/lib/python3.6/site-packages/packratt/downloads.py", line 124, in download_google_drive
    params=params)
  File "/home/brian/Code/brianWelman/final/lib/python3.6/site-packages/packratt/downloads.py", line 93, in requests_partial_download
    raise Exception(e)
Exception: 403 Client Error: Forbidden for url: https://doc-0c-74-docs.googleusercontent.com/docs/securesc/ha0ro937gcuc7l7deffksulhg5h7mbp1/n46bea6cf1caaduraqda42ptdlcbmo96/1613487750000/00827075692661388493/*/1rwrycA2AQ6jPZU15JZYHikIfGdqRs3i7?e=download

Which, as I now see, is an API usage limit reached error. See the following article for more. This is based on the Google Drive API I have enabled; to increase the quota, I would have to add a super admin account (which I do not know how to do) to manually change those usage limits. Otherwise, I have to wait for the quota to reset. No packratt get command will work with this specific Google Drive folder until it does.

Any way around this? It is a slight problem when I try to pull multiple MSs to perform tests. Otherwise, I will attempt to move to the Rhodes servers instead.

sjperkins commented 3 years ago

Thanks for reporting this @brianwelman2, I think you're the first person trying to set up this quantity of downloads, so it's not surprising that you're testing the limits.

> Which as I see now is api usage limit reached error. See the following article for more. This is based off of my google drive api I have enabled and to increase the quota, I would have to add a super admin account (which I do not know how to do) to manually change those usage limits. Otherwise, I have to wait for the quota to reset. No packratt get command will work with this specific google drive folder until it does.

I think what we're seeing here are the limitations of Google Drive, as it's not really a content delivery network (CDN), as suggested by this stackoverflow answer (https://stackoverflow.com/a/10313416). I wasn't aware of these limits, so thanks.

> Anyway around this? It is a slight problem when I try to pull multiple MS to perform tests.

I guess that it's complaining about many small requests. Perhaps you could tar them all into one big file for download?

> Otherwise, I will attempt to move to Rhodes servers instead.

This may be the short-term answer. Otherwise, S3 buckets or some other dedicated CDN service may be the way forward.

kwazzi-jack commented 3 years ago

Okay, thank you for the clarification. I realised I would be pushing it with this kind of use, but with packratt's cache, repeated downloads can be avoided. I will have a look at the suggested CDN services, but for now it works perfectly fine for me.
