tidyverse / googlesheets4

Google Spreadsheets R API (reboot of the googlesheets package)
https://googlesheets4.tidyverse.org
Other
357 stars 51 forks source link

Running into quota issues #210

Closed nikosbosse closed 3 years ago

nikosbosse commented 3 years ago

We are currently using google sheets for a research project on crowd forecasts for Covid-19 case and death numbers. We are often running into quota limit issues - even though the number of users we have should be well below what Google allows.

I'm cross-posting this from stackoverflow - please feel free to close this issue if you feel it doesn't belong here.

I attempted to create a somewhat reproducible example by setting up a new google account and creating a sheet from which to read.

The first thing I tried (without making any changes to the google account) is this:

library(googledrive)
library(googlesheets4)

# Google sheets authentification -----------------------------------------------
options(gargle_oauth_cache = ".secrets")
drive_auth(cache = ".secrets", email = "iamatestotest@gmail.com")
gs4_auth(token = drive_token())

sheet_id <- "1Z2O5Mce_haceWfduLenJQP-hddXF9biY_4Ydob_psyQ"

n_tries <- 50

for (i in 1:n_tries) {
  data <- read_sheet(ss = sheet_id)
  Sys.sleep(0.5)
  print(i)
}

From what I understand I should be able to make around 300 read requests per minute, but I'm usually not be able to get the loop to run beyond 30-34. I then also tried to change the settings on https://console.cloud.google.com, but that was also unsuccessful.

I suspect it might also be an issue with us not using the google API correctly, but I'm unsure whether this bevavior is expected or something odd is going on.

Thank you very much!

jennybc commented 3 years ago

If you are using the built-in OAuth app, you are actually sharing quota with all the other users of googlesheets4 who are doing same.

https://developers.google.com/sheets/api/limits

Specifically, the problem is usually this one:

500 requests per 100 seconds per project

The "project" here = "everyone who is using googlesheets4 worldwide and isn't using their own OAuth app".

You can install dev gargle, which has built-in retries:

devtools::install_github("r-lib/gargle")

This will help your code to work, but many of the backoffs will be quite long.

You should probably get your own OAuth app and not share quota with anyone else.

https://gargle.r-lib.org/articles/get-api-credentials.html#oauth-client-id-and-secret-1

nikosbosse commented 3 years ago

Thank you very much for your quick and helpful response! I followed the instructions to set up my own OAuth app and a service account.

I tried the service-account by calling gs4_auth(path = "my-service-account-token.json") and also tried using the OAuth client by calling gs4_auth_configure(path = "my-oauth-client.json"). That kind of worked - I'm seeing API requests in my console.cloud.google.com dashboard, so I assume it's nearly there. These requests, however, are failing and I wasn't able to figure out why from the documentation (sorry if I just missed the right part to read!).

With both methods I directly get an error in R that says

Client error: (403) PERMISSION_DENIED
  * Client does not have sufficient permission. This can happen because the OAuth token does not have the right scopes, the client doesn't have permission, or the API has not been enabled for the client project.
  * The caller does not have permission

The API is enabled, so I assume my OAuth configuration is not correct. This is how I configured it: image The OAuth app is not published or verified (I was hoping to get it running with the test user, but not sure that is possible) Thank you very much for your help!

cpilat97 commented 3 years ago

Hi there, I predominantly use the service account key & the corresponding .json file when I use googlesheets4 (which has been a lot lately). Any time i've received that error message it's because I haven't yet given edit permissions on the google sheet to the @my-project.iam.gserviceaccount.com email. (see screenshot for which email i'm referring too)

Screen Shot 2021-02-23 at 7 28 52 PM

So my workflow in my R script is: gs4_auth(path = "my_json_file.json")

and then i go to the google sheet -> share -> put in the service account email with edit access.

Have you tried that out and seeing if it works?

edit: I think it was (now closed) issue 189 that brings up giving the service permission on the relevant sheets

nikosbosse commented 3 years ago

Nice! that (mostly) worked. After I gave permissions to the sheet gs4_auth(path = "my-service-account-token.json") works. As an icing on the cake I can now also use gs4_auth_configure(path = "my-oauth-client.json") and then the google asks me to grant permissions to my OAuth instead of the tidyverse OAuth. Sweet!

However, I'm still running into quota issues..

Regardless of whether I use the service account token or the OAuth app, the above loop still crashes after around 30-35 cycles. I verified the project number is indeed my project number and I can see the requests in the API Dashboard.

I assume this might have to do with the 60 requests per user per minute limit (instead of 300 per minute overall), although I'm confused I get the error before 60. Does read_sheet(ss = sheet_id) fire more than one request? If I created more than one service account, would that then count as two users?

cpilat97 commented 3 years ago

Are you running the CRAN version of googlesheets4 or the dev version (or as Jenny said earlier the dev version of gargle)?

I copied and pasted your code and tried on a sheet I have in my drive, but i'm also using the github version of the package because I was running into server side errors at one point and was hoping that would help (this was from a script where i was reading info from a qualtrics survey, amending it, and then writing it back to a google sheet that others interacted with/used as a tracker. But i kept it with just the reading of a single sheet)

Screen Shot 2021-02-24 at 3 43 20 PM

I was able to get through all 50 tries but there was a failure around the 35 mark and there were two retries

Screen Shot 2021-02-24 at 3 43 28 PM

I don't think i changed anything specifically on the google cloud end, just left all the defaults when setting up the service account and api. So i'm assuming my options/quotas are set to be the same as yours (unless you changed them?).

I'm not sure if creating a secondary service account would help much? Since you'd have to authorize the .json file in the script and then gs4 would take whichever is last (that's my assumption at least).

Quick edit --- I changed the Sys.sleep from Sys.sleep(.5) to Sys.sleep(1) and didn't encounter any errors or retries. Not sure how important the .5 second interval is for you but might be worth a shot if you can't use the dev version of gargle for now?

nikosbosse commented 3 years ago

I had been using the cran versions so far, but I agree it is a good idea to switch to the dev packages. Interesting you also saw a retry around 30... I'll try it out and report back!

My real world application for this is within in a shiny app. My thought was that I could, for example, supply different service account tokens to the app. Then the shiny app could choose one at random. That would (at least probabilistically) distribute the load across different users (if that makes a difference at all)

nikosbosse commented 3 years ago

Based on your suggestions I posted an answer / short explainer to my original question on stackoverflow that people may find helpful: https://stackoverflow.com/questions/66186332/googlesheets-quota-limit-issues-possible-failure-to-use-api-key/66387782#66387782. Let me know if it would be helpful for you if I wrote that down in more detail and contributed it to documentation. Thank you very much for your kind help! I'm closing this issue for now, because I believe that my problem is solved now for most practical applications. I still find it intriguing that @cpilat97 and me sometimes ran in to issues around the 30 request mark - but I suspect that has more to do with Google than with the googlesheets4 package.

jennybc commented 3 years ago

Glad things seem to be sorted out!

I have a minor comment on this:

options(gargle_oauth_cache = ".secrets")
drive_auth(cache = ".secrets", email = "iamatestotest@gmail.com")

Here's the signature of drive_auth():

drive_auth <- function(email = gargle::gargle_oauth_email(),
                       path = NULL,
                       scopes = "https://www.googleapis.com/auth/drive",
                       cache = gargle::gargle_oauth_cache(),                      # <-- LOOK AT THIS
                       use_oob = gargle::gargle_oob_default(),
                       token = NULL) { ... }

So once you've set the option that controls gargle's cache location, you're good. I recommend you use the option or the function argument, but not both. I'd probably use the option.

I see a lot of baroque auth code, because I think people just try all sorts of things, turning various dials and knobs, and then are so relieved when it works that they just leave the code as it is. But eventually this saddles you with code you're afraid to touch, because you're not sure why it works.

jennybc commented 3 years ago

More comments based on reading everything above:

jennybc commented 3 years ago

Does read_sheet(ss = sheet_id) fire more than one request?

And, yes, each high-level googlesheets4 call will potentially make multiple requests, because we often need to get metadata about the Sheet before we can fully craft the main request. I try to batch as much as I can, but some "extra" requests are unavoidable.

nikosbosse commented 3 years ago

Thank you very much, Jenny, for your kindness and patience with this! I assume a lot of people must have a lot of questions all the time, and I feel your support really is extraordinary and is greatly appreciated!

So once you've set the option that controls gargle's cache location, you're good. I recommend you use the option or the function argument, but not both. I'd probably use the option.

makes total sense. Copied that code from somewhere and didn't look into it more closely. I can edit it in the first message and SO question if you like.

All of this is really neat and in general working very well. Luckily I don't have to make crazy requests in a for loop - that was just to make sure I can reproduce the issue. Excited about the features to come!

jennybc commented 3 years ago

I have since looked at the analytics for the built-in oauth app and someone is clearly overusing it and, specifically, using up all/most of the quota, leading to high error rates for everyone.

This is a downside of a built-in app, i.e. I can't curb inappropriate use of it.