tidyverse / googlesheets4

Google Spreadsheets R API (reboot of the googlesheets package)
https://googlesheets4.tidyverse.org
Other
360 stars 53 forks source link

Service Account Token works in RStudio but not when running via Terminal #224

Closed cpilat97 closed 3 years ago

cpilat97 commented 3 years ago

Hi there,

First off, just want to thank you for all of your efforts on this package, i use it quite a bit and it is a priceless resource for the teams I support.

I'm not sure if i've found a bug or it's possible I didn't authorize correctly. I have a workflow in-which I read in a google sheet from one team i support (updates are made daily to it), perform various data cleaning, etc..., split the sheet out by an individual's name, and write those back to a separate google drive folder for another team to utilize. Finally, if a given set of circumstances/changes are made the script sends an email via the gmailr package to certain users (i'm using as many google packages as i can in this one haha). Ultimately, I wanted to schedule this via CRON so I don't have to manually fire the script several times a day.

I have a service account token set up via my org's cloud platform and have the token in my project (i know this might not be the best place to keep it).

After i load in my libraries, the first few lines are as follows:

library(tidyverse)
library(here)
library(googlesheets4)
library(googledrive)
library(gmailr)

drive_auth(path = here("service_account_token_key.json"))
gs4_auth(path = here("service_account_token_key.json")) #note: i didn't think i needed to auth the gs4 after drive, but i was getting an issue if i didn't have this auth in here
gm_auth_configure(path = here("gmail_auth_sat_key.json"))
gm_auth(email = "myemail", cache = ".secret")

#rest of script reading in data, etcc....

The above works totally fine if i'm doing things manually or sourcing the script within RStudio. And when i look at the google drive folder or the google sheet i see that the service account email is making edits (not my personal account email). And the service account email has access/auth to my drive folder and the sheets it pulls from.

When i go to run the script through my terminal window or via CRON though, i receive the following error:

> Using an auto-discovered, cached token
  To suppress this message, modify your code or options to clearly consent to
  the use of a cached token
  See gargle's "Non-interactive auth" vignette for more details:
  <https://gargle.r-lib.org/articles/non-interactive-auth.html>
> The googledrive package is using a cached token for 'myemail@email.com'
Error: lexical error: invalid char in json text.
                                      /Users/me/client_secret
                     (right here) ------^
Execution halted

Wouldn't the drive_auth/gs4_auth pull from the directory it's located in regardless if it's run though terminal/CRON or RStudio?

My GoogleSheets4 package is 0.3.0.9 Drive is 1.9.0.9 Gargle is 1.1.0

jennybc commented 3 years ago

Anytime I hear "cron job" and "auth not working", I suspect a path problem. You need to insert some assertions about file existence, print statements of getwd() and list.files(), etc. into your cron job to convince yourself 110% that you understand what the working directory is when those calls like drive_auth() are made.

Re this:

drive_auth(path = here("service_account_token_key.json")) gs4_auth(path = here("service_account_token_key.json")) #note: i didn't think i needed to auth the gs4 after drive, but i was getting an issue if i didn't have this auth in here

Yeah, for now, googledrive and googlesheets4 don't know about each other auth-wise. Long-term, it would be nice to be a bit more clever, but that's not in place now. Here's the best advice on using them together:

https://googlesheets4.tidyverse.org/articles/articles/drive-and-sheets.html

cpilat97 commented 3 years ago

Ah, I see. I've been trying to stay away from declaring absolute paths for when my colleagues need to use or run anything I might write, but that makes complete sense. I've had a love/hate relationship with cron scheduling lately because of the usefulness when it works but trouble it causes when it doesn't.

Thanks for your quick reply on this!! I'm going to close out the issue.

And thanks for the advice with drive and gs4 auths and how using them together. For how immensely useful these packages have been, I definitely don't mind the two different auth calls.

Thanks again!

cpilat97 commented 3 years ago

Just following up here, i was able to amend my script with absolute paths (i.e. drive_auth(path = "User/pilat/......"), marking the beginning of the doc explicitly calling and setting the working directory and all seems to work without an issue. Thank you again for all of your efforts on these packages!

jennybc commented 3 years ago

You should probably look into using Application Default Credentials, which is really a search strategy for locating service account tokens. If you put your token in an expected place (or announce its location via an env var), you can get these yucky absolute paths out of your R scripts.

https://gargle.r-lib.org/reference/credentials_app_default.html

ADC is the first auth method automatically tried by googlesheets4 (gargle, really). So once you have this configured correctly, you don't need any explicit auth code, i.e. no drive_auth() calls.

This, of course, does not address any other problems caused by working directory dysfunction when using cron. But it solves the auth problem.

cpilat97 commented 3 years ago

Thank you!! That was actually quite easy to set up and everything seems to be working (of course I'm sure i'll break something else tinkering with scripts haha). Appreciate your help and also highlighting the issues that can happen with cron; will definitely dig deeper into cron next time around if other type of issues pop up.

cpilat97 commented 3 years ago

just following up here, something i think i incorrectly did on my end when setting up the service account (I did the one for googlesheets4 & drive before i did anything with the gmailr setup).

When i set the application default credentials to the .json file i had for the service account, it caused the gmailr package to throw an api error: 400 (told me the precondition check failed) even when i explicitly called the gmail_auth_configure(path = ""), gmail_auth(email = "", cache = ".secret"). This occurred whether running in Terminal, or going line by line in RStudio. What was interesting though was when i removed that credentials_app_default, gmailr went back to working just fine and sending email.

The service account i have set up for googlesheets4 has domain-wide delegation, but i'm guessing it's also because i created separate keys at different times for the gmailr package and the googlesheets/drive pacakges.

jennybc commented 3 years ago

A couple of things come to mind (if we go on, we should probably move this to a gargle or gmailr issue):

Have a look at how a service account needs to be set up w.r.t domain-wide delegation for Gmail specifically: https://stackoverflow.com/questions/29327846/gmail-rest-api-400-bad-request-failed-precondition

cpilat97 commented 3 years ago

Ah, sorry, I made some mistakes in my post before and just went to double-check things in my cloud console.

The gmailr JSON file refers to the OAuth app i set up for that, so when i call gmailr::gm_auth_configure(path = "path/to/auth.json" and then gm_auth(email = "me@mygmail.com", cache = ".secret") it does send the email correctly.

So the beginning of the script goes as follows:

googledrive::drive_auth(path = "path/to/sheets_drive_key.json")
googlesheets4::gs4_auth("path = "path/to/sheets_drive_key.json") #later updated to token = drive_token()
gm_auth_configure(path = "path/To/gmail/client_secret_gmail_oauth_app.json") #gmail auth
gm_auth(email = "myemail", cache = ".secret")

Interestingly the gmail was sending with cron scheduling before the credentials_update and after i removed it.

Sorry i bungled my last post describing it. I'll definitely look into this more regarding the link you put in and dive more into gargle. If this seems to persist even after i try some changes on the cloud console end i'll move over to gargle or gmailr.

As a side note, and continued thanks for your development and work on these packages, I work for the nyc department of education and these packages (along with a host of other r packages) have been immensely helpful in ensuring tens of thousands of families (and 100k+ 3 & 4y.o. children) have access to free, high quality, and universal 3K and PreK education. Thanks again for your work on these packages :)

jennybc commented 3 years ago

Thanks for the thanks! It is nice to hear how people use these tools!

With lots of credentials-related things being stored as JSON, it is very easy to confuse one for the other (an OAuth app vs. a service account token, for example), so I would stay on high alert for that sort of thing and, as you seem to be doing, give things super-informative, accurate names. Sometimes these files (must) have terrible names, but then you can, at least, create a super-informative, accurate symlink.

One last piece of naming advice: an API key is a Real Thing and, therefore I think path/to/sheets_drive_key.json is a bad name, because you actually mean path/to/sheets_drive_service_account_token.json.