tidyverse / googledrive

Google Drive R API
https://googledrive.tidyverse.org/
Other
322 stars 47 forks source link

Error in `googledrive::drive_auth()`: scheduled jobs accessing googlesheets are failing on Rstudio-server #425

Closed naveenbussari closed 1 year ago

naveenbussari commented 1 year ago

Hi @jennybc , continuation from #424

I am using rstudio-server, community version, installed on ubuntu 20.04. I am running an R script which reads an excel file. I have done the authentication and tokens are generated. I am using googledrive and googlesheets4 libraries.

`library(gargle)
library(cronR)
library(googlesheets4)
library(googledrive)

options(gargle_oath_cache = "~/.cache/gargle",
        gargle_oauth_email = "bussari.xxxxxx.com",
        use_oob = TRUE)

googledrive::drive_auth(email = "bussari.xxxxxx.com",
                        cache = "~/.cache/gargle",
                        use_oob = TRUE)

googlesheets4::gs4_auth(email = "bussari.xxxxxx.com",
                        cache = "~/.cache/gargle",
                        use_oob = TRUE)

x <-read_sheet('https://docs.google.com/spreadsheets/d/1VWS3hbSP6Q5Q25YG16dqgBAejy_eojYzN_V-2Q9epFk/edit#gid=0')

x
`

We installed the latest version of gargle(1.4.0). This time , jobs that access googlesheets are running fine from console/terminal but failing when we schedule them from rstudio-server

image image

logs of the scheduled job

`Attaching package: ‘googlesheets4’
The following object is masked from ‘package:gargle’:
    request_make
Attaching package: ‘googledrive’
The following objects are masked from ‘package:googlesheets4’:
    request_generate, request_make
The following object is masked from ‘package:gargle’:
    request_make
Error in `googledrive::drive_auth()`:
! Can't get Google credentials.
ℹ Are you running googledrive in a non-interactive session? Consider:
• Call `drive_deauth()` to prevent the attempt to get credentials.
• Call `drive_auth()` directly with all necessary specifics.
ℹ See gargle's "Non-interactive auth" vignette for more details:
ℹ <https://gargle.r-lib.org/articles/non-interactive-auth.html>
Backtrace:
    ▆
 1. └─googledrive::drive_auth(...)
 2.   └─googledrive:::drive_abort(...)
 3.     └─cli::cli_abort(message = message, ..., .envir = .envir)
 4.       └─rlang::abort(...)
Execution halted
`

Thanks in advance.

jennybc commented 1 year ago
options(gargle_oath_cache = "~/.cache/gargle",
        gargle_oauth_email = "bussari.xxxxxx.com",
        use_oob = TRUE)

Looks like there's a typo in the option name.

BAD, NO: gargle_oath_cache GOOD, YES: gargle_oauth_cache

More observations:

I suspect the code should look more like this (untested, obviously):

library(googlesheets4)
library(googledrive)

options(gargle_oath_cache = "~/.cache/gargle", gargle_oauth_email = "bussari.xxxxxx.com")

x <-read_sheet('https://docs.google.com/spreadsheets/d/1VWS3hbSP6Q5Q25YG16dqgBAejy_eojYzN_V-2Q9epFk/edit#gid=0')
naveenbussari commented 1 year ago
`
library(googlesheets4)
library(googledrive)

options(gargle_oauth_cache = "~/.cache/gargle",gargle_oauth_email = "bussari.xxxxxx.com")

x <-read_sheet('https://docs.google.com/spreadsheets/d/1VWS3hbSP6Q5Q25YG16dqgBAejy_eojYzN_V-2Q9epFk/edit#gid=0')

x
`

When I run the above code console/terminal it works fine. But when I schedule it, the logs are:

`
Attaching package: ‘googledrive’
The following objects are masked from ‘package:googlesheets4’:
    request_generate, request_make
Error in `gs4_auth()`:
! Can't get Google credentials.
ℹ Are you running googlesheets4 in a non-interactive session? Consider:
• Call `gs4_deauth()` to prevent the attempt to get credentials.
• Call `gs4_auth()` directly with all necessary specifics.
ℹ See gargle's "Non-interactive auth" vignette for more details:
ℹ <https://gargle.r-lib.org/articles/non-interactive-auth.html>
Backtrace:
     ▆
  1. └─googlesheets4::read_sheet("https://docs.google.com/spreadsheets/d/1VWS3hbSP6Q5Q25YG16dqgBAejy_eojYzN_V-2Q9epFk/edit#gid=0")
  2.   └─googlesheets4:::get_cells(...)
  3.     └─googlesheets4::gs4_get(ssid)
  4.       └─googlesheets4:::gs4_get_impl_(as_sheets_id(ss))
  5.         └─googlesheets4::request_generate(...)
  6.           ├─gargle::request_build(...)
  7.           └─googlesheets4::gs4_token()
  8.             └─googlesheets4::gs4_auth()
  9.               └─googlesheets4:::gs4_abort(...)
 10.                 └─cli::cli_abort(...)
 11.                   └─rlang::abort(...)
Execution halted
`

if I schedule my previous code

`
library(gargle)
library(cronR)
library(googlesheets4)
library(googledrive)

options(gargle_oauth_cache = "~/.cache/gargle",gargle_oauth_email = "bussari.xxxxxx.com")

drive_auth(email = "bussari.xxxxxx.com",cache = "~/.cache/gargle")
gs4_auth(email = "bussari.xxxxxx.com",cache = "~/.cache/gargle")

x <-read_sheet('https://docs.google.com/spreadsheets/d/1VWS3hbSP6Q5Q25YG16dqgBAejy_eojYzN_V-2Q9epFk/edit#gid=0')

x

`

the logs are

`
Attaching package: ‘googlesheets4’
The following object is masked from ‘package:gargle’:
    request_make
Attaching package: ‘googledrive’
The following objects are masked from ‘package:googlesheets4’:
    request_generate, request_make
The following object is masked from ‘package:gargle’:
    request_make
Error in `drive_auth()`:
! Can't get Google credentials.
ℹ Are you running googledrive in a non-interactive session? Consider:
• Call `drive_deauth()` to prevent the attempt to get credentials.
• Call `drive_auth()` directly with all necessary specifics.
ℹ See gargle's "Non-interactive auth" vignette for more details:
ℹ <https://gargle.r-lib.org/articles/non-interactive-auth.html>
Backtrace:
    ▆
 1. └─googledrive::drive_auth(email = "bussari.xxxxxx.com", cache = "~/.cache/gargle")
 2.   └─googledrive:::drive_abort(...)
 3.     └─cli::cli_abort(message = message, ..., .envir = .envir)
 4.       └─rlang::abort(...)
Execution halted
`

I have another doubt. How was it able to run in console/terminal even with incorrect spelling of function name?

jennybc commented 1 year ago

How was it able to run in console/terminal even with incorrect spelling of function name?

It was a misspelled option name, not function name. You can set any options you want, but if the name is misspelled, it just won't have the effect you intend.

I think you should add some print statements to the script to confirm the existence of the cache path and list its contents. Is the cron job's idea of ~/ (home directory) the same as yours?

naveenbussari commented 1 year ago

This time I tried giving full path to gargle cache and also few print statements along with some info.

code:

`
library(gargle)
library(cronR)
library(googlesheets4)
library(googledrive)

gargle::gargle_oauth_sitrep()
drive_user()
gargle::gargle_oauth_cache()
drive_auth()
print("current working directory is " )
getwd()
print("before options gargle")
options(gargle_oauth_cache = "/home/bussari.naveen/.cache/gargle",gargle_oauth_email = "bussari.xxxxxx.com")

list.files("~/.cache/gargle/")
print("before drive_oauth")
drive_auth(email = "bussari.xxxxxx.com",cache = "/home/bussari.naveen/.cache/gargle")
print("before drive_oauth")
gs4_auth(email = "bussari.xxxxxx.com",cache = "/home/bussari.naveen/.cache/gargle")

x <-read_sheet('https://docs.google.com/spreadsheets/d/1VWS3hbSP6Q5Q25YG16dqgBAejy_eojYzN_V-2Q9epFk/edit#gid=0')

x

`

logs when run in console:( this time it threw some warning kind of stufff. Hopefully this leads to something)

`
> library(gargle)
> library(cronR)
> library(googlesheets4)
> library(googledrive)
> 
> gargle::gargle_oauth_sitrep()
ℹ Taking cache location from the `"gargle_oauth_cache"` option.
2 tokens found in this gargle OAuth cache:
~/.cache/gargle

email                          app             scopes          hash...   
______________________________ _______________ _______________ __________
bussari.xxxxxx.com tidyverse-erato ...spreadsheets 54a075e...
bussari.xxxxxx.com tidyverse-erato ...drive        928dfaa...
> drive_user()
attempt to access internal gargle data from: googledrive
Logged in as:
• displayName: Bussari Naveen
• emailAddress: bussari.xxxxxx.com
> gargle::gargle_oauth_cache()
[1] "~/.cache/gargle"
> drive_auth()
> print("current working directory is " )
[1] "current working directory is "
> getwd()
[1] "/home/bussari.naveen"
> print("before options gargle")
[1] "before options gargle"
> options(gargle_oauth_cache = "/home/bussari.naveen/.cache/gargle",gargle_oauth_email = "bussari.xxxxxx.com")
> 
> list.files("~/.cache/gargle/")
[1] "54a075ef043e32727d17eb41a3d80974_bussari.xxxxxx.com"
[2] "928dfaa8a7345b1e2252d3bb85b912c6_bussari.xxxxxx.com"
> print("before drive_oauth")
[1] "before drive_oauth"
> drive_auth(email = "bussari.xxxxxx.com",cache = "/home/bussari.naveen/.cache/gargle")
> print("before drive_oauth")
[1] "before drive_oauth"
> gs4_auth(email = "bussari.xxxxxx.com",cache = "/home/bussari.naveen/.cache/gargle")
trying `token_fetch()`
Trying `credentials_byo_oauth()` ...
Error caught by `token_fetch()`:
inherits(token, "Token2.0") is not TRUE
trying `credentials_service_account()`
Error caught by `token_fetch()`:
Argument 'txt' must be a JSON string, URL or file.
trying `credentials_external_account()`
adding "userinfo.email" scope
Error caught by `token_fetch()`:
Argument 'txt' must be a JSON string, URL or file.
trying `credentials_app_default()`
Trying `credentials_gce()` ...
✖ We don't seem to be on GCE.
trying `credentials_user_oauth2()`
attempt to access internal gargle data from: googlesheets4
Gargle2.0 initialize
adding "userinfo.email" scope
loading token from the cache
matching token found in the cache
> 
> x <-read_sheet('https://docs.google.com/spreadsheets/d/1VWS3hbSP6Q5Q25YG16dqgBAejy_eojYzN_V-2Q9epFk/edit#gid=0')
attempt to access internal gargle data from: googlesheets4
✔ Reading from rstudio_test.
✔ Range Sheet1.
attempt to access internal gargle data from: googlesheets4
> 
> x
# A tibble: 2 × 2
  Name      ID
  <chr>  <dbl>
1 Vinay      1
2 Naveen     2
`

logs when scheduled the job, the job failed at drive_auth()

`
The following object is masked from ‘package:gargle’:
    request_make
Attaching package: ‘googledrive’
The following objects are masked from ‘package:googlesheets4’:
    request_generate, request_make
The following object is masked from ‘package:gargle’:
    request_make
2 tokens found in this gargle OAuth cache:
'~/.cache/gargle'

email                          app             scopes          hash...   
______________________________ _______________ _______________ __________
bussari.xxxxxx.com tidyverse-erato ...spreadsheets 54a075e...
bussari.xxxxxx.com tidyverse-erato ...drive        928dfaa...
ℹ Not logged in as any specific Google user.
[1] NA
Error in `drive_auth()`:
! Can't get Google credentials.
ℹ Are you running googledrive in a non-interactive session? Consider:
• Call `drive_deauth()` to prevent the attempt to get credentials.
• Call `drive_auth()` directly with all necessary specifics.
ℹ See gargle's "Non-interactive auth" vignette for more details:
ℹ <https://gargle.r-lib.org/articles/non-interactive-auth.html>
Backtrace:
    ▆
 1. └─googledrive::drive_auth()
 2.   └─googledrive:::drive_abort(...)
 3.     └─cli::cli_abort(message = message, ..., .envir = .envir)
 4.       └─rlang::abort(...)
Execution halted
`
jennybc commented 1 year ago

Assuming you're running the same code in both scenarios, we're getting a lot less information from the cron job, which is where we really need the info. How are you capturing this? Feels like a standard out vs standard error problem (re: this missing info).

There's still a lot extraneous code, along the lines of what I pointed out before (https://github.com/tidyverse/googledrive/issues/425#issuecomment-1513912365), which just adds noise and is making this harder.

This is still my leading hypothesis:

I think you should add some print statements to the script to confirm the existence of the cache path and list its contents. Is the cron job's idea of ~/ (home directory) the same as yours?

I see you changed to an absolute path, but this could still be a permissions problem, if the cron job is running as a user that doesn't have access to the cache. That would be revealed if we saw what happened when listing the files in the cache.

naveenbussari commented 1 year ago

The cron generates a log file for each job that's scheduled. the logs I shared are from one such log file.

I tried printing the info of existence of .cache folder including its location. And we got desired output in the scheduler logs also gargle::gargle_oauth_sitrep() drive_user() gargle::gargle_oauth_cache() getwd() list.files("~/.cache/gargle/")

image

regarding permissions , we can see in the below image, all other users also have read permissions. I think read permission should do it.

image

Is there a way to print logs of what's actually happening when driver_auth() is run?

naveenbussari commented 1 year ago

I tried giving executable permissions to .cache directory. Still the error remains.

image
naveenbussari commented 1 year ago

in the code the "drive_user() and gargle::gargle_oauth_cache()" gives the following output

Screenshot 2023-04-20 at 3 20 41 PM

Where as in the scheduler logs, we can see

Screenshot 2023-04-20 at 3 15 19 PM
jennybc commented 1 year ago

Can you print the gargle version? Could the cron job somehow be using an older version of gargle?

naveenbussari commented 1 year ago

I edited the job to print gargle version. It uses latest version 1.4.0

Screenshot 2023-04-20 at 11 47 49 PM image
jennybc commented 1 year ago

I would get even more debugging info to compare by setting:

options(gargle_verbosity = "debug")
naveenbussari commented 1 year ago

tried options(gargle_verbosity = "debug") before and after drive_auth()

code:

library(gargle)
library(cronR)
library(googlesheets4)
library(googledrive)

options(gargle_oauth_cache = "~/.cache/gargle",
        gargle_oauth_email = "bussari.xxxxxx.com")
gargle::gargle_oauth_sitrep()
options(gargle_verbosity = "debug")
drive_user()
gargle::gargle_oauth_cache()
# drive_auth()
options(gargle_verbosity = "debug")
googledrive::drive_auth(email = "bussari.xxxxxx.com",
                        cache = "~/.cache/gargle")

googlesheets4::gs4_auth(email = "bussari.xxxxxx.com",
                        cache = "~/.cache/gargle")
options(gargle_verbosity = "debug")
x <-read_sheet('https://docs.google.com/spreadsheets/d/1VWS3hbSP6Q5Q25YG16dqgBAejy_eojYzN_V-2Q9epFk/edit#gid=0')

x

The logs look hopeful

Attaching package: ‘googlesheets4’
The following object is masked from ‘package:gargle’:
    request_make
Attaching package: ‘googledrive’
The following objects are masked from ‘package:googlesheets4’:
    request_generate, request_make
The following object is masked from ‘package:gargle’:
    request_make

ℹ Taking cache location from the `"gargle_oauth_cache"` option.
2 tokens found in this gargle OAuth cache:
'~/.cache/gargle'

email                          app             scopes          hash...   
______________________________ _______________ _______________ __________
bussari.xxxxxx.com tidyverse-erato ...spreadsheets 54a075e...
bussari.xxxxxx.com tidyverse-erato ...drive        928dfaa...
ℹ Not logged in as any specific Google user.
[1] "~/.cache/gargle"
trying `token_fetch()`
Trying `credentials_byo_oauth()` ...
Error caught by `token_fetch()`:
inherits(token, "Token2.0") is not TRUE
trying `credentials_service_account()`
Error caught by `token_fetch()`:
Argument 'txt' must be a JSON string, URL or file.
trying `credentials_external_account()`
adding "userinfo.email" scope
Error caught by `token_fetch()`:
Argument 'txt' must be a JSON string, URL or file.
trying `credentials_app_default()`
Trying `credentials_gce()` ...
✖ We don't seem to be on GCE.
trying `credentials_user_oauth2()`
attempt to access internal gargle data from: googledrive
Gargle2.0 initialize
adding "userinfo.email" scope
loading token from the cache
no matching token in the cache
initiating new token
Error caught by `token_fetch()`:
OAuth2 flow requires an interactive session.
Error in `googledrive::drive_auth()`:
! Can't get Google credentials.
ℹ Are you running googledrive in a non-interactive session? Consider:
• Call `drive_deauth()` to prevent the attempt to get credentials.
• Call `drive_auth()` directly with all necessary specifics.
ℹ See gargle's "Non-interactive auth" vignette for more details:
ℹ <https://gargle.r-lib.org/articles/non-interactive-auth.html>
Backtrace:
    ▆
 1. └─googledrive::drive_auth(email = "bussari.xxxxxx.com", cache = "~/.cache/gargle")
 2.   └─googledrive:::drive_abort(...)
 3.     └─cli::cli_abort(message = message, ..., .envir = .envir)
 4.       └─rlang::abort(...)
Execution halted
jennybc commented 1 year ago

My leading hypothesis is that, in the scheduled context, its defaulting to a different OAuth client, which would explain the cache miss:

trying `credentials_user_oauth2()`
attempt to access internal gargle data from: googledrive
Gargle2.0 initialize
adding "userinfo.email" scope
loading token from the cache
no matching token in the cache
initiating new token

However the known cause for that problem has been fixed: https://github.com/r-lib/gargle/commit/fa49d2010b66e167cd0f41611002ce30eae8e50f

Maybe your cron job is not a sub-process of RStudio at all?

You could try setting this option explicitly to force the use of a web OAuth client, which I can tell is what your cached tokens are associated with.

options(gargle_oauth_client_type = "web")
naveenbussari commented 1 year ago

Finally some good news!

Added the line options(gargle_oauth_client_type = "web") before loading gargle cache. image

It worked. The scheduled job ran and returned the desired output(reading and printing data from excel sheet)

Attaching package: ‘googlesheets4’
The following object is masked from ‘package:gargle’:
    request_make
Attaching package: ‘googledrive’
The following objects are masked from ‘package:googlesheets4’:
    request_generate, request_make
The following object is masked from ‘package:gargle’:
    request_make
ℹ Taking cache location from the `"gargle_oauth_cache"` option.
2 tokens found in this gargle OAuth cache:
'~/.cache/gargle'

email                          app             scopes          hash...   
______________________________ _______________ _______________ __________
bussari.xxxxxx.com tidyverse-erato ...spreadsheets 54a075e...
bussari.xxxxxx.com tidyverse-erato ...drive        928dfaa...
ℹ Not logged in as any specific Google user.
[1] "~/.cache/gargle"
trying `token_fetch()`
Trying `credentials_byo_oauth()` ...
Error caught by `token_fetch()`:
inherits(token, "Token2.0") is not TRUE
trying `credentials_service_account()`
Error caught by `token_fetch()`:
Argument 'txt' must be a JSON string, URL or file.
trying `credentials_external_account()`
adding "userinfo.email" scope
Error caught by `token_fetch()`:
Argument 'txt' must be a JSON string, URL or file.
trying `credentials_app_default()`
Trying `credentials_gce()` ...
✖ We don't seem to be on GCE.
trying `credentials_user_oauth2()`
attempt to access internal gargle data from: googledrive
Gargle2.0 initialize
adding "userinfo.email" scope
loading token from the cache
matching token found in the cache
trying `token_fetch()`
Trying `credentials_byo_oauth()` ...
Error caught by `token_fetch()`:
inherits(token, "Token2.0") is not TRUE
trying `credentials_service_account()`
Error caught by `token_fetch()`:
Argument 'txt' must be a JSON string, URL or file.
trying `credentials_external_account()`
adding "userinfo.email" scope
Error caught by `token_fetch()`:
Argument 'txt' must be a JSON string, URL or file.
trying `credentials_app_default()`
Trying `credentials_gce()` ...
✖ We don't seem to be on GCE.
trying `credentials_user_oauth2()`
attempt to access internal gargle data from: googlesheets4
Gargle2.0 initialize
adding "userinfo.email" scope
loading token from the cache
matching token found in the cache
attempt to access internal gargle data from: googlesheets4
✔ Reading from "rstudio_test".
✔ Range 'Sheet1'.
attempt to access internal gargle data from: googlesheets4
# A tibble: 2 × 2
  Name      ID
  <chr>  <dbl>
1 Vinay      1
2 Naveen     2

But one thing that still worries is , as it can seen in the logs that says 'Not logged in as any specific Google user'. Because we are looking forward to schedule jobs to trigger emails

jennybc commented 1 year ago

drive_user() does not trigger auth, so if auth hasn't happened yet, this is what it says.

Once you do auth explicitly or some action that requires auth implicitly, drive_user() will tell you who you are auth'ed as.

HugoGit39 commented 1 year ago

Thanks a lot! the options(gargle_oauth_client_type = "web") did the trick with me too! Cost me a whole day but finally have it to work!

jennybc commented 1 year ago

The possibility of setting options(gargle_oauth_client_type = "web") is now mentioned in more places in gargle and packages that derive their docs from it. So I think we're done here.