tidyverse / googledrive

Google Drive R API
https://googledrive.tidyverse.org/

`drive_link()` does not work properly with service account #423

Closed thgsponer closed 1 year ago

thgsponer commented 1 year ago

Hi

I want to get the link of pictures on my google drive. What I do is the following:

drive_auth()
path_to_pics <- c('path/to/pic/pic1.png', 'path/to/pic/pic2.png', 'path/to/pic/pic3.png',...,  'path/to/pic/pic270.png')
drive_link(path_to_pics)

This uses my personal email for authentication and works well.

Now I want to use non-interactive authentication, so I created a service account and shared the folder containing my pics with it. So far so good: I can list all the pictures in the folder with the service account using drive_ls('path/to/pics'). When I run drive_link(path_to_pics[1]) on a single picture, I get the link. However, when I pass the character vector with the paths to all my pictures (270 in total), I never get 270 links back. Sometimes 140, sometimes 121, just a random number of links.

Some system and session info:

R version 4.2.2 (2022-10-31 ucrt) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 22621) googledrive_2.0.0

What is going wrong?

jennybc commented 1 year ago

If drive_ls('path/to/pics') lists all the pictures, I would apply drive_link() to that. So something like drive_ls('path/to/pics') |> drive_link().

Once the drive_ls() succeeds, you have the drive_resource for each file, and drive_link() just elevates a specific piece of data, i.e. it doesn't make any additional API calls.

By going back through file paths ("when I use the character vector with paths"), there's a lot of unnecessary and error-prone (somewhat stochastic) rework.
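
A minimal sketch of the "list once, then read the links" approach described above. The function name `folder_links` and its `ls_fn` / `link_fn` arguments are hypothetical hooks added for illustration; by default they are the real `drive_ls()` and `drive_link()`. It assumes `drive_auth()` has already been called.

```r
# List a folder once, then derive the links from that listing.
# `ls_fn` and `link_fn` are illustrative hooks, not part of googledrive;
# their defaults are the real googledrive functions.
folder_links <- function(folder,
                         ls_fn = googledrive::drive_ls,
                         link_fn = googledrive::drive_link) {
  # One listing call: a dribble with one row per file, including metadata.
  files <- ls_fn(folder)
  data.frame(
    name = files$name,
    # Read from the metadata already in `files`; no path lookups.
    link = link_fn(files),
    stringsAsFactors = FALSE
  )
}

# Usage (requires authentication and network access):
#   pic_links <- folder_links("path/to/pics")
```

The result has exactly as many rows as the listing returned, so any shortfall comes from the listing step rather than from `drive_link()`.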

thgsponer commented 1 year ago

Thank you very much for the hint. I now use drive_ls('path/to/pics') |> drive_link(). What consistently works is that I get as many links as there are pictures in the output of drive_ls. Something is still strange, though, because the listing itself differs depending on the authentication. I repeated drive_ls('path/to/pics') 100 times for both types of authentication. With the service account, the listing contains all 272 pictures in about 80% of the runs (79 of 100). With the personal account, it always does.

# service account (sa)
drive_auth(path = Sys.getenv('GD_AUTH_FILE'))
n_pics_sa <- numeric(100)
for (i in 1:100){
  n_pics_sa[i] <- drive_ls('path/to/pics') %>% nrow()
}
drive_deauth()
mean(n_pics_sa == 272)
[1] 0.79

# personal account (pa), interactive authentication
drive_auth()
n_pics_pa <- numeric(100)
for (i in 1:100){
  n_pics_pa[i] <- drive_ls('path/to/pics') %>% nrow()
}
drive_deauth()
mean(n_pics_pa == 272)
[1] 1

jennybc commented 1 year ago

I think you're seeing phenomena discussed in #288 and AFAIK there is nothing I can do about this. My advice remains as stated in the conclusion of #288:

The main advice is to make one's queries as specific as possible, in ways that will route through the q search parameter or request specific file ids, as opposed to vague, unconstrained searches.
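
One way to make a folder listing more specific, sketched under assumptions: `drive_ls()` accepts `type` and forwards extra arguments such as `q` to the underlying file search. The wrapper name `ls_pics`, its `ls_fn` hook, and the name filter are hypothetical; the filter is only a guess based on the example file names (pic1.png, pic2.png, ...). Whether this is enough to make a service-account listing fully reliable is not established in this thread.

```r
# A constrained listing of the picture folder.
# `ls_fn` is an illustrative hook; by default it is googledrive::drive_ls().
ls_pics <- function(folder, ls_fn = googledrive::drive_ls) {
  ls_fn(
    folder,
    type = "png",                  # restrict by file type
    q = "name contains 'pic'"      # extra search clause (assumed naming scheme)
  )
}

# Usage (requires authentication and network access):
#   pics <- ls_pics("path/to/pics")
#   links <- googledrive::drive_link(pics)
```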

thgsponer commented 1 year ago

Thank you. I will try to be as specific as possible. It is still strange to me that it seems to depend on whether or not I am using the service account.

Do you know whether it depends on the number of files?

jennybc commented 1 year ago

I don't know anything for sure about this stochastic behaviour.

But in #288 I and others definitely noticed that you're more likely to see dysfunctional behaviour when accessing files that are shared with the user but that the user does not technically own. And that is consistent with your observation that you see the problem when auth'ed as the service account, but not with the user account that actually owns the files.

jennybc commented 1 year ago

> Do you know whether it depends on the number of files?

And yes, the problem does seem to be aggravated whenever the number of files forces us to traverse pages.

So the very worst situation is to combine "lots of files" and "files specified by path not id" and "files shared but not owned". Anything you can do to eliminate these challenging features of a file-finding task will help.
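
A sketch of one way to remove the "path not id" factor: record the file ids once (for example while authenticated as the owning account, whose listing was complete in the test above), then have the service account request those ids directly. The function name `links_by_id`, its `get_fn` / `link_fn` hooks, and the file name `pic_ids.txt` are hypothetical; the defaults use the real `drive_get()`, `as_id()`, and `drive_link()`.

```r
# Fetch links for a known set of file ids, with no path lookup or folder listing.
# `get_fn` and `link_fn` are illustrative hooks, not part of googledrive.
links_by_id <- function(ids,
                        get_fn = function(x) {
                          googledrive::drive_get(id = googledrive::as_id(x))
                        },
                        link_fn = googledrive::drive_link) {
  ids <- unique(ids)
  files <- get_fn(ids)  # direct requests for specific file ids
  # Fail loudly rather than silently returning fewer links than requested.
  if (nrow(files) != length(ids)) {
    stop("Expected ", length(ids), " files but got ", nrow(files))
  }
  link_fn(files)
}

# Usage (requires authentication and network access):
#   # Once, as the owning account:
#   ids <- as.character(googledrive::drive_ls("path/to/pics")$id)
#   writeLines(ids, "pic_ids.txt")
#   # Later, as the service account:
#   links <- links_by_id(readLines("pic_ids.txt"))
```

The explicit count check means a short result is reported as an error instead of passing unnoticed.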