rstudio / pins-r

Pin, discover, and share resources
https://pins.rstudio.com

Error when using pin_upload() in a pin board created by board_gdrive() #777

Closed gorkang closed 1 year ago

gorkang commented 1 year ago

Hi there. I am trying to use a Shared Google Drive folder to create pin boards to upload and download files.

If I do that with a local folder with board_folder(), everything works fine. But I can't make board_gdrive() work with a shared Google Drive folder.

The containing folders for the pin board are created, but the file never uploads, and I get some errors and warnings.

drive_link = "https://drive.google.com/drive/folders/1GNs7jJeKqVgv7qA5nRcNzGHbCejOZpDh"
googledrive::drive_auth(email = "myemail@gmail.com")

gdrive_board = pins::board_gdrive(drive_link)
gdrive_board |> pins::pin_list()
#> character(0)

gdrive_board |> pins::pin_upload(paths = "~/Downloads/999.zip", name = "First_pin")
#> Creating new version '20230901T115649Z-db9b5'
#> Warning: Unknown or uninitialised column: `path`.
#> Error in if (is.null(name) && !has_slash(path) && drive_path_exists(append_slash(path))) {: missing value where TRUE/FALSE needed
gdrive_board |> pins::pin_list()
#> [1] "First_pin"

gdrive_board |> pins::pin_upload(paths = "~/Downloads/999.zip", name = "First_pin")
#> Warning: Unknown or uninitialised column: `path`.
#> Error in `as_parent()`:
#> ! Parent specified via `path` is invalid:
#> ✖ Does not exist.
#> Backtrace:
#>      ▆
#>   1. └─pins::pin_upload(...)
#>   2.   ├─pins::pin_store(board, name, paths, meta, ...)
#>   3.   └─pins:::pin_store.pins_board_gdrive(...)
#>   4.     └─pins:::version_setup(board, name, version_name(metadata), versioned = versioned)
#>   5.       └─pins::pin_versions(board, name)
#>   6.         ├─pins:::pin_versions_modern(board, name, ...)
#>   7.         └─pins:::pin_versions.pins_board_gdrive(board, name, ...)
#>   8.           ├─pins:::version_from_path(sort(googledrive::drive_ls(path)$name))
#>   9.           │ └─pins:::versions_template(x)
#>  10.           ├─base::sort(googledrive::drive_ls(path)$name)
#>  11.           └─googledrive::drive_ls(path)
#>  12.             └─googledrive:::as_parent(path)
#>  13.               └─googledrive:::drive_abort(c(invalid_parent, x = "Does not exist."))
#>  14.                 └─cli::cli_abort(message = message, ..., .envir = .envir)
#>  15.                   └─rlang::abort(...)

gdrive_board |> pins::pin_upload(paths = "~/Downloads/999.zip", name = "Second_pin")
#> Creating new version '20230901T115656Z-db9b5'
#> Warning: Unknown or uninitialised column: `path`.
#> Error in if (is.null(name) && !has_slash(path) && drive_path_exists(append_slash(path))) {: missing value where TRUE/FALSE needed
gdrive_board |> pins::pin_list()
#> [1] "Second_pin" "First_pin"

Created on 2023-09-01 with reprex v2.0.2
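The two messages in this reprex come from ordinary tibble and base R behaviour. "Unknown or uninitialised column: `path`" is the warning tibble gives when code reads a column with `$` that the tibble does not have. That suggests, as an unconfirmed guess, that some code reading `board$dribble$path` received a dribble without a `path` column. "missing value where TRUE/FALSE needed" is base R's error when an `if ()` condition evaluates to `NA`. The minimal sketch below reproduces only that second message, using a made-up condition rather than googledrive's actual code.

```r
# Made-up condition, not googledrive's code:
# `&&` returns NA when one side is TRUE and the other is NA
cond <- TRUE && NA

# `if ()` cannot branch on NA, so it signals an error; capture its message
msg <- tryCatch(
  if (cond) "branch taken",
  error = function(e) conditionMessage(e)
)
print(msg)
#> [1] "missing value where TRUE/FALSE needed"
```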

Thanks!

juliasilge commented 1 year ago

I think the problem is how you are specifying the Google Drive location path. Note the docs here:

Path to existing directory on Google Drive to store pins. Can be given as an actual path like "path/to/folder" (character), a file id or URL marked with googledrive::as_id(), or a googledrive::dribble.

Can you use your URL to make a "dribble", or wrap it in googledrive::as_id()?

To check out a working example, see how I specify my folder:

library(pins)
board <- board_gdrive("pins-testing")
#> ! Using an auto-discovered, cached token.
#>   To suppress this message, modify your code or options to clearly consent to
#>   the use of a cached token.
#>   See gargle's "Non-interactive auth" vignette for more details:
#>   <https://gargle.r-lib.org/articles/non-interactive-auth.html>
#> ℹ The googledrive package is using a cached token for 'julia.silge@gmail.com'.
board |> pin_list()
#> character(0)

path <- fs::path_temp("some-letters.txt")
readr::write_lines(sample(LETTERS, size = 20), path)
pin_upload(board, paths = path, name = "really-great-letters")
#> Creating new version '20230901T202058Z-56115'
board |> pin_list()
#> [1] "really-great-letters"

Created on 2023-09-01 with reprex v2.0.2
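The three location forms in the `board_gdrive()` docs quoted above can be sketched side by side. The helper name `boards_from_link()` is hypothetical and not part of pins. It only shows how the board location is specified. In this thread, uploads to boards built from the URL and ID forms still failed until the fix merged in #780.

```r
# Hypothetical helper (not part of pins): build one board from each of the
# three location forms documented for pins::board_gdrive().
boards_from_link <- function(folder_path, folder_link) {
  list(
    # 1. A character path such as "path/to/folder", looked up by name
    by_path = pins::board_gdrive(folder_path),
    # 2. A folder URL (or bare file ID) marked with googledrive::as_id()
    by_id = pins::board_gdrive(googledrive::as_id(folder_link)),
    # 3. A dribble: a one-row tibble describing the folder
    by_dribble = pins::board_gdrive(
      googledrive::as_dribble(googledrive::as_id(folder_link))
    )
  )
}

# Usage, after googledrive::drive_auth():
# boards <- boards_from_link("pins-testing", drive_link)
# boards$by_dribble |> pins::pin_list()
```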

gorkang commented 1 year ago

Thanks for the quick answer, Julia!

With googledrive::as_dribble(drive_link) or googledrive::as_id(drive_link) I get the same error. The containing folders get created, but the file and metadata are never uploaded.

DRIBBLE = googledrive::as_dribble(drive_link)
gdrive_board = pins::board_gdrive(DRIBBLE)
gdrive_board |> pins::pin_upload(paths = "~/Downloads/999.zip", name = "Dribble_pin")
#> Error in if (is.null(name) && !has_slash(path) && drive_path_exists(append_slash(path))) { : 
#>   missing value where TRUE/FALSE needed
#> In addition: Warning message:
#> Unknown or uninitialised column: `path`. 
ID = googledrive::as_id(drive_link)
gdrive_board = pins::board_gdrive(ID)
gdrive_board |> pins::pin_upload(paths = "~/Downloads/999.zip", name = "Id_pin")
#> Error in if (is.null(name) && !has_slash(path) && drive_path_exists(append_slash(path))) { : 
#>   missing value where TRUE/FALSE needed
#> In addition: Warning message:
#> Unknown or uninitialised column: `path`. 

Also, when I use your example (I created the pins-testing folder first), R gets stuck and gives no feedback for a long time. I stopped it after more than 5 minutes. The containing folders are created, but the file and metadata are never uploaded.

I also tried adapting your example and passing the file path through normalizePath(), just in case. Nothing worked. I attach my session info in case it helps debug this.

library(pins)
board <- board_gdrive("pins-testing")
#> ! Using an auto-discovered, cached token.
#>   To suppress this message, modify your code or options to clearly consent to
#>   the use of a cached token.
#>   See gargle's "Non-interactive auth" vignette for more details:
#>   <https://gargle.r-lib.org/articles/non-interactive-auth.html>
#> ℹ The googledrive package is using a cached token for 'gorkang@gmail.com'.
board |> pin_list()
#> [1] "really-great-letters"  "really-great-letters2"
pin_upload(board, paths = normalizePath("~/Downloads/ASD/chart.csv"), name = "really-great-letters3")
#> Creating new version '20230902T065741Z-afc30'
#> (nothing happens for a long time)

Created on 2023-09-02 with reprex v2.0.2

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.1 (2023-06-16)
#>  os       Ubuntu 22.04.3 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language en_US:en
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Atlantic/Canary
#>  date     2023-09-02
#>  pandoc   3.1.1 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  askpass       1.1     2019-01-13 [1] RSPM (R 4.3.0)
#>  cli           3.6.1   2023-03-23 [1] RSPM (R 4.3.0)
#>  curl          5.0.2   2023-08-14 [1] RSPM (R 4.3.0)
#>  digest        0.6.33  2023-07-07 [1] RSPM (R 4.3.0)
#>  dplyr         1.1.2   2023-04-20 [1] RSPM (R 4.3.0)
#>  ellipsis      0.3.2   2021-04-29 [1] RSPM (R 4.3.0)
#>  evaluate      0.21    2023-05-05 [1] RSPM (R 4.3.0)
#>  fansi         1.0.4   2023-01-22 [1] RSPM (R 4.3.0)
#>  fastmap       1.1.1   2023-02-24 [1] RSPM (R 4.3.0)
#>  fs            1.6.3   2023-07-20 [1] RSPM (R 4.3.0)
#>  gargle        1.5.2   2023-07-20 [1] RSPM (R 4.3.0)
#>  generics      0.1.3   2022-07-05 [1] RSPM (R 4.3.0)
#>  glue          1.6.2   2022-02-24 [1] RSPM (R 4.3.0)
#>  googledrive   2.1.1   2023-06-11 [1] RSPM (R 4.3.0)
#>  htmltools     0.5.6   2023-08-10 [1] RSPM (R 4.3.0)
#>  httr          1.4.7   2023-08-15 [1] RSPM (R 4.3.0)
#>  jsonlite      1.8.7   2023-06-29 [1] RSPM (R 4.3.0)
#>  knitr         1.43    2023-05-25 [1] RSPM (R 4.3.0)
#>  lifecycle     1.0.3   2022-10-07 [1] RSPM (R 4.3.0)
#>  magrittr      2.0.3   2022-03-30 [1] RSPM (R 4.3.0)
#>  openssl       2.1.0   2023-07-15 [1] RSPM (R 4.3.0)
#>  pillar        1.9.0   2023-03-22 [1] RSPM (R 4.3.0)
#>  pins        * 1.2.1   2023-08-16 [1] RSPM (R 4.3.0)
#>  pkgconfig     2.0.3   2019-09-22 [1] RSPM (R 4.3.0)
#>  purrr         1.0.2   2023-08-10 [1] RSPM (R 4.3.0)
#>  R.cache       0.16.0  2022-07-21 [1] RSPM (R 4.3.0)
#>  R.methodsS3   1.8.2   2022-06-13 [1] RSPM (R 4.3.0)
#>  R.oo          1.25.0  2022-06-12 [1] RSPM (R 4.3.0)
#>  R.utils       2.12.2  2022-11-11 [1] RSPM (R 4.3.0)
#>  R6            2.5.1   2021-08-19 [1] RSPM (R 4.3.0)
#>  rappdirs      0.3.3   2021-01-31 [1] RSPM (R 4.3.0)
#>  reprex        2.0.2   2022-08-17 [1] RSPM (R 4.3.0)
#>  rlang         1.1.1   2023-04-28 [1] RSPM (R 4.3.0)
#>  rmarkdown     2.24    2023-08-14 [1] RSPM (R 4.3.0)
#>  rstudioapi    0.15.0  2023-07-07 [1] RSPM (R 4.3.0)
#>  sessioninfo   1.2.2   2021-12-06 [1] RSPM
#>  styler        1.10.1  2023-06-05 [1] RSPM (R 4.3.0)
#>  tibble        3.2.1   2023-03-20 [1] RSPM (R 4.3.0)
#>  tidyselect    1.2.0   2022-10-10 [1] RSPM (R 4.3.0)
#>  utf8          1.2.3   2023-01-31 [1] RSPM (R 4.3.0)
#>  vctrs         0.6.3   2023-06-14 [1] RSPM (R 4.3.0)
#>  withr         2.5.0   2022-03-03 [1] RSPM (R 4.3.0)
#>  xfun          0.40    2023-08-09 [1] RSPM (R 4.3.0)
#>  yaml          2.3.7   2023-01-23 [1] RSPM (R 4.3.0)
#>
#>  [1] /home/emrys/R/x86_64-pc-linux-gnu-library/4.3
#>  [2] /usr/local/lib/R/site-library
#>  [3] /usr/lib/R/site-library
#>  [4] /usr/lib/R/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
```

Thanks!

juliasilge commented 1 year ago

So those items "really-great-letters" and "really-great-letters2" are folders that got created but the pin contents were never uploaded?

If you have a local path to your file and an authenticated board object, what happens when you run this (with a fake version folder name)?

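googledrive::drive_upload(
  path,  # local file to upload (the `media` argument)
  # Drive destination as one path string: board folder / fake version folder / file name
  fs::path(board$dribble$path, "20230405T060708Z-xxxxx", fs::path_file(path))
)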
gorkang commented 1 year ago

Yes, exactly, the folders are created but the pin contents (files) are not uploaded.

Uploading a file using your code also gets stuck. No errors, no feedback.

The only way I got it to work was by passing a dribble for the destination folder as the path argument.

file_to_upload <- fs::path_temp("some-letters.txt")
readr::write_lines(sample(LETTERS, size = 20), file_to_upload)

DRIBBLE = googledrive::as_dribble("https://drive.google.com/drive/folders/1rZyqHbpVJ0ZPoOxXtwYGexQ-FegEXGtN")
#> ! Using an auto-discovered, cached token.
#>   To suppress this message, modify your code or options to clearly consent to
#>   the use of a cached token.
#>   See gargle's "Non-interactive auth" vignette for more details:
#>   <https://gargle.r-lib.org/articles/non-interactive-auth.html>
#> ℹ The googledrive package is using a cached token for 'gorkang@gmail.com'.

googledrive::drive_upload(
  media = file_to_upload, 
  path = DRIBBLE,
  name = fs::path_file(file_to_upload)
)
#> Local file:
#> • '/tmp/RtmpeklPCr/some-letters.txt'
#> Uploaded into Drive file:
#> • 'some-letters.txt' <id: 1QXSWb1lkf6iOWRewEstOfusn3RA4hGti>
#> With MIME type:
#> • 'text/plain'

It seems that using an actual folder path in the path argument makes googledrive hang. It reminds me of this googledrive issue. I used httr::with_verbose() to see what was going on, and got an apparently endless stream of requests similar to:

-> GET /drive/v3/files?orderBy=recency%20desc&q=%28trashed%20%3D%20false%29%20and%20%28mimeType%20%3D%20%27application%2Fvnd.google-apps.folder%27%29&supportsAllDrives=TRUE&fields=nextPageToken%2C%2A&pageToken=~%21%21~AI9FV7SiIp4h6I262pbJBy9UAIFkIovD-ZgICvNpU5Nl-GdSlPSczGzYKT5v3OMvRPTdaPojhkPEJ0LPXTSktN0hMd9e4EWWiqr3fvbZprBxoV8DrbFMIApO6kRCyXJJXVG2ODo32O1WFEjvTLw76gQjLetnNs_XI5FeLBQmXllK6dESoWgy1CjAZu6VHSjcAcuc2m75u20n-cP6RwC1zr4MuFjrxOobmDUP8nhcmsH5eat434GLQrquWBwrFrL0foaMgF5wsh8eOrtzPXpW_d2g4wopMzWhCv7eJmCct0BXGv140FJfQo32SLm-ZdG8_o2Ox7jDTSBj9gigmRs1B1jo0Cnm1JfsYw%3D%3D HTTP/2
-> Host: www.googleapis.com
-> user-agent: googledrive/2.1.1 (GPN:RStudio; ) gargle/1.5.2 httr/1.4.7
-> accept-encoding: deflate, gzip, br, zstd
-> accept: application/json, text/xml, application/xml, */*
-> authorization: Bearer [EDITED]
->
<- HTTP/2 200
<- vary: Origin, X-Origin
<- date: Sat, 02 Sep 2023 20:15:04 GMT
<- expires: Mon, 01 Jan 1990 00:00:00 GMT
<- pragma: no-cache
<- cache-control: no-cache, no-store, max-age=0, must-revalidate
<- content-type: application/json; charset=UTF-8
<- content-encoding: gzip
<- server: ESF
<- content-length: 10454
<- x-xss-protection: 0
<- x-frame-options: SAMEORIGIN
<- x-content-type-options: nosniff
<- alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000

Created on 2023-09-02 with reprex v2.0.2
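The request in the log above becomes easier to read once its `q` parameter is URL-decoded. It is a Drive API file listing (`GET /drive/v3/files`) that asks for every non-trashed folder, and the `pageToken` shows it is fetching one page of results after another. That pattern is consistent with googledrive resolving a path string by listing folders page by page, which could take a very long time on a Drive with many folders. This is an interpretation of the log, not something confirmed in the thread. The sketch below decodes the query string copied from the log, leaving out the page token.

```r
# `q` parameter copied verbatim from the verbose log (pageToken omitted)
q_encoded <- paste0(
  "%28trashed%20%3D%20false%29%20and%20",
  "%28mimeType%20%3D%20%27application%2Fvnd.google-apps.folder%27%29"
)

# Percent-decode it with base R
q <- utils::URLdecode(q_encoded)
print(q)
#> [1] "(trashed = false) and (mimeType = 'application/vnd.google-apps.folder')"
```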

juliasilge commented 1 year ago

Hmmmm, this is definitely not something I have experienced.

Can you use that working dribble as the first argument to board_gdrive() and write a pin that way?

gorkang commented 1 year ago

I created a pull request that seems to solve the issue.

The fix is very simple: use a dribble for the path argument of googledrive::drive_upload().

According to the old googledrive issue mentioned above, on Google Drives with a lot of files and folders (mine has more than 1 TB), it is important to be as specific as possible.

For whatever reason, passing the full path to the file (fs::path(board$dribble$path, version_dir, "data.txt")) as the path argument is not enough, but passing a dribble as the path and giving the file name through the name argument works fine.
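The workaround described above (dribble for the parent folder, file name through `name`) can be sketched as a small helper. This is a hypothetical illustration, not the code merged in #780. The helper name `upload_to_version()` and its arguments are made up, and the flat board/version layout mirrors the fake version folder from the diagnostic earlier in the thread rather than pins' real layout. It uses only `googledrive::drive_mkdir()`, `googledrive::drive_upload()` and `fs::path_file()`.

```r
# Hypothetical helper (not the code merged in #780): upload `file` into a
# version folder inside the board, addressing every Drive location by dribble.
upload_to_version <- function(board_dribble, version_dir, file) {
  # drive_mkdir() creates the folder under `board_dribble` and returns its dribble
  version_folder <- googledrive::drive_mkdir(version_dir, path = board_dribble)

  # Parent passed as a dribble and the file name passed via `name`, so
  # googledrive should not need to look the parent folder up by a path string
  googledrive::drive_upload(
    media = file,
    path = version_folder,
    name = fs::path_file(file)
  )
}

# Usage, after googledrive::drive_auth():
# board_dribble <- googledrive::as_dribble(googledrive::as_id(drive_link))
# upload_to_version(board_dribble, "20230405T060708Z-xxxxx", file_to_upload)
```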

juliasilge commented 1 year ago

Thank you so much for the PR @gorkang! 🙌

Closed in #780

github-actions[bot] commented 1 year ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.