rstudio / pins-r

Pin, discover, and share resources
https://pins.rstudio.com

Unable to load data stored with pins on rstudio connect #622

Closed yacaslimi closed 1 year ago

yacaslimi commented 2 years ago

Hello,

I'm currently working with the package pins. I am an RSW user with the following configuration:

R version 4.1.3 (2022-03-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.6 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] pins_1.0.1 yaml_2.3.5

loaded via a namespace (and not attached):
[1] fansi_1.0.3 digest_0.6.29 utf8_1.2.2 crayon_1.5.1 rappdirs_0.3.3 R6_2.5.1
[7] jsonlite_1.8.0 lifecycle_1.0.1 magrittr_2.0.3 httr_1.4.3 pillar_1.7.0 rlang_1.0.2
[13] cli_3.3.0 curl_4.3.2 rstudioapi_0.13 fs_1.5.2 vctrs_0.4.1 ellipsis_0.3.2
[19] tools_4.1.3 glue_1.6.2 rsconnect_0.8.26 compiler_4.1.3 askpass_1.1 pkgconfig_2.0.3
[25] openssl_2.0.2 tibble_3.1.7

I managed to push datasets to RStudio Connect with pins. [image]

Now I'd like to use one of those datasets on RSW. I am able to connect to RStudio Connect with the command:

board <- board_rsconnect()

However, when I try to load one of the datasets using:

pin_read(board, "earth2")

It doesn't load the data. Instead, there is the following message:

    Error in yaml.load(string, error.label = error.label, ...) : 
      (~/.cache/pins/rsc-e62371cfd77db754024f9c5ed3556a73/51ca356e-619e-4148-b12f-35e3e652d14d/79/data.txt) Scanner error: mapping values are not allowed in this context at line 36, column 20955
    In addition: Warning message:
    In readLines(file, warn = readLines.warn) :
      incomplete final line found on '~/.cache/pins/rsc-e62371cfd77db754024f9c5ed3556a73/51ca356e-619e-4148-b12f-35e3e652d14d/79/data.txt'

While searching for similar issues, I found that someone managed to load the pin by deleting the data.txt file, but that did not work for me: https://github.com/rstudio/pins/issues/463. Do you know where the error comes from?

yacaslimi commented 2 years ago

Hello everyone,

I tried to load data with pins using the previous versions, i.e. 1.0.0 and 0.4.5, but that failed too. Is this functionality (reading data from RStudio Connect on RSW) still supported?

machow commented 2 years ago

Hey -- this particular error seems to be a yaml parsing error. Could you try deleting the cache and then reading the pin?

You can remove this folder

~/.cache/pins/rsc-e62371cfd77db754024f9c5ed3556a73/

or run

# delete all caches
cache_prune(days=0)

edit: something else surprising about that file is it has a line at least 20955 characters long? any idea what that could be?
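
For reference, removing the folder can also be done from R. This is a minimal base-R sketch, not part of pins: `clear_board_cache()` is a hypothetical helper, and the cache root and folder name should be taken from your own error message.

```r
# Hypothetical helper (not a pins function): delete one board's cache folder
# so that the next pin_read() has to download everything again.
clear_board_cache <- function(cache_root, board_dir) {
  target <- file.path(path.expand(cache_root), board_dir)
  if (!dir.exists(target)) {
    return(FALSE)  # nothing to delete
  }
  # unlink() returns 0 on success and 1 on failure
  unlink(target, recursive = TRUE) == 0
}

# Example call, using the path from the error message above (not run here):
# clear_board_cache("~/.cache/pins", "rsc-e62371cfd77db754024f9c5ed3556a73")
```

If the same error comes back after the folder is gone, the bad content is being downloaded again rather than read from a stale cache.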

yacaslimi commented 2 years ago

Hello @machow,

Thanks for your reply. I started by removing the folder rsc-e62371cfd77db754024f9c5ed3556a73/ [image], but that didn't solve the issue. [image]

There is no file to get here, so I used the following commands to publish a dataset to RStudio Connect:

  library(pins)
  board <- board_rsconnect()
  mtcars <- tibble::as_tibble(mtcars)
  board %>% pin_write(mtcars, "mtcars3")

But when I tried to read it, I got the same message: [image]

I also tried deleting all caches and rerunning the pin_read() command, but without success:

image

About the data.txt file: it is a long one. I can't see the top of it from the R terminal, but it is an HTML file:

image

and it ends with: [image]

machow commented 2 years ago

Is that HTML file you posted a picture of the contents of data.txt?

yacaslimi commented 2 years ago

Yes, absolutely.

yacaslimi commented 2 years ago

An update about this topic:

I tried to bypass the pin_read function. I called read.csv with the URL of the pinned dataset on RStudio Connect instead of using pin_read. Sadly, it didn't work.

I also used the download.file() function to fetch the dataset into my working environment on RSW, but I got an HTML file instead. The HTML is the login page (login/password, etc.) for authentication against the company's Active Directory.

The file is the same as the data.txt in the cache. What I don't understand yet is how I can write pins to RStudio Connect without problems (board_rsconnect works perfectly), but when I try to read a file from the same RStudio Connect server, I get that HTML login page.
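
A quick way to confirm that a cached or downloaded data.txt holds a login page rather than pin metadata is to look for HTML markers near the top of the file. This is only a sketch in base R; `looks_like_html()` is a hypothetical helper, and the 20-line window is an arbitrary assumption.

```r
# Hypothetical helper: TRUE if the first `n` lines of a file contain an
# HTML doctype or <html> tag, which pin metadata (YAML) should not.
looks_like_html <- function(path, n = 20L) {
  head_lines <- readLines(path, n = n, warn = FALSE)
  any(grepl("<!doctype html|<html", head_lines, ignore.case = TRUE))
}

# Example call on a cached file (path is a placeholder, not run here):
# looks_like_html("path/to/data.txt")
```

A TRUE result means the YAML scanner error is a symptom: the parser was handed a web page, so the problem lies in how the file was fetched, not in the pin itself.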

machow commented 2 years ago

Hey -- can you do these steps, so we can see more of what's going on?

Here's the code for using with_verbose:

library(pins)
library(httr)

board <- ...         # create board_rsconnect here
with_verbose(pin_read(board, "mtcars3"))

It should print a "Status" out, which will help identify what's going on! (highlighted in example below)

image

(h/t @colearendt for feedback, and @sellorm who has this with_verbose trick in his pins blog post!)

yacaslimi commented 2 years ago

Hello @machow,

Thanks for your reply. This is the output of the previous command. I just hid the server URL and the name.

    -> GET /api/v1/content/de880b93-2bda-4c0d-bcf7-a5a52e031385 HTTP/1.1
    -> Host: rs-connect
    -> User-Agent: libcurl/7.58.0 r-curl/4.3.2 httr/1.4.3
    -> Accept-Encoding: deflate, gzip
    -> Cookie: rscid=MTY1NjM1NTUzNXxEdi1CQkFFQ180SUFBUkFCRUFBQU52LUNBQU1HYzNSeWFXNW5EQVlBQkVkVlNVUVNZMjl1Ym1WamRDOXpkRzl5WlM1SFZVbEVfNE1HQVFFRVIxVkpSQUhfaEFBQUFCRF9oUVlCQVFSVlZVbEVBZi1HQUFBQVZmLUVFZ0FRbHBNM2dYQ0xRRDJUeDBzelY3elkxUVp6ZEhKcGJtY01DUUFIWTNKbFlYUmxaQVZwYm5RMk5BUUdBUHpGY19XZUJuTjBjbWx1Wnd3SkFBZHlaV1p5WlhOb0JXbHVkRFkwQkFZQV9NVno5WjQ9fJMOpB3VzVkYX4XoDItiltamtV8tAWKxcrM_gLYRjVPq; rscid-legacy=MTY1NjM1NTUzNXxEdi1CQkFFQ180SUFBUkFCRUFBQU52LUNBQU1HYzNSeWFXNW5EQVlBQkVkVlNVUVNZMjl1Ym1WamRDOXpkRzl5WlM1SFZVbEVfNE1HQVFFRVIxVkpSQUhfaEFBQUFCRF9oUVlCQVFSVlZVbEVBZi1HQUFBQVZmLUVFZ0FRbHBNM2dYQ0xRRDJUeDBzelY3elkxUVp6ZEhKcGJtY01DUUFIWTNKbFlYUmxaQVZwYm5RMk5BUUdBUHpGY19XZUJuTjBjbWx1Wnd3SkFBZHlaV1p5WlhOb0JXbHVkRFkwQkFZQV9NVno5WjQ9fJMOpB3VzVkYX4XoDItiltamtV8tAWKxcrM_gLYRjVPq
    -> Accept: application/json, text/xml, application/xml, */*
    -> Date: Mon, 27 Jun 2022 19:01:28 GMT
    -> X-Auth-Token: T3ad3753787697852f3230602569c29ca
    -> X-Auth-Signature: c4nJpYNDd7lTUxiS95gRKNf+0WzsBOoayDFaNsqqfi8GqczUOyirc7Oy6qj0+qECDLM1OQ9unkGLEI/DHIpADYHaGyLMmciNpE1BEDKeP03pcdLG1fYE1rGh3xqad99dy3Dx62pcXO535C+r+02F0OG0vn/Fv6bnitJj9AtIxDNUGMdJatUl12faWDY9uq41J5YLJoSITRuMSAeF8M0Ofq2agSwXOK3HV5f+wX5M3Y98DpFwvntls+IvOfm3MilazS5+YGUyb+LBLms6CYL2RpPiov2BDtR0DunO5EpRr4IrcC2Hoj7EUxuFBb91fHpGFoQUx+50PKdVmk7ILlmckA==
    -> X-Content-Checksum: 1B2M2Y8AsgTpgAmY7PhCfg==
    ->
    <- HTTP/1.1 200 OK
    <- Server: nginx/1.18.0 (Ubuntu)
    <- Date: Mon, 27 Jun 2022 19:01:28 GMT
    <- Content-Type: application/json; charset=utf-8
    <- Content-Length: 949
    <- Connection: keep-alive
    <- Cache-Control: no-cache, no-store, must-revalidate
    <- Expires: 0
    <- Pragma: no-cache
    <- Strict-Transport-Security: max-age=2592000
    <- X-Content-Type-Options: nosniff
    <- X-Correlation-Id: 33ccdf7e-09fe-4bf1-ba8b-4d2511737d63
    <- X-Frame-Options: DENY
    <-
    -> GET /content/de880b93-2bda-4c0d-bcf7-a5a52e031385/_rev100/data.txt HTTP/1.1
    -> Host: rstudioconnectuat***
    -> User-Agent: libcurl/7.58.0 r-curl/4.3.2 httr/1.4.3
    -> Accept-Encoding: deflate, gzip
    -> Accept: application/json, text/xml, application/xml, */*
    -> Date: Mon, 27 Jun 2022 19:01:28 GMT
    -> X-Auth-Token: T3ad3753787697852f3230602569c29ca
    -> X-Auth-Signature: nwnusriJ8QmNmI/cFNwYfZlma/i0uH7/Qgl3EESISoNOxle16FTxKQj8ePVSWrnpqULz7001T5ChfxagBQWoVlCa63rQlXLxhvz65UYcZamkdkTATyoBmC5rzJ3CKyO/9YPFErVLne0VeEpFYN1kFiX/ji7XNoXELsEbGUsn9XE/HtwmmzwGqB0B5oUVsrGcpc8MsiuwbQ6BP1gFs2mFsYFyKukxy8FgS9jfb59ZcXeFy2GoRLshoa7wuFYjdHE4uUzcX4BmYYffZXZHuFoOc4SYNTuV1uh5Y0rpp+VwdLeF8vhKIE7GXlpn2v09R1UMdbCPcLn/hxjKHLYUPU3csg==
    -> X-Content-Checksum: 1B2M2Y8AsgTpgAmY7PhCfg==
    ->
    <- HTTP/1.1 302 Found
    <- Content-Length: 0
    <- Location: https://login.microsoftonline.com/cc0a4ff6-9454-4e4b-881b-85f448dee2e3/oauth2/authorize?response_type=code&client_id=bb5dc556-d32e-4932-8b0e-5739e67bc626&scope=openid&nonce=417743e6-1627-47a4-9976-60474598fc98&redirect_uri=https%3a%2f%2frstudioconnectuat-serviergroup.msappproxy.net%2f&state=AppProxyState%3a%7b%22InvalidTokenRetry%22%3anull%2c%22IsMsofba%22%3afalse%2c%22OriginalRawUrl%22%3a%22https%3a%5c%2f%5c%2frstudioconnectuat-serviergroup.msappproxy.net%5c%2fcontent%5c%2fde880b93-2bda-4c0d-bcf7-a5a52e031385%5c%2f_rev100%5c%2fdata.txt%22%2c%22RequestProfileId%22%3anull%2c%22SessionId%22%3a%22b7e52e34-e94b-450b-bc49-914bb312d51c%22%7d%23EndOfStateParam%23&client-request-id=b7e52e34-e94b-450b-bc49-914bb312d51c
    <- x-ms-proxy-app-id: bb5dc556-d32e-4932-8b0e-5739e67bc626
    <- x-ms-proxy-group-id: e49b4bd3-088b-4868-a004-1a538ffb8031
    <- x-ms-proxy-subscription-id: cc0a4ff6-9454-4e4b-881b-85f448dee2e3
    <- x-ms-proxy-transaction-id: 417743e6-1627-47a4-9976-60474598fc98
    <- x-ms-proxy-service-name: proxy-appproxy-WEUR-AMS02P-2
    <- x-ms-proxy-data-center: WEUR
    <- Nel: {"report_to":"network-errors","max_age":86400,"success_fraction":0.2,"failure_fraction":1.0}
    <- Report-To: {"group":"network-errors","max_age":86400,"endpoints":[{"url":"https://ffde.nelreports.net/api/report?cat=proxy-appproxy-WEUR-AMS02P-2"}]}
    <- Set-Cookie: AzureAppProxyPreauthSessionCookie_bb5dc556-d32e-4932-8b0e-5739e67bc626_b7e52e34-e94b-450b-bc49-914bb312d51c_1.4=3|pDEBjBdc+9Mumbr84TRwXSP8gv4BTbPbyC0sXlLf/ag0vnV2Y2RNjZPCtQAIct8f4iCHQE1ASbWEPh1OWD3sAqqsqAJxKTNHxpTRgfS1AbdCkJCk8TxxL8iQBZopPX0jKzaQADprb52Ra5idCzGGan2/ABLLk+kDD3g/OJrA74mBOgJU4vO+c/+vKOdrc0t8eUaucIgvp0561uqC0ZCAJl5lUk9K+rzmg5T5OHvAGNdOVkBt9M+HlgJbH+Gt5AHrKrrfxs4O8/oeWzLKfB1ykCVLqkwQpzniwYVs+VaHtexCqZy+qrWgTooArVGZ/G1iodY9hdkHPcDWhG5bpWvjOU8tpJJ9Ybeg3bAsbxz3HgLA6bFlpsm3gSLiw86+gcY9; expires=Mon, 27 Jun 2022 19:11:28 GMT; path=/; Secure; SameSite=None
    <- Set-Cookie: AzureAppProxyAnalyticCookie_bb5dc556-d32e-4932-8b0e-5739e67bc626_https_1.3=3|Qq4Lu07ars0j0KRzaYzT/oXGM+BYzTce8YEh6NIGAkttqMpjc2r/52LABM7SsEuhK6TyvblEsX22yZbP/06szXoGt6b6iXrMNEGlWu+vzV76hvTpFyvHiHqEXeofroGXJFihdVqjwZh0VQJxkQl4vA==; path=/; Secure; SameSite=None
    <- Date: Mon, 27 Jun 2022 19:01:27 GMT
    <-
    -> GET /cc0a4ff6-9454-4e4b-881b-85f448dee2e3/oauth2/authorize?response_type=code&client_id=bb5dc556-d32e-4932-8b0e-5739e67bc626&scope=openid&nonce=417743e6-1627-47a4-9976-60474598fc98&redirect_uri=https%3a%2f%2frstudioconnectuat-serviergroup.msappproxy.net%2f&state=AppProxyState%3a%7b%22InvalidTokenRetry%22%3anull%2c%22IsMsofba%22%3afalse%2c%22OriginalRawUrl%22%3a%22https%3a%5c%2f%5c%2frstudioconnectuat-serviergroup.msappproxy.net%5c%2fcontent%5c%2fde880b93-2bda-4c0d-bcf7-a5a52e031385%5c%2f_rev100%5c%2fdata.txt%22%2c%22RequestProfileId%22%3anull%2c%22SessionId%22%3a%22b7e52e34-e94b-450b-bc49-914bb312d51c%22%7d%23EndOfStateParam%23&client-request-id=b7e52e34-e94b-450b-bc49-914bb312d51c HTTP/1.1
    -> Host: login.microsoftonline.com
    -> User-Agent: libcurl/7.58.0 r-curl/4.3.2 httr/1.4.3
    -> Accept-Encoding: deflate, gzip
    -> Accept: application/json, text/xml, application/xml, */*
    -> Date: Mon, 27 Jun 2022 19:01:28 GMT
    -> X-Auth-Token: T3ad3753787697852f3230602569c29ca
    -> X-Auth-Signature: nwnusriJ8QmNmI/cFNwYfZlma/i0uH7/Qgl3EESISoNOxle16FTxKQj8ePVSWrnpqULz7001T5ChfxagBQWoVlCa63rQlXLxhvz65UYcZamkdkTATyoBmC5rzJ3CKyO/9YPFErVLne0VeEpFYN1kFiX/ji7XNoXELsEbGUsn9XE/HtwmmzwGqB0B5oUVsrGcpc8MsiuwbQ6BP1gFs2mFsYFyKukxy8FgS9jfb59ZcXeFy2GoRLshoa7wuFYjdHE4uUzcX4BmYYffZXZHuFoOc4SYNTuV1uh5Y0rpp+VwdLeF8vhKIE7GXlpn2v09R1UMdbCPcLn/hxjKHLYUPU3csg==
    -> X-Content-Checksum: 1B2M2Y8AsgTpgAmY7PhCfg==
    ->
    <- HTTP/1.1 200 OK
    <- Cache-Control: no-store, no-cache
    <- Pragma: no-cache
    <- Content-Type: text/html; charset=utf-8
    <- Content-Encoding: gzip
    <- Expires: -1
    <- Vary: Accept-Encoding
    <- Strict-Transport-Security: max-age=31536000; includeSubDomains
    <- X-Content-Type-Options: nosniff
    <- X-Frame-Options: DENY
    <- Link: https://aadcdn.msftauth.net; rel=preconnect; crossorigin
    <- Link: https://aadcdn.msftauth.net; rel=dns-prefetch
    <- Link: https://aadcdn.msauth.net; rel=dns-prefetch
    <- X-DNS-Prefetch-Control: on
    <- P3P: CP="DSP CUR OTPi IND OTRi ONL FIN"
    <- x-ms-request-id: fb4ee3ef-932b-4a55-962d-cd05b1640000
    <- x-ms-ests-server: 2.1.13081.9 - WEULR1 ProdSlices
    <- X-XSS-Protection: 0
    <- Set-Cookie: buid=0.AUgA9k8KzFSUS06IG4X0SN7i41bFXbsu0zJJiw5XOeZ7xiZIAAA.AQABAAEAAAD--DLA3VO7QrddgJg7Wevr0fYXvZc7mPR5o_r7PBmk--gkIPclaB_UU_pC8kv_KcaCNm01-q-rtX5oD0Xz8r5YMYJsRkF-z2Yj-9DN01jVHg6o24ZjqBdsDtg1Ftr44CwgAA; expires=Wed, 27-Jul-2022 19:01:28 GMT; path=/; secure; HttpOnly; SameSite=None
    <- Set-Cookie: fpc=AiFRlyT-aOBEpIUNtr1OP0eTLuLNAQAAAIj1S9oOAAAA; expires=Wed, 27-Jul-2022 19:01:28 GMT; path=/; secure; HttpOnly; SameSite=None
    <- Set-Cookie: esctx=AQABAAAAAAD--DLA3VO7QrddgJg7WevrFtCviePz6PfrpHmVWLBkrMFsth3qHwt1T02hdJ75zcNW1UD5n6iggylBZ5jyLcLDaMtaiv7HqDVpelwH1fRkjDEPlgawcCrC1HOZrp8h55zyOYbKPYqyUGhwC_uiSq9Dvxy_EVjb2o3IYrvg1ql3EuNmyBWaI7lOjRt1U0IzdhggAA; domain=.login.microsoftonline.com; path=/; secure; HttpOnly; SameSite=None
    <- Set-Cookie: x-ms-gateway-slice=estsfd; path=/; secure; samesite=none; httponly
    <- Set-Cookie: stsservicecookie=estsfd; path=/; secure; samesite=none; httponly
    <- Date: Mon, 27 Jun 2022 19:01:28 GMT
    <- Content-Length: 49317
    <-
    Error in yaml.load(string, error.label = error.label, ...) :
      (~/.cache/pins/rsc-e62371cfd77db754024f9c5ed3556a73/de880b93-2bda-4c0d-bcf7-a5a52e031385/100/data.txt) Scanner error: mapping values are not allowed in this context at line 36, column 20944
    In addition: Warning message:
    In readLines(file) :
      incomplete final line found on '~/.cache/pins/rsc-e62371cfd77db754024f9c5ed3556a73/de880b93-2bda-4c0d-bcf7-a5a52e031385/100/data.txt'

Hope this helps you :)

machow commented 2 years ago

Thanks--it looks like it is trying to do a redirect to login.microsoftonline.com, but I'm not sure why :/.
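
To make the sequence easier to see, the status lines and redirect targets can be pulled out of a verbose log like the one above. A rough base-R sketch, assuming the log has been pasted into a character vector with one `->` / `<-` line per element; `summarise_verbose_log()` is a hypothetical helper, not part of httr or pins.

```r
# Hypothetical helper: list the response status codes and the hosts of any
# Location headers found in httr verbose output (one log line per element).
summarise_verbose_log <- function(log_lines) {
  log_lines <- trimws(log_lines)
  status_lines <- grep("^<- HTTP/", log_lines, value = TRUE)
  location_lines <- grep("^<- Location:", log_lines, value = TRUE,
                         ignore.case = TRUE)
  list(
    # "<- HTTP/1.1 302 Found" -> 302
    statuses = as.integer(
      sub("^<- HTTP/[0-9.]+ ([0-9]{3}).*$", "\\1", status_lines)
    ),
    # "<- Location: https://host/path" -> "host"
    redirect_hosts = sub("^<- Location: *https?://([^/]+).*$", "\\1",
                         location_lines, ignore.case = TRUE)
  )
}
```

For the log above this should report 200, 302, 200 with login.microsoftonline.com as the redirect host: the metadata request succeeds, and only the data.txt request is bounced to the login page.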

What happens if you open RSC in the browser, and then paste the URL for retrieving data.txt file directly?

<your board server url>/content/de880b93-2bda-4c0d-bcf7-a5a52e031385/_rev100/data.txt

For example, here's what I get when I do something similar with an internal testing server:

image

sellorm commented 2 years ago

@yacaslimi - in addition to what @machow suggests, could you also check whether you have a WAF (Web Application Firewall) in front of your Connect server?

I have seen a small number of cases where a misconfigured WAF caused problems with some, but not all, requests to Connect.

If you do, turn it off and try again. If that works, you'd need to play around with the WAF config to allow these requests.
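
One way to compare behaviour before and after a WAF change is to request the file without following redirects and look at the raw status. The sketch below is only an illustration: `classify_response()` and `probe_pin_file()` are hypothetical helpers, and authenticating with a Connect API key in an `Authorization: Key ...` header is an assumption (the session in this thread uses token/signature headers instead).

```r
# Hypothetical helper: describe a response from its status and Location header.
classify_response <- function(status, location = NULL) {
  if (status >= 300 && status < 400) {
    host <- if (is.null(location)) {
      NA_character_
    } else {
      sub("^https?://([^/]+).*$", "\\1", location)
    }
    return(paste0("redirected to ", host))
  }
  if (status == 200) {
    return("served directly")
  }
  paste0("unexpected status ", status)
}

# Hypothetical helper: fetch `url` once with redirects disabled.
# Assumes httr is installed and that `api_key` is a valid Connect API key.
probe_pin_file <- function(url, api_key) {
  resp <- httr::GET(
    url,
    httr::add_headers(Authorization = paste("Key", api_key)),
    httr::config(followlocation = 0L)  # do not follow the 302
  )
  classify_response(httr::status_code(resp), httr::headers(resp)$location)
}

# Example call (placeholders, not run here):
# probe_pin_file("<your board server url>/content/<guid>/_rev100/data.txt", "<api key>")
```

If this reports a redirect to a login host while the WAF or proxy is active and "served directly" once it is disabled, that points to the rules in front of Connect rather than to pins.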

yacaslimi commented 2 years ago

@machow: Yes, I get this metadata about the dataset mtcars3:

image

@sellorm: Yes, not sure about that. I'll check this point with the IT service tomorrow. Thanks a lot for your help.

yacaslimi commented 2 years ago

Hello @sellorm, @machow,

I've checked with the IT service. There is indeed a WAF in front of RStudio Connect. According to them, it is not possible to configure the WAF to allow a specific type of request, such as those issued by the pin_read() function.

I'll continue to investigate on my side and will keep you informed of further developments.

sellorm commented 2 years ago

Thanks for the update @yacaslimi.

In cases where I've seen this before the IT team has been able to temporarily disable the WAF to confirm the issue and then reenable it with new or edited WAF rules to allow the appropriate API request(s), which has fixed the issue.

I hope you manage to get this resolved and please let us know if we can be of further assistance.

github-actions[bot] commented 1 year ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.