rstudio / pins-python

https://rstudio.github.io/pins-python/
MIT License

Can't read a pin from board_s3 #217

Open cholu6768 opened 1 year ago

cholu6768 commented 1 year ago

First of all, thank you for looking at my question :)

I have the following issue. I'm able to access the board and also list which pins are inside:

import pins  # version 0.8.3

board_tst = pins.board_s3(PATH_TO_MY_BOARD)
board_tst.pin_list()

But when I try to read a pin

tst = board_tst.pin_read('test_data_starwars')

I get the following error:

PinsError                                 Traceback (most recent call last)
Cell In[22], line 1
----> 1 tst = board_tst.pin_read('test_data_starwars')

File ~/.local/lib/python3.8/site-packages/pins/boards.py:212, in BaseBoard.pin_read(self, name, version, hash)
    198 def pin_read(self, name, version: Optional[str] = None, hash: Optional[str] = None):
    199     """Return the data stored in a pin.
    200 
    201     Parameters
   (...)
    210 
    211     """
--> 212     meta = self.pin_fetch(name, version)
    214     if isinstance(meta, MetaRaw):
    215         raise TypeError(
    216             "Could not find metadata for this pin version. If this is an individual "
    217             "file, may need to use pin_download()."
    218         )

File ~/.local/lib/python3.8/site-packages/pins/boards.py:187, in BaseBoard.pin_fetch(self, name, version)
    186 def pin_fetch(self, name: str, version: Optional[str] = None) -> Meta:
--> 187     meta = self.pin_meta(name, version)
    189     # TODO: sanity check caching (since R pins does a cache touch here)
    190     # path = self.construct_path([self.board, name, version])
...
---> 98     raise PinsError("Cannot check version, since pin %s does not exist" % name)
    100 versions_raw = self.fs.ls(self.construct_path([self.path_to_pin(name)]))
    102 # get a list of Version(Raw) objects

PinsError: Cannot check version, since pin test_data_starwars does not exist

I also want to note that the board and the pin were initially created with R. Could that be an issue?

juliasilge commented 1 year ago

Before @isabelizimm gets back from vacation next week, I just want to assure you that you should absolutely be able to round-trip a pin through R and Python on S3, Posit Connect, etc.

If you navigate to the S3 bucket and check out the folder where these pins are stored, do you see a metadata file? It should be in the version folder, called data.txt, and look somewhat like this. If there is one there, can you share what it looks like?

cholu6768 commented 1 year ago

Hi Julia,

I have two pins saved:

This one is a CSV

List of 11
 $ file       : chr "test_data_starwars.csv"
 $ file_size  : 'fs_bytes' int 7.33K
 $ pin_hash   : chr "735f8120e142c45f"
 $ type       : chr "csv"
 $ title      : chr "test_data_starwars: a pinned 87 x 11 data frame"
 $ description: NULL
 $ created    : POSIXct[1:1], format: "2023-09-29 13:18:00"
 $ api_version: num 1
 $ user       : list()
 $ name       : chr "test_data_starwars"
 $ local      :List of 3
  ..$ dir    : 'fs_path' chr "~/.cache/pins/s3-mydatabase/test_data_starwars/20230929T131826Z-735f8"
  ..$ url    : NULL
  ..$ version: chr "20230929T131826Z-735f8"

This one is an RDS file, which can't be read in Python because it's a binary R data format. I also wanted to ask whether it's possible to save it as Parquet from R and then read it with Python.

 $ file       : chr "encrypted_data.rds"
 $ file_size  : 'fs_bytes' int 1.54G
 $ pin_hash   : chr "bd32445d57729d9a"
 $ type       : chr "rds"
 $ title      : chr "encrypted_data: a pinned 5667283 x 4 data frame"
 $ description: NULL
 $ created    : POSIXct[1:1], format: "2023-09-01 16:00:00"
 $ api_version: num 1
 $ user       : list()
 $ name       : chr "encrypted_data"
 $ local      :List of 3
  ..$ dir    : 'fs_path' chr "~/.cache/pins/s3-mydatabase/encrypted_data/20230901T160026Z-bd324"
  ..$ url    : NULL
  ..$ version: chr "20230901T160026Z-bd324"

juliasilge commented 1 year ago

The error sounds like it is having a hard time reading the metadata, so I wanted to double-check that the metadata is there and in the correct format. For that CSV pin, do you see the data.txt file when you navigate to the S3 web page for this bucket? For example, here is what I see for a pin I have saved in S3:

Screenshot 2023-10-02 at 9 36 11 AM

I have navigated to a version folder, and then I see the data.txt YAML file plus the pin contents. Do you see something similar when you go to your S3 bucket? What do the contents of the YAML look like? They should be something like this:

file: really-pretty-numbers.json
file_size: 23
pin_hash: c3943ca5a9aab2df
type: json
title: 'really-pretty-numbers: a pinned integer vector'
description: ~
created: 20221103T022316Z
api_version: 1.0
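If it helps, here is a rough sketch of how you could check a copy of data.txt for those fields in Python. It is not part of pins: the helper names are made up for illustration, treating every field in the example as required is an assumption, and the parser only handles flat `key: value` lines rather than full YAML.

```python
# Fields that appear in the example data.txt above; treating all of them as
# required is an assumption made for this sketch.
EXPECTED_FIELDS = ("file", "file_size", "pin_hash", "type", "created", "api_version")


def parse_flat_metadata(text):
    """Parse flat `key: value` lines. Not a YAML parser: nested lines are skipped."""
    meta = {}
    for line in text.splitlines():
        # Skip blank lines, comments, and indented (nested) lines.
        if not line.strip() or line.lstrip().startswith("#") or line[0].isspace():
            continue
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip("'\"")
    return meta


def missing_fields(text):
    """Return the expected fields that are absent from the metadata text."""
    meta = parse_flat_metadata(text)
    return [name for name in EXPECTED_FIELDS if name not in meta]


example = """\
file: really-pretty-numbers.json
file_size: 23
pin_hash: c3943ca5a9aab2df
type: json
title: 'really-pretty-numbers: a pinned integer vector'
description: ~
created: 20221103T022316Z
api_version: 1.0
"""

print(missing_fields(example))  # prints []
```

An empty list means every expected field is present. It says nothing about whether pins itself can reach the file.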

cholu6768 commented 1 year ago

Ohh, I thought by doing pin_meta() you would also get the information you need.

I checked the S3 bucket through the console and the file is there but I don’t have permission to see the txt file. I’m gonna ask the admin to give me access and I’ll get back to you :)

juliasilge commented 1 year ago

Ah, I definitely think that would be the problem if you don't have permissions to access the file. A user who wants to read a pin needs to have permissions to access the directory where the pin contents plus metadata are stored.
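To see which step is being denied, something like the sketch below might help. It assumes an fsspec-style filesystem object with `ls` and `open` methods (the traceback above shows the board calling `self.fs.ls`, so `board.fs` should fit), and it assumes that a denied read surfaces as a `PermissionError`. The function name and the stand-in class are hypothetical and only there so the example runs without S3.

```python
def check_pin_access(fs, version_dir):
    """Report whether a pin version folder can be listed and its data.txt read.

    `fs` is any fsspec-style filesystem with `ls` and `open` methods.
    `version_dir` is the path of one version folder inside the pin.
    """
    report = {"can_list": False, "can_read_metadata": False, "error": None}
    try:
        fs.ls(version_dir)
        report["can_list"] = True
        with fs.open(f"{version_dir}/data.txt", "rb") as f:
            f.read()
        report["can_read_metadata"] = True
    except (PermissionError, FileNotFoundError) as exc:
        # Record which kind of failure stopped the check.
        report["error"] = f"{type(exc).__name__}: {exc}"
    return report


class DeniedMetadataFS:
    """Stand-in filesystem: listing works, but opening any file is denied."""

    def ls(self, path):
        return [f"{path}/data.txt", f"{path}/test_data_starwars.csv"]

    def open(self, path, mode="rb"):
        raise PermissionError("Access Denied")


print(check_pin_access(DeniedMetadataFS(), "bucket/test_data_starwars/20230929T131826Z-735f8"))
```

Against a real board you would pass `board_tst.fs` and the actual path of the version folder in your bucket (the `bucket/...` path here is a placeholder).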

isabelizimm commented 12 months ago

Hey there! I am back from OOO and hopping in to confirm what @juliasilge said: you won't be able to see any pin metadata without access. Let us know if getting permissions sorted out fixes your problem 😄

cholu6768 commented 12 months ago

Thank you both ;)

While I wait to get read access from the admin, I managed to copy the txt file to my local path. This is what the txt file looks like:

file: test_data_starwars.csv
file_size: 7507
pin_hash: 735f8120e142c45f
type: csv
title: 'test_data_starwars: a pinned 87 x 11 data frame'
description: ~
created: 20230929T131554Z
api_version: 1.0

I have another question: how come I can read the pin with R but not with Python? Does R read the metadata in a different way than Python?

juliasilge commented 12 months ago

That data.txt file looks right. 👍

how come I can read the pin with R but not with Python?

I have been wondering the same thing!!! 🤯 It must be some difference in how the R package and Python package do authentication? In terms of reading the file itself:

https://github.com/rstudio/pins-r/blob/b3f1fcd7b9cb7743a4fc0dbf3c723e9546cb8415/R/board_s3.R#L336-L339

isabelizimm commented 12 months ago

Hmmm.... I wonder if your PATH_TO_MY_BOARD is not correct? For example, if I have a bucket named pins-testing and then a pin inside the board called starwars-data-test, my board setup/call would look like ⬇️

board = pins.board_s3("pins-testing")
board.pin_read("starwars-data-test")

One way the error you see can manifest is if you've added too much information in your board creation, e.g.

board = pins.board_s3("pins-testing/starwars-data-test") # this will give a PinsError
board.pin_read("starwars-data-test")

One way to check whether you've set the path you expect is to run board.pin_list(): if the path goes too deep into your bucket, you'll see a list of hashes (which are the versions) rather than the names of your pins.
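As a rough sketch of that check: the helper below flags a `pin_list()` result in which every name looks like a version folder. The function name is hypothetical, and the pattern is only inferred from the version names shown earlier in this thread (such as 20230929T131826Z-735f8), so treat it as an assumption rather than the format pins guarantees.

```python
import re

# Version folder names in this thread look like "20230929T131826Z-735f8":
# a UTC timestamp, a dash, and a short hash prefix. The exact pattern is
# inferred from those examples.
VERSION_NAME = re.compile(r"^\d{8}T\d{6}Z-[0-9a-f]+$")


def looks_too_deep(names):
    """Return True if every listed name looks like a version folder, not a pin."""
    names = list(names)
    return bool(names) and all(VERSION_NAME.match(n) for n in names)


print(looks_too_deep(["20230929T131826Z-735f8"]))                # True
print(looks_too_deep(["test_data_starwars", "encrypted_data"]))  # False
```

With a real board this would be called as `looks_too_deep(board.pin_list())`. A `True` result suggests the board path points at a pin folder rather than at the bucket or prefix that holds the pins.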