rstudio / pins-python

https://rstudio.github.io/pins-python/
MIT License

Better support for cloud authentication via tokens, service accounts, etc. #181

Open juliasilge opened 1 year ago

juliasilge commented 1 year ago

In writing up the blog post for pins 1.1.0 in R, I ran into some challenges around authenticating for GCS in Python. I have a service account JSON file google-pins.json for a project "pins-dev" in my working directory.

I kind of expected that this might work, given what the docs for gcsfs and pins say (use a cached gcsfs token), but it does not:

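import pins
import gcsfs

# Authenticated filesystem: fs.ls("pins-testing/") works with this token.
fs = gcsfs.GCSFileSystem(project="pins-dev", token="google-pins.json")
# fs is never passed to the board, which sets up its own GCS access.
board = pins.board_gcs("pins-testing")
board.pin_read("small-numbers")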

The pins functions don't work, even though fs.ls("pins-testing/") does.

This does work:

import pins

# storage_options is forwarded to the underlying filesystem, so the token is used.
opts = {"cache_timeout": 0, "token": "google-pins.json"}
path = "pins-testing"
board = pins.board("gcs", path, storage_options=opts)
board.pin_read("nice-numbers")

Once I successfully read the pin this way, I can re-declare the board via pins.board_gcs("pins-testing") and still read the pin (which is cached locally), even though the board object is different.

🎯 Can/should we add a token argument to the GCS board? Should we do something similar for the other cloud boards? FWIW in R, we decided authentication was specific enough to these platforms that we needed to add individualized support in each board.
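
To make the proposal concrete, here is a minimal sketch of what a token argument could look like, written as a thin wrapper around the pins.board(..., storage_options=...) call that already works. The names gcs_storage_options and board_gcs_with_token are hypothetical and not part of pins; this is one possible shape under those assumptions, not how pins implements anything.

```python
from typing import Any, Optional


def gcs_storage_options(token: Optional[str] = None, **extra: Any) -> dict:
    """Build the storage_options dict forwarded to the GCS filesystem.

    Hypothetical helper, not part of pins. The "token" key is left out when
    no token is given, so the filesystem keeps its default credential lookup.
    """
    opts = dict(extra)
    if token is not None:
        opts["token"] = token
    return opts


def board_gcs_with_token(path: str, token: Optional[str] = None, **extra: Any):
    """Hypothetical stand-in for a board_gcs() that accepts a token."""
    import pins  # imported lazily so the helper above works without pins

    return pins.board("gcs", path, storage_options=gcs_storage_options(token, **extra))


# Usage (needs pins, gcsfs, and real credentials):
# board = board_gcs_with_token("pins-testing", token="google-pins.json")
# board.pin_read("nice-numbers")
```

Omitting the token rather than passing a placeholder keeps the default behavior unchanged for users whose credentials are already discovered automatically.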

Also FWIW with GCS specifically, I'm still fuzzy on how the CLI authentication interacts with what I can do from Python. I did try authenticating via the CLI with gcloud auth application-default login and I'm not sure whether that was important.

cpcloud commented 1 year ago

IME it's best to delegate to the underlying libraries. If those libraries are themselves hand-rolling authentication APIs for their users, the better fix is to help them use the provider-implemented APIs for authentication and authorization.

Exposing authz to users almost always fails to account for the many ways credentials can be set.

cpcloud commented 1 year ago

FWIW I've had no problems authenticating to GCS using pins.

It's likely that gcsfs is using the Google-implemented authentication libraries, which will, among other things, look in the correct user directories for credentials (which, as you allude to, are set up by running gcloud auth ...).
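
For anyone debugging the same setup, a small diagnostic can show which default credential sources exist on the machine. This is only a sketch. It assumes the default lookup checks the GOOGLE_APPLICATION_CREDENTIALS env var and the file written by gcloud auth application-default login (under ~/.config/gcloud on Linux/macOS, under %APPDATA%\gcloud on Windows). It ignores other sources such as a CLOUDSDK_CONFIG override or a metadata server. The helper names are hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping

ENV_VAR = "GOOGLE_APPLICATION_CREDENTIALS"
ADC_FILE = "application_default_credentials.json"


def adc_well_known_path(
    environ: Mapping[str, str] = os.environ, os_name: str = os.name
) -> Path:
    """Assumed location of the application-default credentials file."""
    if os_name == "nt":
        return Path(environ.get("APPDATA", "")) / "gcloud" / ADC_FILE
    return Path.home() / ".config" / "gcloud" / ADC_FILE


def describe_default_credentials(environ: Mapping[str, str] = os.environ) -> dict:
    """Report which credential sources are visible, without reading them."""
    env_path = environ.get(ENV_VAR)
    adc_path = adc_well_known_path(environ)
    return {
        "env_var": env_path,
        "env_var_file_exists": bool(env_path) and Path(env_path).is_file(),
        "adc_file": str(adc_path),
        "adc_file_exists": adc_path.is_file(),
    }


if __name__ == "__main__":
    for key, value in describe_default_credentials().items():
        print(f"{key}: {value}")
```

If both entries report False, the explicit token passed through storage_options is probably the only thing making authentication work.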

machow commented 1 year ago

I wonder if, at the very least, we should include something like the working example in the docstring of board(). That way, there's a quick escape hatch for manually passing arguments to the underlying fsspec.filesystem constructor.

Another slightly odd thing, which we could maybe nudge gcsfs upstream on: AFAIK gcsfs respects the GOOGLE_APPLICATION_CREDENTIALS env var (though I haven't checked recently), but this isn't documented in the gcsfs docs (it may be behavior from the lower-level google auth library?).
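
If the env var is respected, one way to try it without touching the shell is to set it temporarily from Python. The sketch below assumes, as described above and not verified here, that gcsfs falls through to the google auth default lookup when no token is passed. The context manager name is hypothetical.

```python
import os
from contextlib import contextmanager
from typing import Iterator

ENV_VAR = "GOOGLE_APPLICATION_CREDENTIALS"


@contextmanager
def google_application_credentials(path: str) -> Iterator[None]:
    """Temporarily point GOOGLE_APPLICATION_CREDENTIALS at a key file."""
    previous = os.environ.get(ENV_VAR)
    os.environ[ENV_VAR] = path
    try:
        yield
    finally:
        # Restore the previous state, including "not set at all".
        if previous is None:
            os.environ.pop(ENV_VAR, None)
        else:
            os.environ[ENV_VAR] = previous


# Usage (needs pins, gcsfs, and a real service account file):
# import pins
# with google_application_credentials("google-pins.json"):
#     board = pins.board("gcs", "pins-testing")
#     board.pin_read("nice-numbers")
```

Creating the board inside the with block matters for this experiment, since the variable is removed again once the block exits.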