Thanks for raising this--can you clarify what you mean by requirements? The requirement in pins is that this code returns a backend:
```python
import fsspec

# "gs" is Google Cloud Storage; swap in the protocol for any target backend
fsspec.filesystem("gs")
```
We test against specific versions (`gcsfs`, `s3fs`, etc.), but these packages don't need to be installed for pins to run (for example, you don't need `s3fs` specifically to connect to S3, just some package that provides the fsspec entrypoint for `s3`).
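For concreteness, here is a minimal sketch of that lookup behavior, assuming `s3fs` is *not* installed (the exact error text varies by fsspec version):

```python
import fsspec

# fsspec resolves a protocol to whichever installed package registered it.
# If nothing provides the protocol, it raises ImportError with an install
# hint drawn from its table of known implementations.
try:
    fs = fsspec.filesystem("s3")
except ImportError as err:
    print(err)  # suggests installing s3fs
```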
It seems like there are two key differences from pins R:
We could include the names of the packages we use / test against, but at that point it might be better to keep that list in vetiver? Or we could store it in pins' `options.extras_require` in setup.cfg, and read that back from the package metadata?
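If pins declared such extras, reading them back could look something like this (a sketch only; it assumes pins defines backend extras like `gcs`, which may not be the case today):

```python
from importlib import metadata

# Sketch: read extras declared in pins' packaging metadata.
# Assumes pins defines backend extras (e.g. [options.extras_require] gcs = gcsfs).
dist = metadata.distribution("pins")
print(dist.metadata.get_all("Provides-Extra"))  # e.g. ['gcs', 's3', ...]
print(dist.requires)  # requirement strings, with '; extra == "..."' markers
```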
Alternatively, we can do some metadata digging to figure it out, based on the package an fsspec backend object comes from, but it could get gnarly.
For example, suppose I run `pip install plum-dispatch`. This gives me a package called `plum`. I can sort of reverse from the name `plum` back to the `plum-dispatch` distribution info:
```python
from importlib import metadata

# packages_distributions (Python 3.10+) maps import names to distribution
# names, e.g. {"plum": ["plum-dispatch"], ...}
dist_name = metadata.packages_distributions()["plum"][0]  # "plum-dispatch"

meta = metadata.distribution(dist_name)
meta.files  # the files the distribution installed
```
However, because the distribution may have come from somewhere besides PyPI, it's not really guaranteed to be pip installable. If you pip install a package from GitHub, inside `meta.files` you'll see a file called `direct_url.json` that describes where it came from. But I have no idea how consistent that is across tools etc. (and could be very wrong about all of this).
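For what it's worth, a sketch of that check (`direct_url.json` is standardized by PEP 610, so pip at least writes it for VCS and URL installs):

```python
from importlib import metadata
import json

# Sketch: detect a non-PyPI install by looking for direct_url.json (PEP 610),
# which pip writes for VCS/URL installs.
dist = metadata.distribution("plum-dispatch")
for path in dist.files or []:
    if path.name == "direct_url.json":
        info = json.loads(path.read_text())
        print(info)  # e.g. {"url": "https://github.com/...", "vcs_info": {...}}
```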
edit: If you want install recommendations, `fsspec.registry.known_implementations` seems like a good place to look?
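For reference, each entry in that table maps a protocol to the dotted path of its filesystem class, and usually an error message naming the package to install (a quick sketch; the exact message text is fsspec's and may vary by version):

```python
import fsspec.registry

impl = fsspec.registry.known_implementations["gs"]
print(impl["class"])    # "gcsfs.GCSFileSystem"
print(impl.get("err"))  # install hint mentioning gcsfs

# The class path's top-level module is usually the import name to install:
print(impl["class"].split(".")[0])  # "gcsfs"
```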
Thanks for this very comprehensive write-up!
> can you clarify what you mean by requirements?
So, the particular instance that inspired this: when vetiver creates a Dockerfile that connects to a pins board, it generates a requirements.txt listing the packages needed to communicate with that board and deploy the stored model, based on the vetiver model's `required_pkgs` metadata. Right now, the requirements.txt will include `pins`, but not, say, `gcsfs`, since it is unknown what packages are necessary to authenticate to the board.
When the Dockerfile is run without `gcsfs` in its requirements.txt, users receive errors saying they need to install `gcsfs`. I was hoping we could have something like `board.required_pkgs` that would include `gcsfs`, so this package gets installed without users needing to edit the requirements.txt themselves.
> edit: If you want install recommendations, `fsspec.registry.known_implementations` seems like a good place to look?
The `known_implementations` lead was super helpful. It looks like boards' `board.fs.protocol` could be matched up to `fsspec.registry.known_implementations` pretty easily! I'm happy to add this into vetiver if it seems too niche of a use case to have in pins.
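A minimal sketch of that matching, assuming `board.fs` is the board's fsspec filesystem (the helper name `required_fs_packages` is hypothetical, not a pins or vetiver API):

```python
import fsspec.registry

def required_fs_packages(board):
    # A board's fsspec filesystem advertises its protocol as a string
    # or a tuple of aliases (e.g. ("gs", "gcs")).
    protocol = board.fs.protocol
    protocols = [protocol] if isinstance(protocol, str) else list(protocol)

    pkgs = set()
    for proto in protocols:
        impl = fsspec.registry.known_implementations.get(proto)
        if impl is not None:
            # Top-level module of the implementing class, e.g. "gcsfs"
            pkgs.add(impl["class"].split(".")[0])
    return sorted(pkgs)
```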
I tried out a possible implementation in the vetiver package, and it doesn't feel too clunky over there! It is pretty similar to `board_deparse`, but returning values from `fsspec.registry.known_implementations`. I'm okay closing this out and having the functionality in vetiver instead :D
Closing this, as the change will be made in vetiver.
👋 When using pins + vetiver to deploy a Dockerfile, it is necessary to collect all the requirements to authenticate to a board. Right now, it is not possible to determine what packages are used to provide that authentication.
Some more context from: https://github.com/rstudio/vetiver-python/issues/165