tracking requirements in `required_pkgs`

isabelizimm commented 1 year ago

This conversation is starting to get lost in #126, so bringing it over here :)

From @juliasilge

Both start out with a base level of just the packages directly required to make a prediction. This is fairly likely to work and may well be enough, especially in R, where updating to the latest versions is basically always the right move. Then both R and Python will have an option to escalate to more robust package version tracking.

In R, we're going straight to renv, since that is the tool most people are familiar with for this type of task, a tool whose development we have input into, etc. So there are two levels, both familiar to R users: only package names, plus opt-in to full renv. In Python, the closest equivalent to renv (Pipfile.lock) can seem like overkill and may be less familiar to many practitioners. Instead we can use pip-tools to generate a requirements.txt that is pinned to specific versions and covers the whole dependency graph. So there are two levels here too, but they differ so as to be more comfortable for Python users: only package names, plus opt-in to the pip-tools pinned requirements.

The general idea would be that instead of required_pkgs, there would be an argument called requirements or requirements_txt. The default would be what required_pkgs does currently: give the names of the minimal required packages to make predictions at a model's endpoint. There could be another argument that would make these minimal requirements more robust. The top-level requirements would include the version (i.e., vetiver==0.1.8 and scikit-learn==1.2.0), and pip-tools would be used to find compatible versions for the second-level dependencies. (The issue with just doing pip freeze is that it includes everything in the environment and, maybe more annoyingly, does not guarantee that the environment can be recreated.)

So,

my_vetiver_model.requirements

could output something like:

vetiver
scikit-learn

or something like below, where it is generated from a pinned vetiver==0.1.8 and scikit-learn==1.2.0:

...
requests==2.28.1
    # via
    #   pins
    #   vetiver
rfc3986[idna2008]==1.5.0
    # via httpx
rsconnect-python==1.13.0
    # via vetiver
scikit-learn==1.2.0
    # via
    #   -r /var/folders/5w/dhznpltj14n3nxr4fybjj8_w0000gn/T/tmp8p4nsqtj.in
    #   vetiver
scipy==1.9.3
    # via scikit-learn
...
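
A minimal sketch of the two levels described above, not vetiver's actual implementation. `pin_top_level` is a hypothetical helper that turns bare package names into `name==version` lines; by default it would look up the versions installed in the current environment, and here a fake lookup stands in for the versions quoted in this issue.

```python
from importlib import metadata
from typing import Callable, Iterable, List


def pin_top_level(
    packages: Iterable[str],
    get_version: Callable[[str], str] = metadata.version,
) -> List[str]:
    """Return 'name==version' lines for the directly required packages.

    `get_version` defaults to looking up the installed distribution; it is
    injectable so the function can be exercised without those packages.
    """
    return [f"{name}=={get_version(name)}" for name in packages]


# Level 1: names only (what required_pkgs gives today).
names = ["vetiver", "scikit-learn"]

# Level 2: pin the top level. The dict below is a stand-in (assumption)
# for the installed versions mentioned in this thread.
fake_installed = {"vetiver": "0.1.8", "scikit-learn": "1.2.0"}
pinned = pin_top_level(names, get_version=fake_installed.__getitem__)
print("\n".join(pinned))
```

Those pinned lines could then be written to a `.in` file and handed to pip-tools' `pip-compile`, which resolves the rest of the dependency graph and produces a requirements.txt annotated with `# via` comments like the excerpt above.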

CC: @machow @juliasilge

juliasilge commented 1 year ago

This is related to rstudio/vetiver-r#154

machow commented 1 year ago

Thanks for this! It feels like the end goal can be tricky to parse from the language-specific details here. For example, when comparing the R and Python programs, I noticed that for Docker deployment they differ in how they pin package versions for users.

Can we write out somewhere the high-level rules, and cases they apply to (without any mention of technical solutions)? It'd help to hear the end result users should expect to see in terms of versions installed.

Here's a rough example (which may be wrong):

Rules for write_dockerfile

Rule: write_dockerfile by default pins the version of vetiver the user had installed when they called it.
- If a new version of vetiver is released and the dockerfile is rebuilt, it should install the version the user had originally.
Rule: the above applies to fastapi, vetiver, pins, and <stats_model_package>.
Rule: transitive dependencies and their versions should not change from build to build.
- if fastapi depends on <some_package>, then rebuilding the Dockerfile should use the same version for that package.
Rule: transitive dependencies do not need to be pinned to what the user currently has installed
- if a user has the transitive dep <some_package>==0.0.1, but <some_package>==0.0.2 is compatible, it's okay if a resolver pins to 0.0.2.

isabelizimm commented 1 year ago

Ah, that's a good way to establish what needs to happen! I think you have it mostly right, but this will mainly happen at writing/reading pins:

Rules for write_dockerfile

The purpose of this is to check that the versions of the model package and vetiver are the same at pin read as they were when the model was originally written to the pin.

Rule: At vetiver_pin_write, the version of the vetiver package the user has installed when they write the pin will be saved in the metadata under requirements
- Rule: above applies to pins, fastapi, and <stats_model_package> packages
Rule: At VetiverModel.from_pin, a message is shown if any of the above packages do not match the versions the user had installed at pin write.
Rule: transitive dependencies do not need to be pinned to what the user currently has installed
- if a user has the transitive dep <some_package>==0.0.1, but <some_package>==0.0.2 is compatible, it's okay if a resolver pins to 0.0.2.

This should be mostly invisible to users. If people are interested in looking at this file, they are able to do so via board.pin_meta.
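
As a rough illustration of the read-time check in these rules (a sketch under stated assumptions, not how vetiver implements it): `find_version_mismatches` is a hypothetical helper that compares the versions recorded at pin write with what is installed now and returns one message per mismatch. The `1.2.1` version below is made up for the example.

```python
from importlib import metadata
from typing import Callable, Dict, List


def find_version_mismatches(
    recorded: Dict[str, str],
    get_version: Callable[[str], str] = metadata.version,
) -> List[str]:
    """Compare versions recorded at pin write with what is installed now."""
    messages = []
    for name, written in recorded.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the reading environment.
            installed = None
        if installed != written:
            messages.append(
                f"{name}: pin was written with {written}, "
                f"but {installed or 'no version'} is installed"
            )
    return messages


# Versions saved in the pin metadata at write time (from this thread).
recorded = {"vetiver": "0.1.8", "scikit-learn": "1.2.0"}
# Hypothetical environment at read time; a dict stands in for the
# real installed-package lookup.
current = {"vetiver": "0.1.8", "scikit-learn": "1.2.1"}

for message in find_version_mismatches(recorded, get_version=current.__getitem__):
    print(message)
```

Only the top-level packages are compared here, which matches the rule that transitive dependencies do not need to match what the user had installed.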

rstudio / vetiver-python

tracking requirements in `required_pkgs` #140
