Create environment lock file for more reproducible deploys

matthewfeickert commented 2 years ago

This project fully qualifies as a Python application in my mind, so we should do a better job of fully specifying the runtime environment than the existing requirements.txt files do (@BenGalewsky has brought this up before). This is related to https://github.com/matthewfeickert/distributed-inference-with-pyhf-and-funcX/pull/23#issuecomment-1105804742. I'm not 100% sure how the deployment to RIVER works, but for the tests on EXPANSE this

https://github.com/matthewfeickert/distributed-inference-with-pyhf-and-funcX/blob/91d231e2b5e1a8f4fe1142472b82e1885a19cd4e/expanse-environment.yml#L5-L12

isn't truly reproducible. Something I've found to work really well for:

https://github.com/illinois-mla/phys-398-mla-image
https://github.com/recast-hep/recast-atlas
https://gitlab.cern.ch/illinois/2d-limit-interpolation (CERN internal resource sorry)

is to use pip-tools to create a lock file from a high-level requirements.txt file and then to use Brett Cannon's "pip-secure-install" recommendations to make things as reproducible as possible with pip.
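As a rough sketch of that workflow (not necessarily the exact setup used in the repositories above), the two steps could be driven from Python as shown here. The lock file name `requirements.lock` and the helper function names are assumptions made for illustration; the flags are the ones pip-tools and pip provide for hash-pinned compiles and installs.

```python
import shlex
import sys


def compile_lock_command(source="requirements.txt", lock="requirements.lock"):
    """Build the pip-compile call that writes a hash-pinned lock file.

    `lock` is a hypothetical file name chosen for this sketch.
    """
    return [
        "pip-compile",
        "--generate-hashes",      # record sha256 hashes for every pinned artifact
        "--output-file", lock,
        source,
    ]


def secure_install_command(lock="requirements.lock"):
    """Build a pip install call in the spirit of pip-secure-install."""
    return [
        sys.executable, "-m", "pip", "install",
        "--no-deps",              # install only what the lock file lists
        "--require-hashes",       # refuse anything without a matching hash
        "--only-binary", ":all:", # wheels only, no source builds
        "--requirement", lock,
    ]


# Print the commands rather than running them, so the sketch has no side effects.
print(shlex.join(compile_lock_command()))
print(shlex.join(secure_install_command()))
```

To actually run either step, pass the returned list to `subprocess.run(..., check=True)` in an environment where pip-tools is installed.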

This works well because you're pinning down to the hash level of each wheel on PyPI, so if a wheel ever gets removed or an additional wheel (maliciously) gets added, you would know. However, @astrojuanlu (:wave: hey Juan) recently mentioned on Twitter that pip-compile doesn't seem to work that well when combined with Conda. In the replies people mentioned that conda-lock (https://github.com/conda-incubator/conda-lock) seems to work well, so for deploys with Conda this might be the most sensible way forward (though that requires building a second lock file I guess?).
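To make the hash-level pinning concrete, here is a minimal sketch of a check that flags lock file entries lacking a hash. It is only an illustration: pip's own `--require-hashes` option is what enforces this at install time. The function name, the package versions, and the placeholder hash values in the example text are all made up for this sketch.

```python
def unhashed_requirements(lock_text):
    """Return the names of requirements in `lock_text` that carry no sha256 hash."""
    missing = []
    # pip-compile continues a requirement across lines with a trailing backslash;
    # join those so each requirement becomes one logical line.
    logical = lock_text.replace("\\\n", " ")
    for line in logical.splitlines():
        line = line.strip()
        # Skip blanks, comments ("# via ..."), and global options ("--index-url ...").
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        if "--hash=sha256:" not in line:
            missing.append(line.split("==")[0].strip())
    return missing


# Hypothetical lock file contents; versions and hashes are placeholders.
EXAMPLE_LOCK = r"""
pyhf==0.0.0 \
    --hash=sha256:0000000000000000000000000000000000000000000000000000000000000000
    # via -r requirements.txt
funcx==0.0.0
"""

print(unhashed_requirements(EXAMPLE_LOCK))
```

In this example only the second entry is reported, since it was written without a `--hash` option.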

BenGalewsky commented 2 years ago

On River we use the Dockerfile to run the endpoint in Kubernetes

matthewfeickert commented 2 years ago

> On River we use the Dockerfile to run the endpoint in Kubernetes

Thanks @BenGalewsky. This is good in the sense that a lock file will make it clear from the outside exactly what is deployed in terms of the environment.

One more question: Can you remind me how the Docker image actually gets built? Does it get built on demand from the Dockerfile and then cached on a local container registry? Or is there a remote public container registry that it is pulling from?

BenGalewsky commented 2 years ago

> Can you remind me how the Docker image actually gets built?

It's just me doing it manually when I feel like it!

matthewfeickert commented 2 years ago

> It's just me doing it manually when I feel like it!

Thanks. This all works super well with a lock file as I can have pip-compile compile the dependencies against the CPython version in the Docker image itself. I'll PR soon!
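One way this could look, as a sketch under stated assumptions rather than the approach the eventual PR takes: run pip-compile inside a container started from the deployment image, so the resolver sees that image's CPython version. The image name, the `/work` mount point, and the `requirements.lock` file name are placeholders, and the sketch assumes the image has `python` and `pip` on its PATH and no entrypoint that would intercept the command.

```python
import shlex


def compile_in_image_command(image, host_dir,
                             source="requirements.txt",
                             lock="requirements.lock"):
    """Build a `docker run` call that runs pip-compile inside `image`.

    `image`, `host_dir`, and `lock` are hypothetical values supplied by the caller.
    """
    # Shell command executed inside the container.
    inner = (
        "python -m pip install pip-tools && "
        "pip-compile --generate-hashes "
        f"--output-file {shlex.quote(lock)} {shlex.quote(source)}"
    )
    return [
        "docker", "run", "--rm",
        "--volume", f"{host_dir}:/work",  # expose the requirements file to the container
        "--workdir", "/work",
        image,
        "/bin/sh", "-c", inner,
    ]


# Placeholder image name and path, printed rather than executed.
print(shlex.join(compile_in_image_command("example/endpoint-image:latest", "/path/to/repo")))
```

The resulting lock file lands in the mounted directory on the host, where it can be committed alongside the Dockerfile.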

matthewfeickert / distributed-inference-with-pyhf-and-funcX

Create environment lock file for more reproducible deploys #25