matthewfeickert opened this issue 3 years ago
Yep! I had to manually install the correct versions of PyTorch and CUDA to fix this, but I forgot to update requirements.txt. Since the project is stable in the newer version, I can just update requirements.txt itself.
The tricky part, though, is that we should avoid locking the general requirements.txt to a specific machine. Because torch==1.9.0+cu111 makes assumptions about which CUDA libraries are available, we will probably need machine-specific environment files (e.g. dgx-requirements.txt or dgx-env.yml). Alternatively, we could write setup scripts that inspect the environments users have and pick dependencies for them, but in my personal experience (https://github.com/matthewfeickert/nvidia-gpu-ml-library-test) this gets tedious and is hard to do robustly.
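For illustration only, a minimal sketch of what such a setup script might look like, assuming nvidia-smi is on the PATH. The helper names, the GPU-to-file rule, and the dgx-requirements.txt file name are hypothetical; only the A100 is listed because that is the GPU discussed in this thread.

```python
import shutil
import subprocess

# Hypothetical rule: GPU model substrings that need the CUDA 11 environment file.
CUDA11_GPUS = ("A100",)


def detect_gpu_names() -> list[str]:
    """Return GPU names reported by nvidia-smi, or [] if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True, text=True, check=True, timeout=30,
        )
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired, OSError):
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def select_requirements(gpu_names: list[str]) -> str:
    """Pick a requirements file name based on the detected GPUs (hypothetical names)."""
    if any(model in name for name in gpu_names for model in CUDA11_GPUS):
        return "dgx-requirements.txt"
    return "requirements.txt"


if __name__ == "__main__":
    names = detect_gpu_names()
    print(f"Detected GPUs: {names or 'none'}")
    print(f"Suggested: python -m pip install -r {select_requirements(names)}")
```

Even this toy version hints at why the approach gets tedious: every new GPU model or CUDA release needs another rule.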
This will become more apparent once we make a deepmem library, at which point we'll have library dependencies vs. runtime application dependencies (cf. https://caremad.io/posts/2013/07/setup-vs-requirement/). The library dependencies will specify the minimum versions that provide the APIs we need for things to work as expected (e.g. torch>=1.8.0), and our runtime dependencies will define the actual "application" environment (e.g. torch==1.9.0+cu111). At the moment we've basically been treating everything as application dependencies.
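To make the distinction concrete, here is a minimal stdlib-only sketch that checks whether an application pin such as torch==1.9.0+cu111 still satisfies a library floor such as torch>=1.8.0. The function names are hypothetical, and the parser deliberately handles only dotted release numbers plus an optional +local suffix (the PEP 440 "local version label", here cu111), which it parses out and ignores for the comparison. Real tooling should rely on pip or packaging.version.Version, which implement the full PEP 440 rules.

```python
import re


def parse_version(spec: str) -> tuple[tuple[int, ...], str]:
    """Split e.g. '1.9.0+cu111' into ((1, 9, 0), 'cu111').

    Simplified: no pre-, post-, or dev-release support.
    """
    public, _, local = spec.partition("+")
    if not re.fullmatch(r"\d+(\.\d+)*", public):
        raise ValueError(f"unsupported version string: {spec!r}")
    return tuple(int(part) for part in public.split(".")), local


def satisfies_minimum(pinned: str, minimum: str) -> bool:
    """Check an application pin (==) against a library floor (>=), ignoring local labels."""
    pinned_release, _ = parse_version(pinned)
    minimum_release, _ = parse_version(minimum)
    # Pad with zeros so that '1.8' and '1.8.0' compare as equal.
    length = max(len(pinned_release), len(minimum_release))
    pinned_release += (0,) * (length - len(pinned_release))
    minimum_release += (0,) * (length - len(minimum_release))
    return pinned_release >= minimum_release


LIBRARY_FLOOR = "1.8.0"          # what a deepmem library would declare
APPLICATION_PIN = "1.9.0+cu111"  # what a DGX application environment would pin

if __name__ == "__main__":
    release, local = parse_version(APPLICATION_PIN)
    print(release, local)                                   # (1, 9, 0) cu111
    print(satisfies_minimum(APPLICATION_PIN, LIBRARY_FLOOR))  # True
```

The design point is that the floor lives in the library's install metadata and stays machine-agnostic, while the CUDA-specific pin lives only in an application environment file.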
If the current environment (https://github.com/mihirkatare/DeepMEM/blob/61e5d7ef9e9f097f13ee4e98d55a6611d76cd4c4/requirements.txt#L5) is installed on the DGX in a clean virtual environment, then the user will get torch v1.8.1, which is fine by itself. However, checking compatibility with the available GPUs shows the catch: to get a PyTorch release that is compatible with the A100, the user needs one of the custom wheels that PyTorch hosts, built with CUDA 11.
None of that is a big problem, but it might require either a dgx-requirements.txt or some instructions for the user to resolve things manually.
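The GPU compatibility check could be scripted as part of such instructions. A minimal sketch, assuming PyTorch is installed: torch.cuda.get_arch_list() reports the architectures the installed build was compiled for, and torch.cuda.get_device_capability() reports each GPU's compute capability (8.0 for the A100). The helper names are hypothetical, and the check is deliberately conservative: it requires an exact sm_XY match and ignores both PTX (compute_XY) entries that the driver can JIT-compile and binary compatibility across minor versions.

```python
def device_arch(capability: tuple[int, int]) -> str:
    """Format a (major, minor) compute capability as an 'sm_XY' arch string."""
    major, minor = capability
    return f"sm_{major}{minor}"


def is_supported(capability: tuple[int, int], arch_list: list[str]) -> bool:
    """True if the build lists native kernels for exactly this architecture."""
    return device_arch(capability) in arch_list


def check_local_torch() -> None:
    """Report, for each visible GPU, whether the installed torch build targets it."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
        return
    if not torch.cuda.is_available():
        print(f"torch {torch.__version__}: CUDA not available")
        return
    arch_list = torch.cuda.get_arch_list()
    for index in range(torch.cuda.device_count()):
        name = torch.cuda.get_device_name(index)
        capability = torch.cuda.get_device_capability(index)
        status = "OK" if is_supported(capability, arch_list) else "NOT SUPPORTED"
        print(f"{name} ({device_arch(capability)}): {status} with torch {torch.__version__}")


if __name__ == "__main__":
    check_local_torch()
```

If an A100 shows up as NOT SUPPORTED, that is the signal to switch to the CUDA 11 wheels described above.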