piercelab / tcrmodel2

Apache License 2.0

Installation error #18

Open seanwangsalad opened 6 months ago

seanwangsalad commented 6 months ago

Ubuntu 22.04, CUDA 11.8, AF 2.3

Dear developers,

When I run the following command, I get the errors shown below:

sudo singularity build tcrmodel2.sif singularity/tcrmodel2_singularity.def

[Two screenshots of the build errors, 2024-03-16]

However, it still builds the tcrmodel2.sif file.

But when I run:

bash run_tcrmodel2singularity.sh

it returns the error:

Traceback (most recent call last):
  File "/opt/tcrmodel2/run_tcrmodel2.py", line 9, in <module>
    from absl import app, flags
ModuleNotFoundError: No module named 'absl'

I would really appreciate some help figuring out whether this is an installation error or a run error.

rui-yin commented 6 months ago

Hi Sean,

Indeed, from the output you shared, it looks like the singularity container was not built properly (that's why you see the module-not-found error when running the bash script). The error message seems to come from conda 23.5.0 not being compatible with Python 3.12. However, in the tcrmodel2_singularity.def file, the Python version we specified is Python 3.10 (see this line).

Maybe you can double-check the Python version in the tcrmodel2_singularity.def file?
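A quick way to see what actually ended up inside the built image is a small check like the one below. This is only a sketch: the helper names `missing_modules` and `python_matches` are made up for illustration, and it assumes the script is run with the container's own interpreter (for example through `singularity exec tcrmodel2.sif python check_env.py`, where `check_env.py` is a hypothetical file name).

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the names from `names` that the import system cannot find."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except ModuleNotFoundError:
            # Raised for dotted names (e.g. "jax.extend") whose parent
            # package is itself not installed.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


def python_matches(expected=(3, 10)):
    """True if the running interpreter has the expected (major, minor) version."""
    return sys.version_info[:2] == tuple(expected)


if __name__ == "__main__":
    print("Python:", sys.version.split()[0])
    print("Matches 3.10:", python_matches())
    # absl is the module from the traceback; the others are assumptions
    # about what the pipeline needs.
    print("Missing:", missing_modules(["absl", "jax", "jaxlib"]))
```

If `absl` shows up as missing, or the reported Python version is not 3.10, the build step most likely failed partway even though a .sif file was written.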

Best, Rui

seanwangsalad commented 6 months ago

Hi Rui, should I specify 3.12? It is 3.10 in the .def file. The error seems to be coming from pin-1.

Thanks,

rui-yin commented 6 months ago

Hi Sean,

I think 3.10 is what you want, not 3.12. So what you have in the .def looks good. It's interesting that what you specify (python version 3.10) is different from what's being installed.

Maybe try changing line 38 from: conda install -qy conda==23.5.0 \

to: conda install -qy conda==23.5.0 python=3.10 \

and everything else stays the same.
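If you would rather apply that change with a script than by hand, something like the following could work. It is a minimal sketch: `pin_python` is a hypothetical helper, and it assumes the conda line in your copy of the .def file matches the text above exactly (apart from indentation).

```python
OLD = "conda install -qy conda==23.5.0 \\"
NEW = "conda install -qy conda==23.5.0 python=3.10 \\"


def pin_python(def_text):
    """Return def_text with the conda install line pinned to Python 3.10.

    Lines that do not match OLD exactly (ignoring surrounding whitespace)
    are left untouched, so running this twice changes nothing further.
    """
    patched = []
    for line in def_text.splitlines(keepends=True):
        if line.strip() == OLD:
            line = line.replace(OLD, NEW)
        patched.append(line)
    return "".join(patched)


# Example usage (path taken from the build command earlier in the thread):
#   from pathlib import Path
#   path = Path("singularity/tcrmodel2_singularity.def")
#   path.write_text(pin_python(path.read_text()))
```

After patching, the image has to be rebuilt for the change to take effect.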

Let me know how it goes!

Best, Rui

bsaleme commented 4 months ago

Hello Rui, I have run into the same problem described above. I made the change you recommended (specifying the Python version after the conda version) and successfully built the image, but then I ran into another issue: the "jax.extend" module is not found. I added a picture of the error below. Any recommendations?

[Screenshot of the error, 2024-05-02]

bpierce12 commented 4 months ago

@bsaleme it looks like your issue is the same as what others have noted for ColabFold, and is related to the jax version: https://github.com/YoshitakaMo/localcolabfold/issues/212 Downgrading jaxlib to 0.4.23, as noted in that thread, will hopefully help with that problem.
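To see which jax and jaxlib versions are installed in the image before and after the downgrade, a check along these lines may help. It is a sketch only: the helper names are hypothetical, and the 0.4.23 ceiling is simply the version mentioned in the linked thread.

```python
import re
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Parse the leading numeric part, e.g. '0.4.23+cuda11.cudnn86' -> (0, 4, 23)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def newer_than(version, ceiling="0.4.23"):
    """True if `version` is strictly newer than `ceiling`."""
    return version_tuple(version) > version_tuple(ceiling)


if __name__ == "__main__":
    for name in ("jax", "jaxlib"):
        found = installed_version(name)
        if found is None:
            print(f"{name}: not installed")
        else:
            print(f"{name}: {found} (newer than 0.4.23: {newer_than(found)})")
```

The parsing deliberately ignores local suffixes such as `+cuda11.cudnn86`, since only the release number matters for this comparison.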

andreas-wilm commented 2 months ago

Hi all,

I also ended up in version dependency hell while upgrading the other day. The real culprit is that the definition file always pulls the latest AlphaFold commit. It's best to pin a specific commit/tag/release and then change library versions as needed. One (official) example is this GCP Dockerfile.

I ended up changing the TCRModel2 definition file accordingly (conda 24.1.2, jax 0.4.13, cuda 11.8 and AF commit 032e2f2 from Feb 2024). Happy to share the file or issue a PR if of interest.
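For anyone pinning the commit themselves, here is a rough way to confirm that the AlphaFold checkout inside the image really sits at the pinned commit rather than at the latest one. It is a sketch with assumptions: the helper names are made up, and the checkout location you pass to `current_commit` depends on your definition file (the `/opt/alphafold` path in the usage comment is hypothetical).

```python
import subprocess

PINNED_COMMIT = "032e2f2"  # short hash mentioned above


def matches_pin(head_sha, pin=PINNED_COMMIT):
    """True if the full commit hash starts with the pinned short hash."""
    return head_sha.lower().startswith(pin.lower())


def current_commit(repo_dir):
    """Return the full hash of HEAD for the git checkout at repo_dir."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "rev-parse", "HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


# Example usage (hypothetical path; adjust to where your build clones AlphaFold):
#   head = current_commit("/opt/alphafold")
#   print(head, "pinned" if matches_pin(head) else "NOT the pinned commit")
```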

Andreas