I'm reaching out to share a potential issue. While I've managed to resolve it on my end, others following the README's setup instructions might run into it.
Here's my setup:
OS: Ubuntu 22.04
CUDA: 11.8
After setting up the environment with Poetry as described in the README and running ./script/run.sh, I ran into the following error:
Traceback (most recent call last):
  File "~/heron/.venv/bin/deepspeed", line 3, in <module>
    from deepspeed.launcher.runner import main
  File "~/heron/.venv/lib/python3.10/site-packages/deepspeed/__init__.py", line 10, in <module>
    import torch
  File "~/heron/.venv/lib/python3.10/site-packages/torch/__init__.py", line 229, in <module>
    from torch._C import * # noqa: F403
ImportError: libcudnn.so.8: cannot open shared object file: No such file or directory
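If you hit the same ImportError, one quick check is whether the dynamic loader can find cuDNN at all, independently of PyTorch, by trying to open the library directly. This is a minimal sketch assuming Linux and Python 3; the helper name `can_load_shared_library` is hypothetical. It relies on the standard loader search path (LD_LIBRARY_PATH, the ldconfig cache, and so on), so it may not reproduce exactly how a given PyTorch build locates cuDNN.

```python
import ctypes


def can_load_shared_library(name: str) -> bool:
    """Return True if the dynamic loader can find and open `name` (hypothetical helper)."""
    try:
        # ctypes.CDLL asks the system dynamic loader to open the library by name.
        ctypes.CDLL(name)
    except OSError:
        # Raised when the loader cannot locate or open the library.
        return False
    return True


if __name__ == "__main__":
    lib = "libcudnn.so.8"
    status = "found" if can_load_shared_library(lib) else "NOT found"
    print(f"{lib}: {status}")
```

A "NOT found" result means the system search path has no cuDNN 8, so the library would have to come from the PyTorch installation itself.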
I noticed the README lists CUDA 11.7 as the expected version, which suggests that 11.8 might not be a good fit. Given this, I reinstalled PyTorch with:
This fixed the issue, and ./script/run.sh then ran without problems. I'm documenting it here in case anyone else runs into the same error.
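After reinstalling, one way to confirm that the environment lines up with the CUDA 11.7 mentioned in the README is to check which CUDA version PyTorch reports it was built with. This is a minimal sketch: `torch.version.cuda` is a real PyTorch attribute (None for CPU-only builds), while the helper names `installed_torch_cuda_version` and `matches_major_minor` are hypothetical, and comparing only major.minor is an assumption about what "matching" should mean here.

```python
import importlib
from typing import Optional


def installed_torch_cuda_version() -> Optional[str]:
    """Return the CUDA version PyTorch was built with, or None (hypothetical helper)."""
    try:
        torch = importlib.import_module("torch")
    except (ImportError, OSError):
        # Covers a missing torch as well as load failures like the libcudnn.so.8 error.
        return None
    # torch.version.cuda is None for CPU-only builds.
    return torch.version.cuda


def matches_major_minor(expected: str, actual: Optional[str]) -> bool:
    """Compare only the major.minor components, e.g. '11.7' vs '11.7.1'."""
    if actual is None:
        return False
    return actual.split(".")[:2] == expected.split(".")[:2]


if __name__ == "__main__":
    version = installed_torch_cuda_version()
    print(f"torch built with CUDA: {version}")
    print("matches README (11.7):", matches_major_minor("11.7", version))
```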
If it helps, I'm happy to submit a pull request updating the pyproject.toml. If this isn't the right place for such feedback, please feel free to close this issue.
@Topology1225
Thank you for testing the setup and for the detailed report! I really appreciate you sharing this. It would be great if you could submit a pull request.
Thank you.