My operating system is Ubuntu Linux 22.04:
$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
To get CUDA-enabled PyTorch and CUDA under conda, I am using
ActiveState Python with the following configuration:
I am launching the example with:
#!/bin/sh
torchrun --nproc_per_node 1 example_instructions.py \
--ckpt_dir CodeLlama-7b-Instruct/ \
--tokenizer_path CodeLlama-7b-Instruct/tokenizer_model \
--max_seq_len 512 --max_batch_size 4
and torchrun crashes because a shared library cannot be found:
Traceback (most recent call last):
File "/home/doug/.cache/activestate/cb772d80/usr/bin/torchrun", line 5, in <module>
import torch.distributed.run
File "/home/doug/.cache/activestate/cb772d80/usr/lib/python3.10/site-packages/torch/__init__.py", line 191, in <module>
_load_global_deps()
File "/home/doug/.cache/activestate/cb772d80/usr/lib/python3.10/site-packages/torch/__init__.py", line 153, in _load_global_deps
ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
File "/home/doug/.cache/activestate/cb772d80/usr/lib/python3.10/ctypes/__init__.py", line 374, in __init__
self._handle = _dlopen(self._name, mode)
OSError: libcufft.so.10: cannot open shared object file: No such file or directory
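The failure can probably be reproduced without torch, since the traceback shows it comes from a plain `ctypes.CDLL` call. Below is a minimal sketch for checking which shared libraries the dynamic loader cannot resolve from inside the same Python runtime. The helper name `find_missing_libs` is hypothetical, and the candidate list contains only the one name taken from the error above; any other library names would have to be added by hand. As far as I know, `libcufft.so.10` is the soname shipped with CUDA 11.x toolkits, so a missing or different CUDA version on the loader path would be consistent with this error, but that is an assumption rather than something the traceback proves.

```python
import ctypes


def find_missing_libs(names):
    """Return the shared-library names that the dynamic loader cannot open.

    Hypothetical helper: it mirrors the ctypes.CDLL call seen in the
    traceback, but it is not part of torch or ActiveState.
    """
    missing = []
    for name in names:
        try:
            # Same call style as torch's _load_global_deps in the traceback.
            ctypes.CDLL(name, mode=ctypes.RTLD_GLOBAL)
        except OSError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Only libcufft.so.10 comes from the error message; extend as needed.
    candidates = ["libcufft.so.10"]
    for name in find_missing_libs(candidates):
        print(f"cannot load: {name}")
```

Running this with the same interpreter that `torchrun` uses should show whether the library is missing from the loader search path in general, or only in the way torch loads it.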