Closed: simajiucai closed this issue 1 year ago.
similar issue I am facing.
ImportError: /root/.cache/torch_extensions/py38_cu102/transformer_inference/transformer_inference.so: cannot open shared object file: No such file or directory
were you able to resolve?
This problem could be because the extensions folder is located in /root, which is privileged. Can you try using /tmp instead by setting export TORCH_EXTENSIONS_DIR=/tmp?
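The suggestion above can be sketched as the following shell snippet. The /tmp/torch_extensions path is just an example (any non-privileged, writable directory works), and clearing the old cache is an extra step to make sure a half-built .so is not picked up again:

```shell
# Point PyTorch's JIT-extension build cache at a writable, non-privileged
# directory instead of the default under /root.
export TORCH_EXTENSIONS_DIR=/tmp/torch_extensions
mkdir -p "$TORCH_EXTENSIONS_DIR"

# Remove any half-built artifacts from the old cache so the op rebuilds cleanly.
rm -rf "$HOME/.cache/torch_extensions"

echo "$TORCH_EXTENSIONS_DIR"
```

Set the variable in the same shell (or in your launch script) before running accelerate or deepspeed, so every spawned process inherits it.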
Same as the above issues. I checked my DeepSpeed setup with ds_report; you should probably install DeepSpeed with its ops pre-compiled rather than relying on JIT mode.
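A hedged sketch of installing DeepSpeed with the op pre-compiled instead of JIT-built. The DS_BUILD_TRANSFORMER_INFERENCE and DS_BUILD_OPS environment flags come from DeepSpeed's advanced-install documentation; confirm they exist in the DeepSpeed version you are installing, and note the build requires a CUDA toolkit matching your torch build:

```shell
# Reinstall DeepSpeed with the transformer_inference op compiled at install
# time, so nothing needs to be JIT-built (and cached) at runtime.
pip uninstall -y deepspeed
DS_BUILD_TRANSFORMER_INFERENCE=1 pip install deepspeed --no-cache-dir

# Or pre-compile all ops at once:
# DS_BUILD_OPS=1 pip install deepspeed --no-cache-dir

# Verify: ds_report should now list transformer_inference as installed.
ds_report
```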
Same as above issues
@feimadecaogaozhi, did you try changing the extensions folder as suggested above?
Maybe you can check whether you have installed transformers with both pip and conda. If both package managers installed it, the two copies can conflict during multi-GPU training, while a single-GPU run may not trigger the problem.
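One way to check for the pip/conda double-install described above; a minimal sketch, with `|| echo` fallbacks only so the check stays readable when one of the package managers is absent:

```shell
# List any copy of transformers that each package manager knows about.
pip list 2>/dev/null | grep -i '^transformers' \
    || echo "transformers: not installed via pip"
conda list transformers 2>/dev/null \
    || echo "conda not available in this shell"

# If both report a copy, remove one of them, e.g.:
# pip uninstall -y transformers
```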
I also met this problem, and found it was because the gcc on my machine was older than 5.0.0; after upgrading gcc, the problem was solved. The gcc version on the Linux system may be too low to support the compiler flags that DeepSpeed's just-in-time compilation requires; upgrading gcc to version 5 or later fixes it.
don't work for me
When using Accelerate, multiple processes are started and they all trigger JIT compilation at the same time, which causes this issue. You can trigger DeepSpeed's JIT compile once before running the task:
python -c "from deepspeed.ops.op_builder import UtilsBuilder;UtilsBuilder().load()"
This problem could be because the extensions folder is located in /root, which is privileged. Can you try using /tmp instead by setting export TORCH_EXTENSIONS_DIR=/tmp?

I tried this, and it works.
Closing as it seems a solution was found.
I am trying to use Accelerate and DeepSpeed for training, but I encountered the following error:

My Accelerate config:

and my ds_report:

Here is the script, which you can run directly with accelerate launch --mixed_precision="fp16" train_toy.py:

The complete error message is: