Closed 944284742 closed 4 years ago
It seems the issue is with faiss; try re-installing it with
conda install faiss-gpu cudatoolkit=10.0 -c pytorch
@944284742
Yes, I resolved this issue by moving from GPU to CPU during training
search_type: 3 # 0,1,2 for GPU, 3 for CPU (work for faiss)
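If you would rather not hard-code the CPU setting, here is a minimal sketch of one way to choose it automatically. The faiss assertion aborts the whole process, so it cannot be caught with try/except; the sketch therefore runs a tiny GPU search in a child process and falls back to CPU if that child dies. The names `GPU_PROBE` and `pick_search_type` are hypothetical, not part of faiss or OpenUnReID, and I am assuming that 0 is an acceptable GPU value for `search_type`, per the comment on the config line above.

```python
import subprocess
import sys

# Hypothetical probe script: builds a small flat L2 index on GPU 0 and
# searches it. If the installed faiss has no kernels for this GPU, the
# child process is expected to abort instead of raising an exception.
GPU_PROBE = """
import numpy as np
import faiss
res = faiss.StandardGpuResources()
index = faiss.index_cpu_to_gpu(res, 0, faiss.IndexFlatL2(8))
x = np.random.rand(16, 8).astype("float32")
index.add(x)
index.search(x, 2)
"""


def pick_search_type(probe=GPU_PROBE, timeout=120):
    """Return 0 (GPU) if the probe exits cleanly in a child process, else 3 (CPU)."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", probe],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # A hung probe is treated the same as a failed one.
        return 3
    # Any non-zero exit (abort signal, ImportError, ...) means "use CPU".
    return 0 if result.returncode == 0 else 3
```

Calling `pick_search_type()` before training and writing the result into `search_type` in the config would keep the GPU path where it works and avoid the core dump where it does not.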
I have the same problem, 'err__ == cudaSuccess' failed in void faiss::gpu::runL2Norm(faiss::gpu::Tensor<T, 2, true, IndexType>&, bool, faiss::gpu::Tensor<float, 1, true, IndexType>&, bool, cudaStream_t) [with T = float; TVec = float4; IndexType = int; cudaStream_t = CUstream_st*] at /root/miniconda3/conda-bld/faiss-pkg_1669821803039/work/faiss/gpu/impl/L2Norm.cu:323; details: CUDA error 209 no kernel image is available for execution on the device Aborted (core dumped)
Faiss assertion 'err__ == cudaSuccess' failed in void faiss::gpu::runL2Norm<T, TVec>(faiss::gpu::Tensor<T, 2, true, long long, faiss::gpu::traits::DefaultPtrTraits> &, bool, faiss::gpu::Tensor<float, 1, true, long long, faiss::gpu::traits::DefaultPtrTraits> &, bool, CUstream_st *) at D:/bld/faiss-split_1685360948441/work/faiss/gpu/impl/L2Norm.cu:300; details: CUDA error 209 no kernel image is available for execution on the device
Another solution that worked for me (although not for mmlab code) was to move to an older version of faiss, specifically 1.6.5.
Thank you for your enthusiastic help. I believe I have seen this in the faiss README on GitHub before: I would have to be on Linux to solve this problem.
My environment is Ubuntu 18.04, PyTorch 1.5.0, CUDA 10.1, and I get the error below at runtime. The training command I ran is: GPUS=1 bash dist_train.sh SpCL SpCL/Market1501
bruteForceKnn is deprecated; call bfKnn instead
Faiss assertion 'err__ == cudaSuccess' failed in void faiss::gpu::runL2Norm(faiss::gpu::Tensor<T, 2, true, IndexType>&, bool, faiss::gpu::Tensor<float, 1, true, IndexType>&, bool, cudaStream_t) [with T = float; TVec = float4; IndexType = int; cudaStream_t = CUstream_st*] at gpu/impl/L2Norm.cu:292; details: CUDA error 11 invalid argument
Traceback (most recent call last):
  File "/my_app/anaconda3/envs/OpenUnReID-pytorch1.5-py3.6/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/my_app/anaconda3/envs/OpenUnReID-pytorch1.5-py3.6/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/my_app/anaconda3/envs/OpenUnReID-pytorch1.5-py3.6/lib/python3.6/site-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/my_app/anaconda3/envs/OpenUnReID-pytorch1.5-py3.6/lib/python3.6/site-packages/torch/distributed/launch.py", line 259, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/my_app/anaconda3/envs/OpenUnReID-pytorch1.5-py3.6/bin/python', '-u', 'SpCL/main.py', 'SpCL/config.yaml', '--work-dir=SpCL/Market1501', '--launcher=pytorch', '--tcp-port=28211', '--set']' died with <Signals.SIGABRT: 6>.