Open ggggxm opened 1 month ago
Epoch 101/1000 - steps: 10%|█████▌ | 100/1000 [10:17<1:32:37, 6.18s/it, avr_loss=0.268]
Thread 74 "pt_main_thread" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffef1800640 (LWP 323521)]
I think the problem is caused by my CUDA version or environment. Could somebody who has trained successfully tell me their environment?
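For anyone replying, here is a minimal sketch for collecting the versions that seem most relevant. It assumes PyTorch is importable in the training environment; `collect_env` is a hypothetical helper name, not part of any library. It does not report the NVIDIA driver version, which `nvidia-smi` shows separately.

```python
import platform
import sys


def collect_env():
    """Return a dict of version info that may be relevant to a CUDA crash report."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    try:
        import torch  # optional: only reported when installed
    except ImportError:
        info["torch"] = "not installed"
        return info

    info["torch"] = torch.__version__
    # CUDA version PyTorch was built against (None for CPU-only builds).
    info["torch_cuda"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)
        info["cudnn"] = torch.backends.cudnn.version()
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

Pasting the printed lines together with the driver version from `nvidia-smi` should make environments easier to compare.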
My env is:
I have tried:
But the issue still exists. The gdb output looks like this:
#0 0x00007fff4402c640 in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
#1 0x00007fff43eedfe8 in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
#2 0x00007fff440544da in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
#3 0x00007fff43f2cf56 in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
#4 0x00007fff43f2d667 in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
#5 0x00007fff43f30431 in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
#6 0x00007fff44132370 in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
#7 0x00007ffff5c1498c in ?? ()
   from /home/baifeng/miniconda/envs/comfy/lib/python3.9/site-packages/torch/lib/../../nvidia/cuda_runtime/lib/libcudart.so.12
#8 0x00007ffff5c6bf5e in cudaLaunchKernel ()......
The issue occurs at a random step (from 20 to 88......).
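Since the backtrace ends inside `cudaLaunchKernel` with no symbols, one thing that might narrow it down is capturing the Python-side stack at the moment of the crash. Below is a minimal sketch, not a confirmed fix: `enable_crash_diagnostics` and the log file name are hypothetical, and it assumes the function is called at the very top of the training script, before `torch` is imported. `faulthandler` is part of the Python standard library, and `CUDA_LAUNCH_BLOCKING=1` makes kernel launches synchronous, so a failure is more likely to be reported at the launch that caused it.

```python
import faulthandler
import os


def enable_crash_diagnostics(log_path="segfault_traceback.log"):
    """Dump Python tracebacks of all threads to log_path on SIGSEGV and similar signals."""
    # Assumption: this runs before the first CUDA call; the variable is
    # read when CUDA initializes, so setting it later has no effect.
    os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")

    # faulthandler needs a real file descriptor, so write to a file
    # instead of relying on sys.stderr, which may be redirected.
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)

    # The caller must keep this file object open for the whole run.
    return log_file


if __name__ == "__main__":
    crash_log = enable_crash_diagnostics()
    print("faulthandler enabled:", faulthandler.is_enabled())
```

With this in place, the log file should show which Python call each thread was in when the segfault hit. Synchronous launches slow training down, so this is only meant for a debugging run.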