kijai / ComfyUI-FluxTrainer

Apache License 2.0

Segmentation fault, core dumped #85

Open ggggxm opened 1 month ago

ggggxm commented 1 month ago

My env is:

I have tried:

But the issue still exists. The gdb output looks like this:

#0  0x00007fff4402c640 in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
#1  0x00007fff43eedfe8 in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
#2  0x00007fff440544da in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
#3  0x00007fff43f2cf56 in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
#4  0x00007fff43f2d667 in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
#5  0x00007fff43f30431 in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
#6  0x00007fff44132370 in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
#7  0x00007ffff5c1498c in ?? () from /home/baifeng/miniconda/envs/comfy/lib/python3.9/site-packages/torch/lib/../../nvidia/cuda_runtime/lib/libcudart.so.12
#8  0x00007ffff5c6bf5e in cudaLaunchKernel ()......

The crash occurs at a random step (anywhere from step 20 to 88......).
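The gdb frames above are all inside libcuda and libcudart, so they do not show which Python call launched the failing kernel. One possible way to narrow this down, sketched here as an assumption rather than a confirmed fix, is to enable Python's standard `faulthandler` module so that a Python-level traceback is written when SIGSEGV arrives, and to set `CUDA_LAUNCH_BLOCKING=1` so kernel launches run synchronously. The helper name `enable_crash_diagnostics` and the log file name are hypothetical, and the snippet would have to run before CUDA is initialized in the process that does the training.

```python
import faulthandler
import os


def enable_crash_diagnostics(log_path):
    """Hypothetical helper: dump Python tracebacks to log_path on a fatal signal.

    Returns the open file object. It must stay open for as long as
    faulthandler is enabled, because faulthandler writes to its descriptor.
    """
    # Make CUDA kernel launches synchronous so a failure surfaces at the
    # launching call. This only has an effect if it is set before the
    # CUDA context is created. setdefault keeps a value the user already set.
    os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")

    log_file = open(log_path, "w")
    # On SIGSEGV (and other fatal signals) write the Python stack of
    # every thread to the log file.
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


if __name__ == "__main__":
    # Assumed file name, for illustration only.
    handle = enable_crash_diagnostics("segfault_traceback.log")
    print("faulthandler enabled:", faulthandler.is_enabled())
```

The same traceback dump can also be switched on without code changes by starting the interpreter with `python -X faulthandler`, in which case the output goes to stderr.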

ggggxm commented 1 month ago

Epoch 101/1000 - steps: 10%|█████▌ | 100/1000 [10:17<1:32:37, 6.18s/it, avr_loss=0.268]
Thread 74 "pt_main_thread" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffef1800640 (LWP 323521)]

I think the problem is caused by my CUDA version or environment. Could somebody who has trained successfully share their environment?
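To make environments easy to compare, here is a minimal sketch of a script that prints the details most likely to matter for a crash inside libcuda: the Python version, the PyTorch version, the CUDA version PyTorch was built against, the cuDNN version, and the GPU name. The function name `collect_env_report` is hypothetical. The script falls back gracefully when PyTorch is not installed, and it does not report the NVIDIA driver version, which `nvidia-smi` shows separately.

```python
import platform
import sys


def collect_env_report():
    """Hypothetical helper: gather version info relevant to CUDA crashes."""
    report = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    try:
        import torch
    except ImportError:
        report["torch"] = "not installed"
        return report

    report["torch"] = torch.__version__
    # CUDA toolkit version this torch build was compiled against
    # (None for CPU-only builds).
    report["torch_cuda"] = torch.version.cuda
    # cuDNN version as an integer, or None if cuDNN is unavailable.
    report["cudnn"] = torch.backends.cudnn.version()
    if torch.cuda.is_available():
        report["gpu"] = torch.cuda.get_device_name(0)
    return report


for key, value in collect_env_report().items():
    print(f"{key}: {value}")
```

PyTorch also ships a fuller report via `python -m torch.utils.collect_env`, which includes driver and OS details.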