Flux.1.dev.fp8 checkpoint training: avr_loss stays 'nan'

Lecho303 commented 1 month ago

I was trying to train a LoRA using the flux.1.dev.fp8 checkpoint, and the log keeps telling me that avr_loss is nan. I don't know whether I set something wrong or if it's something else.

System and version info:

[START] Security scan
[DONE] Security scan

ComfyUI-Manager: installing dependencies done.

ComfyUI startup time: 2024-09-28 16:52:30.478163
Platform: Windows
Python version: 3.11.8 (tags/v3.11.8:db85d51, Feb 6 2024, 22:03:32) [MSC v.1937 64 bit (AMD64)]
Python executable: D:\comfyUI\ComfyUI_windows_portable_nvidia.7z\python_embeded\python.exe
ComfyUI Path: D:\comfyUI\ComfyUI_windows_portable_nvidia.7z\ComfyUI
Log path: D:\comfyUI\ComfyUI_windows_portable_nvidia.7z\comfyui.log

Prestartup times for custom nodes:
0.0 seconds: D:\comfyUI\ComfyUI_windows_portable_nvidia.7z\ComfyUI\custom_nodes\rgthree-comfy
0.0 seconds: D:\comfyUI\ComfyUI_windows_portable_nvidia.7z\ComfyUI\custom_nodes\ComfyUI-Easy-Use
4.2 seconds: D:\comfyUI\ComfyUI_windows_portable_nvidia.7z\ComfyUI\custom_nodes\ComfyUI-Manager

Total VRAM 6144 MB, total RAM 32461 MB
pytorch version: 2.3.1+cu121
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 3060 Laptop GPU : cudaMallocAsync
Using pytorch cross attention

Screenshot 2024-09-29 142304

kijai commented 1 month ago

Not seen that happen myself, I'd recommend updating to torch 2.4.1 though, it's what kohya recommends to be used and it has solved lots of memory and speed issues for many who have updated.
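To confirm which torch build the training environment actually loads, a minimal sketch like the following can help. It assumes the script is run with the same interpreter ComfyUI uses (for the portable build in the log above, the `python_embeded\python.exe` executable). The helper names `parse_version` and `meets_recommended` are hypothetical, made up for this illustration.

```python
import re

# Version suggested in this thread; adjust if the recommendation changes.
RECOMMENDED = (2, 4, 1)


def parse_version(version: str) -> tuple:
    """Turn a string like '2.3.1+cu121' into (2, 3, 1).

    The local build suffix after '+' (e.g. the CUDA tag) is ignored.
    """
    core = version.split("+", 1)[0]
    numbers = [int(n) for n in re.findall(r"\d+", core)[:3]]
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def meets_recommended(version: str, minimum: tuple = RECOMMENDED) -> bool:
    """Return True if the given version is at least the recommended one."""
    return parse_version(version) >= minimum


if __name__ == "__main__":
    try:
        import torch  # only needed for the live check
    except ImportError:
        print("torch is not installed in this interpreter")
    else:
        installed = str(torch.__version__)
        status = "OK" if meets_recommended(installed) else "older than recommended"
        print(f"torch {installed}: {status}")
```

With the version string from the log above, `meets_recommended("2.3.1+cu121")` is false, so that environment is older than the recommended build. This only checks the version; it does not diagnose the nan loss itself.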

Lecho303 commented 1 month ago

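Not seen that happen myself, I'd recommend updating to torch 2.4.1 though, it's what kohya recommends to be used and it has solved lots of memory and speed issues for many who have updated.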

OK, I will try updating. Thank you so much.

Lecho303 commented 1 month ago

Not seen that happen myself, I'd recommend updating to torch 2.4.1 though, it's what kohya recommends to be used and it has solved lots of memory and speed issues for many who have updated.

Hi, I upgraded PyTorch to 2.4.1, but the loss is still "nan".

Lecho303 commented 1 month ago

Screenshot 2024-10-01 101021