Closed 21-10-4 closed 9 months ago
I have the same issue. I'm using the TikTok dataset.
I'm also using the TikTok dataset.
What's your learning rate?
Try decreasing the learning rate to something like 5.e-6
I have tried 1e-4, 1e-5, and 5.e-6. None of them work... Does anyone have any idea?
Are you using the latest training code?
Have you checked if the submitted code can train correctly? I've been using the latest code, but the loss consistently fails to decrease. Is it possible that some code has not been submitted? @guoqincode
I had no problem training on 8*A100.
Can you share the loss curve and some sample results?
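If a plot is hard to export, a short numeric summary of the logged losses can show whether training is flat or diverging. Here is a minimal sketch in plain Python, assuming the per-step loss values have already been collected into a list. `summarize_losses` and its `window` parameter are hypothetical names for illustration and are not part of this repo:

```python
import math


def summarize_losses(losses, window=100):
    """Compare the mean loss over the first and last `window` finite steps.

    Hypothetical helper for illustration; not part of the training code.
    """
    if not losses:
        raise ValueError("losses is empty")

    # Index of the first NaN/inf value, or None if every value is finite.
    first_bad = next(
        (i for i, x in enumerate(losses) if not math.isfinite(x)), None
    )

    # Only finite values are used for the averages.
    finite = [x for x in losses if math.isfinite(x)]
    if not finite:
        return {
            "first_non_finite_step": first_bad,
            "start_mean": None,
            "end_mean": None,
            "decreased": False,
        }

    w = min(window, len(finite))
    start_mean = sum(finite[:w]) / w
    end_mean = sum(finite[-w:]) / w
    return {
        "first_non_finite_step": first_bad,
        "start_mean": start_mean,
        "end_mean": end_mean,
        "decreased": end_mean < start_mean,
    }
```

Posting the returned dictionary along with the learning rate used makes it easier to compare runs across different setups.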
My current machine does not have access to external networks. I will organize the current repo after all the models are trained.
My loss is behaving strangely today. I tried the new code, and the loss becomes NaN: