Syncnet loss does not converge

kavita-gsphk commented 1 month ago

I am training SyncNet on the AVSpeech dataset with train_syncnet_sam.py. My training loss is stuck at 0.69 even after 500k steps. The learning rate and batch size are 5e-5 and 64, respectively.

I have gone through all the issues, but I haven't found any workable solution. If anyone has any suggestions, it will be a great help.

For preprocessing, I followed all the steps suggested here except the video split part. My videos are 7.1s long on average (ranging from 0 to 15s), and the total length of the training dataset is roughly 30.5 hours.

waptak commented 1 month ago

I have the same issue; the loss is always around 0.69.
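For context on the 0.69 figure: it is approximately ln 2, which is what a binary cross-entropy loss evaluates to when the model outputs 0.5 for every pair, i.e. when it is doing no better than chance. The sketch below assumes the loss is a binary cross-entropy over in-sync / out-of-sync labels, as in the original Wav2Lip SyncNet training; the helper name `bce` is only for illustration.

```python
import math

def bce(p, y):
    """Binary cross-entropy for one prediction p in (0, 1) and label y in {0, 1}."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# A model that always predicts 0.5, on a balanced mix of
# in-sync (y=1) and out-of-sync (y=0) pairs.
chance_loss = 0.5 * bce(0.5, 1) + 0.5 * bce(0.5, 0)

print(round(chance_loss, 4))   # 0.6931
print(round(math.log(2), 4))   # 0.6931
```

So a loss that sits at 0.69 suggests the network has not yet learned anything that separates synced from unsynced pairs.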

linqiu0-0 commented 1 month ago

How long does it take to train 500k steps?

openalldoors commented 1 month ago

You need more high-quality training data. I got the loss down to 0.37.

linqiu0-0 commented 1 month ago

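> You need more high-quality training data. I got the loss down to 0.37.

How much data did you use, how many batches did you train for, and roughly how long did it take?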

openalldoors commented 1 month ago

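About 20,000 video files, each under 5 seconds, trained for 360k steps. Partway through I changed the training set and added some high-quality training data. Going by what the author says, my training set is probably still far from enough. I'll keep trying for now; after all, training models is part black magic.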

linqiu0-0 commented 1 month ago

Thank you very much! What is your batch size? And yes, training models is part black magic 😂

openalldoors commented 1 month ago

16

linqiu0-0 commented 1 month ago

https://github.com/primepake/wav2lip_288x288/issues/30 Then I think your loss progress looks quite similar to the one in this issue. Hope is just ahead.

kavita-gsphk commented 1 month ago

@linqiu0-0 it took around 4.6 days to finish 500k steps

kavita-gsphk commented 1 month ago

@openalldoors, which dataset are you using for high quality? So, are you saying if I include a more high-quality dataset, then the loss will converge?

openalldoors commented 1 month ago

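> @openalldoors, which dataset are you using for high quality? So, are you saying if I include a more high-quality dataset, then the loss will converge?

1. You can pick out suitable videos from Bilibili and process them yourself, which takes a lot of time.
2. Adding more high-quality data can significantly speed up loss convergence, provided your data really is high quality and not just what you assume to be high quality.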

1129571 commented 1 month ago

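> > @openalldoors, which dataset are you using for high quality? So, are you saying if I include a more high-quality dataset, then the loss will converge?
>
> 1. You can pick out suitable videos from Bilibili and process them yourself, which takes a lot of time.
> 2. Adding more high-quality data can significantly speed up loss convergence, provided your data really is high quality and not just what you assume to be high quality.

May I ask what your criteria for a high-quality dataset are, and what learning rate you use? Much appreciated.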

openalldoors commented 1 month ago

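Whether the video bitrate is high enough and whether the audio is in sync. More importantly: does every frame contain a face, is it the same person throughout, and are there multiple people in the frame?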

1129571 commented 1 month ago

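> Whether the video bitrate is high enough and whether the audio is in sync. More importantly: does every frame contain a face, is it the same person throughout, and are there multiple people in the frame?

Thanks. I have checked the bitrate and the faces; everything is 1080p. I skipped the syncnet_python audio-visual sync check. I got past the 0.69 plateau by lowering the learning rate, but training is now very slow: it took 1.6M steps to reach about 0.44, and it seems to be starting to overfit. After reading your reply I suspect a dataset quality problem. Could you share your audio-visual sync steps?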

openalldoors commented 1 month ago

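Don't skip the SyncNet audio-visual sync check unless you are 100% sure of your data. Try looking at the eval log: if some of the eval data is bad, the loss will be abnormally high, significantly greater than 1 (even above 6 is possible), and you need to track those samples down.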

jibingyangsf commented 1 month ago

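> About 20,000 video files, each under 5 seconds, trained for 360k steps. Partway through I changed the training set and added some high-quality training data. Going by what the author says, my training set is probably still far from enough. I'll keep trying for now; after all, training models is part black magic.

Can the author's source code train at 384 directly, without adjusting the network architecture or the loss function?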

jibingyangsf commented 1 month ago

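> > Whether the video bitrate is high enough and whether the audio is in sync. More importantly: does every frame contain a face, is it the same person throughout, and are there multiple people in the frame?
>
> Thanks. I have checked the bitrate and the faces; everything is 1080p. I skipped the syncnet_python audio-visual sync check. I got past the 0.69 plateau by lowering the learning rate, but training is now very slow: it took 1.6M steps to reach about 0.44, and it seems to be starting to overfit. After reading your reply I suspect a dataset quality problem. Could you share your audio-visual sync steps?

For this, just run syncnet_python, an open-source project, on your clips; an AV offset of 0 means the clip is in sync. I also have a question: can the author's source code really train at 288, 384, or 512 without modification? Aren't the network architecture and the loss function supposed to differ from the 96*96 version? Do you know anything about this?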
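The "AV offset 0 means in sync" rule above can be sketched as a simple filter over a dataset. This is only an illustration: it assumes the per-clip offsets and confidences have already been collected (for example by running syncnet_python on each clip), and the dictionary layout, the function name `keep_synced`, and the optional confidence threshold are all hypothetical.

```python
def keep_synced(results, max_abs_offset=0, min_confidence=None):
    """Return the clips whose measured AV offset is within tolerance.

    results: dict mapping clip path -> (av_offset_in_frames, confidence).
             This layout is an assumption made for the sketch.
    """
    kept = []
    for path, (offset, confidence) in results.items():
        if abs(offset) > max_abs_offset:
            continue  # audio leads or lags the video; drop the clip
        if min_confidence is not None and confidence < min_confidence:
            continue  # offset estimate is not trustworthy; drop the clip
        kept.append(path)
    return sorted(kept)

# Hypothetical measurements for three clips.
results = {
    "clip_a.mp4": (0, 7.2),
    "clip_b.mp4": (-3, 6.1),
    "clip_c.mp4": (0, 1.4),
}
print(keep_synced(results))                      # ['clip_a.mp4', 'clip_c.mp4']
print(keep_synced(results, min_confidence=3.0))  # ['clip_a.mp4']
```

The confidence cutoff is optional here because the thread only mentions the offset; whether and where to set one is a judgment call for your own data.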

1129571 commented 1 month ago

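> Don't skip the SyncNet audio-visual sync check unless you are 100% sure of your data. Try looking at the eval log: if some of the eval data is bad, the loss will be abnormally high, significantly greater than 1 (even above 6 is possible), and you need to track those samples down.

Thanks for sharing your experience. I'm currently at a train loss of 0.42, with eval fluctuating between 0.43 and 0.45. Because my GPU memory is small, training is slow, so it's still hard to judge.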

openalldoors commented 1 month ago

You need to look at the eval output for each individual sample; the mean won't reveal the problem.
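A minimal sketch of that per-sample check, assuming the eval loop can expose one similarity score and one label per sample, and that the loss is a binary cross-entropy on that score. The function names, the clamping epsilon, and the threshold of 1.0 (taken from the "significantly greater than 1" remark earlier in the thread) are illustrative assumptions, not part of the repository.

```python
import math

def per_sample_bce(similarities, labels, eps=1e-7):
    """Binary cross-entropy for each sample, without averaging."""
    losses = []
    for s, y in zip(similarities, labels):
        p = min(max(s, eps), 1 - eps)  # clamp to avoid log(0)
        losses.append(-(y * math.log(p) + (1 - y) * math.log(1 - p)))
    return losses

def flag_outliers(names, losses, threshold=1.0):
    """Return (name, loss) pairs whose individual loss exceeds the threshold."""
    return [(n, l) for n, l in zip(names, losses) if l > threshold]

# Hypothetical eval samples, all labelled in-sync (y = 1).
names = ["clip_a", "clip_b", "clip_c", "clip_d"]
sims = [0.90, 0.80, 0.002, 0.85]
labels = [1, 1, 1, 1]

losses = per_sample_bce(sims, labels)
for name, loss in flag_outliers(names, losses):
    print(f"{name}: loss={loss:.2f}")  # only clip_c is reported
```

With a large eval set, a handful of samples like `clip_c` barely moves the mean, which is why the averaged eval loss can look normal while individual clips are badly out of sync.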

xiao-keeplearning commented 2 weeks ago

What causes SyncNet training to overfit? My data passed the audio-visual sync check without any problems.

primepake / wav2lip_288x288

Syncnet loss does not converge #146