sukun1045 / video-physics-sound-diffusion

Apache License 2.0

Does the training loss of Sound_residual_predict consist only of Mrft_loss? #4

Closed: SuperiorDtj closed this issue 8 months ago

SuperiorDtj commented 8 months ago

When I try to train sound_phsics_residual_predict, I get an error: SoundResidualCriterion takes four arguments (the ground-truth and predicted audio plus the ground-truth and predicted features), but the trainer only passes it the ground-truth and predicted audio, which it uses to compute Mrft_loss. Is the perceptual loss, which needs the ground-truth and predicted features, necessary?

sukun1045 commented 8 months ago

Thanks for pointing it out. It was redundant code that I forgot to clean up. I tried adding a perceptual loss before, but as I recall it didn't help, so feel free to delete it.
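With the perceptual term removed, the criterion only needs the two audio tensors the trainer already passes. The sketch below illustrates that audio-only signature. It assumes "Mrft_loss" is a multi-resolution STFT loss (spectral convergence plus log-magnitude L1, summed over several FFT sizes), which the thread does not confirm. `AudioOnlyCriterion`, `stft_mag`, `multi_res_stft_loss` and the resolution list are hypothetical names and values, not the repo's code, and NumPy stands in for the repo's training framework.

```python
import numpy as np


def stft_mag(x, n_fft, hop):
    """Magnitude STFT of a 1-D signal with a Hann window and no padding."""
    if len(x) < n_fft:
        raise ValueError("signal is shorter than n_fft")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=-1))


def multi_res_stft_loss(pred, gt, resolutions=((512, 128), (1024, 256), (2048, 512)), eps=1e-7):
    """Assumed multi-resolution STFT loss, averaged over the (n_fft, hop) pairs."""
    total = 0.0
    for n_fft, hop in resolutions:
        p = stft_mag(pred, n_fft, hop)
        g = stft_mag(gt, n_fft, hop)
        # Spectral convergence: relative Frobenius-norm error of the magnitudes.
        sc = np.linalg.norm(g - p) / (np.linalg.norm(g) + eps)
        # Mean absolute error between log magnitudes.
        log_mag = np.mean(np.abs(np.log(g + eps) - np.log(p + eps)))
        total += sc + log_mag
    return total / len(resolutions)


class AudioOnlyCriterion:
    """Hypothetical stand-in for SoundResidualCriterion without the perceptual term.

    It takes only (pred_audio, gt_audio), matching what the trainer passes.
    """

    def __call__(self, pred_audio, gt_audio):
        return multi_res_stft_loss(pred_audio, gt_audio)
```

For example, `AudioOnlyCriterion()(pred, gt)` returns 0.0 when `pred` equals `gt` and a positive value otherwise. Dropping the two feature arguments, rather than passing dummy features, keeps the call signature aligned with the trainer.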

SuperiorDtj commented 8 months ago


Thanks for your quick reply!

I have another question: when extracting the physics parameters, there are two numbers I don't understand. One is the frequency distance 10.76, and the other is 60 / final_d. Is there any reference material that explains these specific numbers?

[Screenshot 2024-03-01 124634]
sukun1045 commented 8 months ago

Sorry for the confusion. The first number is hard-coded as 44100/2048/2 (≈ 10.77 Hz), which gives a rough estimate of the frequency range covered by one bin. The second one refers to 60 dB. You can check out the code in the TDW repo.
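The arithmetic behind both constants can be checked directly. This sketch reads 44100/2048/2 as the 22050 Hz Nyquist range spread over 2048 bins. That value equals half the bin spacing of a 2048-point FFT at 44.1 kHz, and which reading the code intends is an assumption. It also assumes `final_d` is a decay rate in dB per second, so that 60 / final_d is the time for a mode to fall by 60 dB (an RT60-style decay time). `freq_per_bin`, `same_mode` and `decay_time_60db` are hypothetical names, and using the bin width as a merge tolerance for nearby modal frequencies is only a guess at how the "frequency distance" is applied.

```python
SAMPLE_RATE = 44100  # Hz
N_BINS = 2048

nyquist = SAMPLE_RATE / 2           # 22050 Hz
freq_per_bin = nyquist / N_BINS     # 10.7666015625 Hz, same as 44100 / 2048 / 2


def same_mode(f1_hz, f2_hz, tol_hz=freq_per_bin):
    """Hypothetical: treat two modal frequencies as one if they fall within one bin width."""
    return abs(f1_hz - f2_hz) < tol_hz


def decay_time_60db(final_d):
    """Seconds for a mode to decay by 60 dB, assuming final_d is a decay rate in dB/s."""
    return 60.0 / final_d
```

For instance, with a decay rate of 30 dB/s, `decay_time_60db(30.0)` gives 2.0 seconds, and two peaks 10 Hz apart count as the same mode while peaks 12 Hz apart do not.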

SuperiorDtj commented 8 months ago


Thank you for your prompt and detailed reply!