luosiallen / Diff-Foley

Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models
Apache License 2.0

TypeError: __init__() missing 1 required positional argument: 'first_stage_config' #25

Closed: Angelalilyer closed this issue 3 months ago

Angelalilyer commented 3 months ago

"Diff-Foley/evaluation/config/eval_classifier.yaml" is missing the 'first_stage_config' section.

HUIZ-A commented 3 months ago

I've run into the same issue. The classifier_acc evaluation doesn't work without 'first_stage_config'. "Diff-Foley/inference/config/Double_Guidance_Classifier.yaml" is also missing 'first_stage_config', so the inference demo doesn't work either.
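For anyone hitting the same traceback, here is a minimal sketch of why a missing config section surfaces as this TypeError. It assumes the model is built in the usual config-driven way, where the `params` mapping from the YAML is unpacked into the constructor as keyword arguments. `ToyLatentDiffusion`, `missing_params` and `instantiate` are hypothetical names used only for illustration; they are not taken from the Diff-Foley code.

```python
import inspect


class ToyLatentDiffusion:
    """Hypothetical stand-in for the model class named by the config's target."""

    def __init__(self, first_stage_config, cond_stage_config):
        self.first_stage_config = first_stage_config
        self.cond_stage_config = cond_stage_config


def missing_params(cls, params):
    """List required __init__ arguments that the params mapping does not supply."""
    required = []
    for name, p in inspect.signature(cls.__init__).parameters.items():
        if name == "self":
            continue
        # *args / **kwargs are never "missing", so only look at named parameters.
        if p.kind not in (p.POSITIONAL_OR_KEYWORD, p.KEYWORD_ONLY):
            continue
        if p.default is inspect.Parameter.empty and name not in params:
            required.append(name)
    return required


def instantiate(cls, params):
    """Mimic the cls(**params) step of config-driven instantiation (assumed)."""
    return cls(**params)


# params as they would look when the first_stage_config section is absent
incomplete = {
    "cond_stage_config": {
        "params": {"origin_dim": 512, "embed_dim": 512, "seq_len": 40}
    }
}

# Checking up front names the missing section before the constructor runs.
print(missing_params(ToyLatentDiffusion, incomplete))

try:
    instantiate(ToyLatentDiffusion, incomplete)
except TypeError as err:
    # Same kind of message as in the issue title.
    print(err)
```

Running a check like `missing_params` on each YAML file before launching evaluation or inference would point at the absent section directly instead of failing inside the constructor.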

HUIZ-A commented 3 months ago

By the way, in "Diff-Foley/evaluation/config/eval_classifier.yaml" or "Diff-Foley/inference/config/Double_Guidance_Classifier.yaml":

"""
...
cond_stage_config:
  target: diff_foley.modules.cond_stage.video_feat_encoder.Video_Feat_Encoder_Posembed
  params:
    origin_dim: 512
    embed_dim: 512
    seq_len: 40
part Video_Feat_Encoder_Posembed
...
"""

The checkpoint for Video_Feat_Encoder_Posembed (a subclass of nn.Module) is not provided, but it is indispensable for inference and evaluation.

luosiallen commented 3 months ago

https://github.com/luosiallen/Diff-Foley/blob/main/diff_foley/modules/cond_stage/video_feat_encoder.py

luosiallen commented 3 months ago

Thanks for mentioning this. I have filled in the 'first_stage_config'.

HUIZ-A commented 3 months ago

https://github.com/luosiallen/Diff-Foley/blob/main/diff_foley/modules/cond_stage/video_feat_encoder.py

Hi there, thanks for your work, it's very helpful! Do you mean that a checkpoint for the nn.Linear is not necessary?
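One way to check this independently is to list the parameter names stored in the released checkpoint and see whether any belong to the conditioning encoder. The sketch below uses a toy dictionary in place of a real state dict. The key names are made up, and the `cond_stage_model.` prefix is an assumption based on the naming convention common in latent-diffusion-style codebases; it is not confirmed for Diff-Foley.

```python
def keys_with_prefix(state_dict, prefix):
    """Return the sorted parameter names in state_dict that start with prefix."""
    return sorted(k for k in state_dict if k.startswith(prefix))


# Toy stand-in for a loaded checkpoint's state dict; keys and values are made up.
toy_state_dict = {
    "model.diffusion_model.input_blocks.0.weight": None,
    "first_stage_model.encoder.conv_in.weight": None,
    "cond_stage_model.embedder.weight": None,
    "cond_stage_model.embedder.bias": None,
}

# Assumed prefix: the attribute name under which the encoder would be registered.
found = keys_with_prefix(toy_state_dict, "cond_stage_model.")
print(found if found else "no cond_stage_model weights in this checkpoint")
```

With a real file, the dictionary would come from something like `torch.load(path, map_location="cpu")`, possibly under a `"state_dict"` key if the checkpoint was saved by PyTorch Lightning (again an assumption about the file layout). If matching keys are present, the encoder's weights are bundled with the main checkpoint and no separate file would be needed.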