Open HUIZ-A opened 5 months ago
Yes. It has been trained. You should use the pretrained weight.
Is the pretrained weight provided? It does not seem to have been uploaded to https://huggingface.co/SimianLuo/Diff-Foley/tree/main/diff_foley_ckpt.
Yes. It has been trained. You should use the pretrained weight. @luosiallen
I think I've figured it out: eval_classifier.ckpt includes the parameters of the classifier backbone and video_feat_encoder, but not those of the first stage model. Is that right?
In "Diff-Foley/diff_foley/modules/double_guidance/alignment_classifier_metric.py", the first_stage_ckpt is loaded separately:

```python
class Alignment_Classifier_metric(pl.LightningModule):
    def __init__(self,
                 classifier_config,
                 first_stage_config,
                 cond_stage_config,
                 monitor,
                 first_stage_ckpt=None,
                 first_stage_key="spec",
                 scale_factor = 1.0,
                 timesteps = 2,
                 given_betas=None,
                 beta_schedule = "linear",
                 linear_start=1e-4,
                 linear_end=2e-2,
                 cosine_s=8e-3,
                 v_posterior=0.,
                 parameterization="eps",
                 *args, **kwargs):
        super().__init__()
        self.instantiate_first_stage(first_stage_config)
        self.first_stage_ckpt = first_stage_ckpt
        if self.first_stage_ckpt is not None:
            self.init_first_from_ckpt(self.first_stage_ckpt)
```
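One way to check which sub-modules a checkpoint actually covers is to group its parameter names by their top-level prefix. The snippet below is only a minimal sketch: the key names in `example_keys` and the `first_stage_model.` prefix are hypothetical placeholders, not taken from the real eval_classifier.ckpt, and the helper `count_by_prefix` is not part of the Diff-Foley repo.

```python
from collections import Counter


def count_by_prefix(keys, depth=1):
    """Count parameter names by their first `depth` dot-separated components."""
    return Counter(".".join(key.split(".")[:depth]) for key in keys)


# Hypothetical key names for illustration only; the real checkpoint may differ.
example_keys = [
    "classifier.conv1.weight",
    "classifier.conv1.bias",
    "video_feat_encoder.embedder.weight",
    "video_feat_encoder.embedder.bias",
]

counts = count_by_prefix(example_keys)

# "first_stage_model." is an assumed prefix; adjust it to the attribute name
# the model actually uses for its first stage.
has_first_stage = any(key.startswith("first_stage_model.") for key in example_keys)

print(dict(counts))
print("first stage weights present:", has_first_stage)
```

For a real file, the key list would typically come from `torch.load(path, map_location="cpu")["state_dict"].keys()`, assuming the checkpoint follows the usual PyTorch Lightning layout. If no key carries the first stage prefix, the first stage weights have to come from the separate `first_stage_ckpt`.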
Hello! May I ask what results you got from your evaluation? After using "video_feat_encoder.py", almost all values are close to 0. Without it, the accuracy on VGGSound is 0.16. Obviously, this is not correct~ T T
@Angelalilyer My accuracy is about 0.8 with my variant model on a 100-sample eval subset.
Thanks for your work! I'm confused about the video_feat_encoder class, which is used in "Diff-Foley/evaluation/config/eval_classifier.yaml" for evaluation. This encoder is an nn.Module that applies nn.Linear to change the tensor shape. I'm wondering whether this video_feat_encoder has been trained, or whether the embedding network's parameters are not really necessary, so that I can just use the initialized parameters.