MyBeautiful-Fantasy opened this issue 2 months ago
Excellent work! Amazing LipVoicer!
I have a small question about the sync evaluation metrics, LSE-C and LSE-D.
In "LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading", the LSE-C of ground-truth LRS2 is 6.840 and the LSE-D is 7.194 (see Table 3 on page 8). Following the evaluation guidance of Wav2Lip, I actually measured an LSE-C of 8.248 and an LSE-D of 6.258 on ground-truth LRS2.
I found that this result (LSE-C of 8.248) is similar to the one reported in "Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert" (see Table 1 on page 7). BTW, my test data is the test split of the LRS2 dataset, 1243 videos in total. Did the author of LipVoicer fine-tune `syncnet_v2.model` (`./syncnet_python/data/`)?
Hi,
Thank you for the positive feedback! Always heartwarming to hear.
As to your question, I don't remember it as accurately as I would wish (it's been a while), but I think I ran the Wav2Lip recipe for real videos on LRS2.
Basically, the Wav2Lip repo has two scripts for calculating the metrics: `calculate_score_LRS` and `calculate_score_real_videos`.
As far as I remember, when I used the LRS recipe on ground-truth videos of LRS3, it matched the results obtained by the real-videos script. However, running the two scripts on LRS2 led to disagreeing results. Since `calculate_score_real_videos` adds preprocessing steps on top of what you can find in `calculate_score_LRS`, it is likely to yield the more accurate results, albeit more time-consuming as well.
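For reference, here is a minimal sketch of what both scripts ultimately compute from the SyncNet embeddings (the function and variable names are mine, not from the Wav2Lip code):

```python
import numpy as np

def lse_metrics(dists):
    """Compute LSE-D and LSE-C for one video.

    dists: array of shape (num_offsets, num_frames) holding the distances
    between the audio and video SyncNet embeddings, one row per candidate
    audio-visual offset (e.g. -15..+15 frames).
    """
    mean_dists = dists.mean(axis=1)         # average distance per offset
    lse_d = mean_dists.min()                # LSE-D: distance at the best offset
    lse_c = np.median(mean_dists) - lse_d   # LSE-C: confidence margin over the median
    return lse_d, lse_c
```

The dataset-level scores are then the means of these per-video values over the test set; the two scripts differ mainly in how the videos are preprocessed before the embeddings are extracted.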
Thank you very much for your reply, I understand. I will try the second option, `calculate_score_real_videos`.
By the way, do you think the result calculated using the first method (`calculate_score_LRS`) is acceptable? Or is the second script needed to get accurate results on the LRS2 dataset?
A kind tip for anyone reproducing this: I found that changing `audio_dir` on line 24 of `./configs/config.yaml` to `audios_dir` prevents the error `TypeError: LipVoicerDataset.__init__() got an unexpected keyword argument 'audio_dir'`; the key name is likely a spelling error (see the illustration below). Thanks again to the author for proposing such excellent work :)
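Here is a minimal illustration of the mismatch (the class and its signature are hypothetical stand-ins, not the actual repo code):

```python
# Hypothetical stand-in for the real dataset class: __init__ accepts only
# the keyword `audios_dir`, mirroring what the repo code expects.
class LipVoicerDataset:
    def __init__(self, audios_dir):
        self.audios_dir = audios_dir

cfg = {"audio_dir": "data/audios"}   # key as it was spelled in config.yaml
# LipVoicerDataset(**cfg)            # TypeError: unexpected keyword argument 'audio_dir'

cfg = {"audios_dir": "data/audios"}  # renamed key, matching the signature
dataset = LipVoicerDataset(**cfg)    # works
```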
Hi,
Sorry for the delay in my response.
I think it would be ok to use `calculate_score_LRS`, as it is prescribed by the authors of Wav2Lip.
Thank you for pointing out the typo in the code; I'll fix it.
Dear author,
I hope this message finds you well and that I’m not causing any inconvenience. I have what may be my final question for a while.
Could you kindly provide the complete `config.yaml` file for the GRID dataset?
Alternatively, is it sufficient to modify only the `w_video`, `w_asr`, and `asr_start` parameters in the existing `config.yaml`, while keeping the other configurations (e.g., `[diffusion][T]`, `[diffusion][beta_0]`, ...) the same as in the `config.yaml` for LRS?
Thank you for your assistance!
Best regards
It is sufficient to change only the values that you stated in the config file.
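For concreteness, one way to script that edit (a minimal sketch, assuming the three keys sit at the top level of `configs/config.yaml`; the values below are placeholders, not tuned GRID numbers):

```python
# Derive a GRID config from the shipped LRS one by overriding only the
# guidance-related entries; everything else (diffusion T, beta_0, ...) is kept.
import yaml

with open("configs/config.yaml") as f:      # the LRS config from the repo
    cfg = yaml.safe_load(f)

cfg["w_video"] = 1.0     # placeholder, use your GRID-tuned value
cfg["w_asr"] = 1.0       # placeholder, use your GRID-tuned value
cfg["asr_start"] = 0     # placeholder, use your GRID-tuned value

with open("configs/config_grid.yaml", "w") as f:  # hypothetical output path
    yaml.safe_dump(cfg, f)
```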