tlatlbtle opened 5 years ago
The audio code is at https://github.com/oawiles/X2Face/blob/2d0a3a620c8ebf57c6df75c79fb82052eceb89ba/UnwrapMosaic/Audio2Face.ipynb.
We don't have the code for training as it was rather a pain to implement. We had to use MATLAB to extract Joon et al.'s audio features, which were then saved out to numpy and used to train the new model.
Hi, I used the code here to extract audio features: http://www.robots.ox.ac.uk/~vgg/research/unsup_learn_watch_faces/extract_audio_code.zip. I downloaded the dataset here: http://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_test_wav.zip and the frames extracted at 1 fps: http://www.robots.ox.ac.uk/~vgg/research/CMBiometrics/data/zippedFaces.tar.gz
I cannot find the "audio.wav" referenced on line 71 of "extract_audio_voxceleb.m" in the test set. It seems every wav file in the test set has been split into several clips: 00001.wav, 00002.wav, etc.
Do I need to merge these clips to get "audio.wav"?
Hi, would you mind telling us how to obtain the "audio.wav" mentioned in "extract_audio_voxceleb.m" (
http://www.robots.ox.ac.uk/~vgg/research/unsup_learn_watch_faces/extract_audio_code.zip),
line 71:
audio_file = ['/datasets/voxceleb1/wav/' track_name '/' track_id '/audio.wav'];
Hi! We didn't do the preprocessing ourselves, but yes, I believe the preprocessing produced an audio file corresponding to the entire video. This was because the frames we used were also preprocessed from the entire video, I believe. I don't think merging the clips will work, as this won't reproduce the original video. The easiest approach is probably to modify the script while ensuring that the frames and the audio correspond (lines 82-84), using whatever preprocessed version of the data you have.
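To illustrate what "ensure that the frames and the audio correspond" can mean in practice: with frames extracted at 1 fps, each frame can be mapped to a short audio window centred on its timestamp. A minimal sketch, assuming 16 kHz audio and a 0.35 s window (the window length and centring convention are guesses, not taken from the original MATLAB script):

```python
def audio_window_for_frame(frame_idx, sample_rate=16000, fps=1, window_s=0.35):
    """Return (start, end) sample indices of the audio window for one frame.

    Hypothetical helper: assumes the frame's timestamp is the midpoint of
    its 1/fps interval, and that features use a 0.35 s window around it.
    """
    centre = (frame_idx + 0.5) / fps * sample_rate  # frame midpoint, in samples
    half = window_s / 2 * sample_rate
    start = max(0, int(centre - half))
    end = int(centre + half)
    return start, end

# e.g. the audio window for frame 3 of a 1 fps sequence
start, end = audio_window_for_frame(3)
```

Indexing the full `audio.wav` with these bounds keeps frame i and its audio slice aligned even after the script is modified for a different data layout.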
I just downloaded the original complete video in mp4 format from YouTube and converted it to WAV audio. It works for me. Thanks.
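For anyone following the same route, the mp4-to-WAV conversion can be scripted with ffmpeg. A sketch that builds the command (the 16 kHz mono 16-bit PCM format is an assumption chosen to match common speech-feature pipelines, not something specified by the repo):

```python
import subprocess

def mp4_to_wav_cmd(mp4_path, wav_path, sample_rate=16000):
    """Build an ffmpeg command extracting mono 16-bit PCM audio from a video."""
    return [
        "ffmpeg", "-y",           # overwrite output without prompting
        "-i", mp4_path,           # input video
        "-vn",                    # drop the video stream
        "-acodec", "pcm_s16le",   # 16-bit PCM
        "-ar", str(sample_rate),  # resample (assumed 16 kHz)
        "-ac", "1",               # downmix to mono
        wav_path,
    ]

# to run it (requires ffmpeg on PATH):
# subprocess.run(mp4_to_wav_cmd("video.mp4", "audio.wav"), check=True)
```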
Thanks for this great repo!
As for audio2face, I found that the model files do not include an audio embedding part: https://github.com/oawiles/X2Face/blob/2d0a3a620c8ebf57c6df75c79fb82052eceb89ba/UnwrapMosaic/NoSkipNet_X2Face_pose.py The default value for `audio` on line 197 is `False`, and there is no code for the audio model. How can I reimplement your method for audio2face?
Thanks.
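Since the repo ships no audio model, any reimplementation has to supply its own embedding branch mapping audio features to a driving vector. A toy sketch of what such a branch could look like, using numpy; every size here (13 MFCC coefficients, 35 time steps, 128-d embedding) is a guess, not the authors' architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_audio_embedder(n_mfcc=13, n_steps=35, embed_dim=128):
    """Toy audio embedder: flatten MFCCs -> linear projection -> ReLU.

    Hypothetical stand-in for the missing audio model; in practice this
    would be a trained network, not random weights.
    """
    W = rng.standard_normal((n_mfcc * n_steps, embed_dim)) * 0.01
    b = np.zeros(embed_dim)

    def embed(mfcc):                  # mfcc: (n_mfcc, n_steps)
        h = mfcc.reshape(-1) @ W + b  # linear projection
        return np.maximum(h, 0.0)     # ReLU
    return embed

embed = make_audio_embedder()
e = embed(rng.standard_normal((13, 35)))  # one embedding vector per audio window
```

The output vector would play the role of the driving-pose code that `NoSkipNet_X2Face_pose.py` expects when `audio=True`, trained against frames aligned as discussed above.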