oawiles / X2Face

Pytorch code for ECCV 2018 paper
MIT License

How to train model with audio feature? #27

Open tlatlbtle opened 5 years ago

tlatlbtle commented 5 years ago

Thanks for this great repo!

Regarding audio2face, I found that the model file does not contain an audio embedding part: https://github.com/oawiles/X2Face/blob/2d0a3a620c8ebf57c6df75c79fb82052eceb89ba/UnwrapMosaic/NoSkipNet_X2Face_pose.py The default value of "audio" on line 197 is false, and there is no code for the audio model. How can I reimplement your method for audio2face?

Thanks.

oawiles commented 5 years ago

The audio code is at https://github.com/oawiles/X2Face/blob/2d0a3a620c8ebf57c6df75c79fb82052eceb89ba/UnwrapMosaic/Audio2Face.ipynb.

We don't have the code for training, as it was rather a pain to implement. We had to use MATLAB to extract Joon et al.'s audio features, which were then saved out to numpy and used to train the new model.

tlatlbtle commented 5 years ago

Hi, I used the code here to extract audio features: http://www.robots.ox.ac.uk/~vgg/research/unsup_learn_watch_faces/extract_audio_code.zip. I downloaded the dataset here: http://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_test_wav.zip and the frames extracted at 1 fps here: http://www.robots.ox.ac.uk/~vgg/research/CMBiometrics/data/zippedFaces.tar.gz

I cannot find the "audio.wav" referenced on line 71 of "extract_audio_voxceleb.m" anywhere in the test set. It seems every wav file in the test set has been split into several clips: 00001.wav, 00002.wav, etc.

Do I need to merge these clips to get "audio.wav"?

tlatlbtle commented 5 years ago

Hi, would you mind telling us how to obtain the "audio.wav" mentioned in "extract_audio_voxceleb.m" ( http://www.robots.ox.ac.uk/~vgg/research/unsup_learn_watch_faces/extract_audio_code.zip) on line 71: audio_file = ['/datasets/voxceleb1/wav/' track_name '/' track_id '/audio.wav'];

oawiles commented 5 years ago

Hi! We didn't do the preprocessing ourselves, but yes, I believe it gave us a single audio file corresponding to the entire video, because the frames we used were also preprocessed from the entire video. I don't think merging the clips will work, as this won't reproduce the original video. The easiest approach is probably to modify the script, making sure that the frames and the audio correspond (lines 82-84), using whatever preprocessed version of the data you have.
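As a rough illustration of what "making sure the frames and the audio correspond" could look like, here is a minimal Python sketch that maps a frame index to a range of audio samples. It is not the logic from lines 82-84 of the MATLAB script. The function name `audio_window_for_frame`, the 16 kHz sample rate, and the 0.2 s window are all assumptions for illustration; only the 1 fps frame rate comes from this thread.

```python
def audio_window_for_frame(frame_index, fps=1.0, sample_rate=16000,
                           window_seconds=0.2):
    """Return (start, end) sample indices of the audio window centred on a frame.

    Assumes frame 0 and audio sample 0 both sit at time 0 of the same
    full-length video, which only holds if neither has been trimmed.
    """
    # Time (in seconds) at which this frame was taken from the video.
    centre_time = frame_index / fps
    # Corresponding position in the audio signal.
    centre_sample = int(round(centre_time * sample_rate))
    # Half the window length, in samples.
    half = int(round(window_seconds * sample_rate / 2))
    # Clamp the start so windows near the beginning stay in range.
    start = max(0, centre_sample - half)
    end = centre_sample + half
    return start, end


if __name__ == "__main__":
    # Frame 10 at 1 fps sits at t = 10 s.
    print(audio_window_for_frame(10))
```

If the frames were extracted from a different cut of the video than the audio, an offset would have to be added to `centre_time`; this sketch does not handle that case.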

tlatlbtle commented 5 years ago

I just downloaded the original complete video in MP4 format from YouTube and converted it to WAV audio. It works for me. Thanks.
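For anyone following along, one possible way to do the MP4-to-WAV conversion is to call ffmpeg from Python. This is only a sketch: the thread does not say which tool was actually used, and the helper names, the mono channel setting, and the 16 kHz sample rate are assumptions that should be matched to whatever the feature extraction script expects.

```python
import subprocess


def build_ffmpeg_command(video_path, wav_path, sample_rate=16000):
    """Build an ffmpeg command that extracts the audio track as a WAV file."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-i", video_path,        # input video
        "-vn",                   # drop the video stream
        "-ac", "1",              # mono (assumed)
        "-ar", str(sample_rate), # output sample rate in Hz (assumed)
        wav_path,
    ]


def convert_to_wav(video_path, wav_path, sample_rate=16000):
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(video_path, wav_path, sample_rate),
                   check=True)


if __name__ == "__main__":
    # Hypothetical file names; requires ffmpeg to be installed and on PATH.
    convert_to_wav("video.mp4", "audio.wav")
```

Writing the result to "audio.wav" inside the track folder keeps it compatible with the path built on line 71 of the script.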