oawiles / FAb-Net

Pytorch code for BMVC 2018 paper
MIT License

Extracting Frames from VoxCeleb Video Dataset #2

Closed. huaerli closed this issue 5 years ago

huaerli commented 5 years ago

I am having some issues trying to train my own model. The readme suggests downloading the dataset in a certain format. However, since the VoxCeleb dataset only provides URLs to YouTube videos, I am not clear on the exact format of the training data used for this model. Is it just the frames cropped using the given bounding box coordinates and frame numbers at 25 fps? Could you provide the scripts you used for cropping and saving the frames? Thanks.

oawiles commented 5 years ago

We used the frames pre-extracted by the dataset authors. For Vox1 we used the frames extracted at 1 fps from http://www.robots.ox.ac.uk/~vgg/research/CMBiometrics/. For Vox2 we used the bounding box coordinates for the cropped frames. The detections are not the same across the two datasets, so we had to recrop both with a standard crop to make them match. Unfortunately, because we relied on pre-extracted frames for both datasets, we don't have any scripts for cropping or saving.

For Vox2 (for example) we recropped using: Compose([Scale((256,256)), Pad((20,80,20,30)), CenterCrop(precrop), Scale((256,256))]).
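For readers on a recent torchvision, here is a minimal sketch of the recrop quoted above, assuming `Scale` is spelled `Resize` (its current name) and leaving `precrop` as a parameter, since its value is not given in this thread. The helper `crop_box_in_resized_frame` is a hypothetical addition for illustration, not part of the FAb-Net code. It works out which region of the 256x256 frame the centre crop keeps, assuming torchvision's `(left, top, right, bottom)` padding order and its rounded centre-crop offset.

```python
def build_recrop(precrop):
    """Modern-torchvision spelling of the recrop quoted above.

    `precrop` is not stated in this thread, so it stays a parameter.
    """
    # Imported here so the geometry helper below runs without torchvision.
    from torchvision import transforms

    return transforms.Compose([
        transforms.Resize((256, 256)),     # `Scale` in the original snippet
        transforms.Pad((20, 80, 20, 30)),  # left, top, right, bottom
        transforms.CenterCrop(precrop),
        transforms.Resize((256, 256)),
    ])


def crop_box_in_resized_frame(precrop, size=256, pad=(20, 80, 20, 30)):
    """Hypothetical helper: return (left, top, right, bottom) of the centre
    crop in the coordinates of the resized frame *before* padding.

    Values below 0 or above `size` mean the crop reaches into the padding.
    """
    pad_left, pad_top, pad_right, pad_bottom = pad
    padded_w = size + pad_left + pad_right
    padded_h = size + pad_top + pad_bottom
    # Centre-crop offset in the padded image, shifted back by the padding.
    left = int(round((padded_w - precrop) / 2.0)) - pad_left
    top = int(round((padded_h - precrop) / 2.0)) - pad_top
    return (left, top, left + precrop, top + precrop)


if __name__ == "__main__":
    # 200 is an arbitrary illustrative value, not the one used by the authors.
    print(crop_box_in_resized_frame(200))
```

Because the padding is symmetric horizontally (20 and 20) but not vertically (80 on top, 30 on the bottom), the centre crop stays horizontally centred and sits 25 pixels higher than a plain centre crop of the same size would.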

huaerli commented 5 years ago

Thanks for your reply! I still have a few questions about the pre-extracted frames for VoxCeleb2. Is there anywhere I can find them online? Thank you so much!

oawiles commented 5 years ago

Hi. What do you mean? As in how we cropped Vox1? (We always used the same crop.)

huaerli commented 5 years ago

I mean, where can I find the pre-extracted frames for VoxCeleb2? Is this dataset available online?

oawiles commented 5 years ago

I don't think you can find the pre-extracted frames. I think you have to download the dataset yourself and then do the cropping, unfortunately.

mrgloom commented 5 years ago

What do the VoxCeleb2 header fields mean?

    Offset    :     -2
    FV Conf   :     16.303  (1)
    ASD Conf  :     6.201