huaerli closed this issue 5 years ago
We used the frames pre-extracted by the authors. For Vox1 we used the frames extracted at 1 fps from http://www.robots.ox.ac.uk/~vgg/research/CMBiometrics/, and for Vox2 we used the bounding-box coordinates for the cropped frames. The detections are not the same, so we had to recrop both datasets with a standard crop to make them match. Unfortunately, because we started from the pre-extracted frames for both datasets, we don't have any scripts for cropping/saving.
For Vox2 (for example) we recropped using `Compose([Scale((256, 256)), Pad((20, 80, 20, 30)), CenterCrop(precrop), Scale((256, 256))])`.
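For reference, a runnable version of that transform might look like the sketch below, assuming a recent torchvision where the deprecated `Scale` has been renamed `Resize`. `precrop` is a crop size from the authors' code whose value isn't stated in this thread, so it is left as a parameter here.

```python
# Minimal sketch of the recrop pipeline above (not the authors' exact code).
# Assumes PIL images and a torchvision version where Scale is now Resize.
from torchvision.transforms import Compose, Resize, Pad, CenterCrop

def make_recrop(precrop):
    # precrop: int or (height, width) center-crop size; value not given in this thread
    return Compose([
        Resize((256, 256)),       # bring every frame to a common size
        Pad((20, 80, 20, 30)),    # pad (left, top, right, bottom) in pixels
        CenterCrop(precrop),      # standard crop so the two datasets' detections match
        Resize((256, 256)),       # back to the network's input resolution
    ])
```

Applying `make_recrop(precrop)` to each PIL frame before feeding it to the network would reproduce the pipeline described above, up to the unknown `precrop` value.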
Thanks for your reply! I still have a question about the pre-extracted frames for VoxCeleb2: is there anywhere I could find them online? Thank you so much!
Hi. What do you mean? As in how we cropped Vox1? (We always used the same crop.)
I mean: where can I find the pre-extracted frames for VoxCeleb2? Is that dataset available online?
I don't think the pre-extracted frames are publicly available. I think you have to download the videos yourself and then do the cropping, unfortunately.
What do the VoxCeleb2 header fields mean?
Offset : -2
FV Conf : 16.303 (1)
ASD Conf : 6.201
I am having some issues trying to train my own model. The README suggests downloading the dataset in a certain format; however, since the VoxCeleb dataset only provides URLs to YouTube videos, I am not quite clear what exact format the training data for this model takes. Is it just the frames cropped at 25 fps using the given bounding-box coordinates and frame numbers? Would it be possible for you to provide the scripts you used for cropping and saving the frames? Thanks.
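For concreteness, a cropping step along those lines might look like the sketch below. This is not the authors' script (they note above that none exists); it assumes OpenCV, 25 fps video, and bounding boxes given in pixels as `(x, y, w, h)` keyed by frame index, which may differ from the actual VoxCeleb2 metadata format.

```python
# Hypothetical frame-cropping sketch, not the authors' pipeline.
# Assumes boxes = {frame_index: (x, y, w, h)} in pixel coordinates.
import cv2

def crop_frames(video_path, boxes, out_pattern="frame_%06d.jpg"):
    cap = cv2.VideoCapture(video_path)  # VoxCeleb2 videos are 25 fps
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx in boxes:
            x, y, w, h = boxes[idx]
            cv2.imwrite(out_pattern % idx, frame[y:y + h, x:x + w])
        idx += 1
    cap.release()
```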