hechmik / voxceleb_enrichment_age_gender

Code and data repository for paper "VoxCeleb enrichment for Age and Gender recognition" submitted at ASRU 2021
MIT License

input dimension #2

Open zhangshaohu opened 2 years ago

zhangshaohu commented 2 years ago

Hello!

ASVTorch generates 24 MFCCs, so the MFCC matrix has shape (n, 24). Your input is (200, 30). Where does the 30 come from? Could you please provide some test samples?

hechmik commented 2 years ago

Hi! The 30 comes from the number of Mel bins and cepstral coefficients specified in the mfcc.conf file used by Kaldi: https://gitlab.com/ville.vestman/asvtorch/-/blob/master/asvtorch/recipes/voxceleb/xvector/configs/mfcc.conf
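As a rough illustration of how the feature dimension follows from the config, here is a minimal sketch that parses Kaldi-style `--key=value` options and reads the number of cepstral coefficients. The config text in `EXAMPLE_CONF` is an assumption made for illustration (it is not copied from the linked file), and `parse_kaldi_conf` and `mfcc_dim` are hypothetical helper names.

```python
def parse_kaldi_conf(text):
    """Parse Kaldi-style '--key=value' lines into a dict, ignoring comments."""
    options = {}
    for raw_line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line.startswith("--") or "=" not in line:
            continue
        key, value = line[2:].split("=", 1)
        options[key.strip()] = value.strip()
    return options


# Illustrative config text (assumed values, not the actual linked file).
EXAMPLE_CONF = """
--sample-frequency=16000
--num-mel-bins=30
--num-ceps=30   # cepstral coefficients kept per frame
"""


def mfcc_dim(conf_text):
    """Return the per-frame MFCC dimension implied by --num-ceps."""
    return int(parse_kaldi_conf(conf_text)["num-ceps"])


feature_dim = mfcc_dim(EXAMPLE_CONF)
# 200 frames per input, as mentioned in the question above.
expected_input_shape = (200, feature_dim)
print(expected_input_shape)
```

With a config that sets `--num-ceps=24` instead, the same helper would report 24, which matches the (n, 24) shape described in the question.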

Regarding the test samples, unfortunately the answer is no. The original dataset comes from YouTube videos, so various copyright issues may arise (also, the original VoxCeleb team should be, imo, the one to provide the raw tracks and devise appropriate sharing rules in their licence). We have, however, provided the list of recordings we used for training and testing, so it should be possible to replicate the dataset by following the steps described in the paper and in the various notebooks.

zhangshaohu commented 2 years ago

Thank you for your immediate response. I ran into some errors using ASVTorch, so I used Kaldi instead. The original Kaldi config has num-ceps=24: https://github.com/kaldi-asr/kaldi/blob/master/egs/voxceleb/v1/conf/mfcc.conf I will update the value of num-ceps. Yes, the VoxCeleb data can be requested. Still, I think it would be okay to share a few precomputed features for testing. That way, someone who wants to replicate your code could do so with just a simple test example.
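For anyone making the same change, the update to num-ceps could be sketched roughly as follows. This is only an illustration: `set_num_ceps` is a hypothetical helper, the `original` config text is an assumed example, and the guard reflects my understanding that Kaldi does not accept more cepstral coefficients than Mel bins.

```python
import re


def set_num_ceps(conf_text, num_ceps):
    """Return conf_text with --num-ceps set to num_ceps (appended if absent)."""
    # Guard: num-ceps should not exceed num-mel-bins when the latter is given.
    mel = re.search(r"^--num-mel-bins=(\d+)", conf_text, flags=re.MULTILINE)
    if mel and num_ceps > int(mel.group(1)):
        raise ValueError("num-ceps cannot exceed num-mel-bins")

    new_line = f"--num-ceps={num_ceps}"
    pattern = re.compile(r"^--num-ceps=\d+", flags=re.MULTILINE)
    if pattern.search(conf_text):
        return pattern.sub(new_line, conf_text)

    # Option not present: append it on its own line.
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + new_line + "\n"


# Assumed example config: 30 Mel bins, 24 cepstral coefficients.
original = "--num-mel-bins=30\n--num-ceps=24\n"
updated = set_num_ceps(original, 30)
print(updated)
```

After regenerating the features with the updated config, each frame should have 30 coefficients, matching the (200, 30) input mentioned above.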