v-iashin / VoxCeleb

An attempt to replicate the results of [1706.08612] VoxCeleb: a large-scale speaker identification dataset

about the iden_split.txt #1

Closed by hktxt 5 years ago

hktxt commented 5 years ago

Hi~ your code is really helpful. Could you please tell me more about iden_split.txt? Is it a text file that contains file paths, one path per row?

v-iashin commented 5 years ago

Hey!

I am really glad that you found my code to be helpful.

Regarding your question, I think you are correct: one path per row.

iden_split.txt is the file that VGG provided with the dataset: on the VoxCeleb1 page, look for "Dataset split for Identification". You may also take a look at the preprocessing.ipynb notebook, which processes the raw downloaded files.

It appears to me that each row of this file consists of a phase (train: 1 and 2; test: 3) and an audio path (in the format id/track/segment). To verify that the first column is a phase, you may count the number of rows per phase and compare the counts with the values given in Table 5 of the paper. It is also a reasonable assumption because, as the paper puts it,

"identification is treated as a simple classification task, the output of the last layer is fed into a 1,251-way softmax in order to produce a distribution over the 1,251 different speakers."
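
The row-counting check described above could be sketched roughly as follows. This is only an illustration, not the code from the repository: it assumes each row is a phase number and a path separated by whitespace, the function names are made up for this example, and the sample rows are hypothetical placeholders rather than real entries from iden_split.txt.

```python
from collections import Counter


def parse_split_line(line):
    """Split one row into (phase, path).

    Assumes the row looks like 'PHASE PATH', separated by whitespace.
    """
    phase, path = line.split(maxsplit=1)
    return int(phase), path.strip()


def count_phases(lines):
    """Count how many rows belong to each phase, skipping blank rows."""
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue
        phase, _ = parse_split_line(line)
        counts[phase] += 1
    return counts


# Hypothetical sample rows in the assumed 'phase id/track/segment' layout.
sample_rows = [
    "1 speaker_a/track_1/00001.wav",
    "1 speaker_a/track_1/00002.wav",
    "2 speaker_a/track_2/00001.wav",
    "3 speaker_a/track_3/00001.wav",
]

counts = count_phases(sample_rows)
# Under the reading above, phases 1 and 2 together form the training rows
# and phase 3 forms the test rows.
train_rows = counts[1] + counts[2]
test_rows = counts[3]
print("train rows:", train_rows, "test rows:", test_rows)
```

To run the check on the real file, pass an open file handle instead of the sample list, for example `count_phases(open("iden_split.txt"))`, and compare the resulting totals with Table 5.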

hktxt commented 5 years ago

@vdyashin I just realized that it was provided by VGG. Thanks anyway~ I'll ask for more help if I get stuck~~~hahaha