nttcslab / byol-a

BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation
https://arxiv.org/abs/2103.06695
Other

How to interpret the performance #2

Closed: ranchlai closed this issue 3 years ago

ranchlai commented 3 years ago

Hi, this is great work, but how should I interpret the performance metric? For example, VoxCeleb1 is usually used for speaker verification; shouldn't we measure EER?

daisukelab commented 3 years ago

Hi @ranchlai,

Thank you for your question. The short answer is that previous studies report accuracy, so we followed them.

After searching and reading some speaker recognition papers, I understand that EER is the metric usually reported in that field. I'd like to find some time to calculate those numbers.
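For reference, EER (equal error rate) is the error rate at the decision threshold where the false acceptance rate (FAR) equals the false rejection rate (FRR). Below is a minimal pure-Python sketch, assuming each verification trial has one similarity score (higher means more likely the same speaker) and a binary label. The function name `equal_error_rate` is hypothetical; this is not code from the BYOL-A repository.

```python
from typing import Sequence


def equal_error_rate(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Approximate EER for speaker verification trials.

    scores: one similarity score per trial (higher = more likely same speaker).
    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    """
    n_target = sum(1 for y in labels if y == 1)
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need both target and non-target trials")

    best_gap, eer = float("inf"), 1.0
    # Candidate thresholds: every observed score, plus one above all scores.
    for t in sorted(set(scores)) + [float("inf")]:
        # A trial is accepted when its score is >= the threshold t.
        far = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t) / n_nontarget
        frr = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t) / n_target
        gap = abs(far - frr)
        # Keep the threshold where FAR and FRR are closest to each other.
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


print(equal_error_rate([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))  # perfectly separable trials
```

This version only checks thresholds at observed scores; a finer estimate would interpolate FAR and FRR between neighboring thresholds.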

Please let me keep this issue open for now. I'll come back soon, hopefully with the results.

daisukelab commented 3 years ago

VoxCeleb1 has two tasks. One is identification, which we tested in the paper, and the other is verification. You may have been thinking of the verification task, but our results are for identification. All our reported results are top-1 accuracy.
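In the identification setting, each test clip is classified into one of a fixed set of speakers, and top-1 accuracy is the fraction of clips whose highest-scoring class is the correct speaker. A minimal sketch under that assumption; the function name `top1_accuracy` and the list-of-class-scores input format are hypothetical, not the BYOL-A evaluation code.

```python
from typing import Sequence


def top1_accuracy(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of clips whose highest-scoring class equals the true speaker ID.

    logits: one row of class scores per clip.
    labels: the true speaker index for each clip.
    """
    if len(logits) != len(labels) or not labels:
        raise ValueError("logits and labels must be non-empty and the same length")
    correct = 0
    for row, y in zip(logits, labels):
        # Argmax over speaker classes: index of the largest score in this row.
        pred = max(range(len(row)), key=row.__getitem__)
        correct += int(pred == y)
    return correct / len(labels)


scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.2, 0.6], [0.4, 0.5, 0.1]]
print(top1_accuracy(scores, [1, 0, 1, 1]))  # 3 of 4 predictions are correct
```

Unlike EER, this metric needs no decision threshold, because identification is a closed-set choice among known speakers.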

I hope this answers your question. Testing on the speaker verification task could be future work.

Anyway, thank you for your question; I've learned more about these tasks.