jlian2 opened this issue 4 years ago (status: Open)
A naive question. To my knowledge, the number of neurons in the last layer should equal the number of identities (e.g. 1000). But there are only 2 (bonafide and spoof), regardless of the speaker? What is the reason for, or advantage of, doing this? Thanks!
Hi, this work performs binary classification: it decides whether a speech file is computer-generated or human-spoken. There are only two classes, so the last layer has two outputs rather than one per speaker.
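To illustrate why two output neurons are enough, here is a minimal sketch in plain Python. It is not this repository's code: `make_output_layer`, `classify`, the embedding size of 8, and the random weights are all hypothetical, chosen only to show that the output width follows the number of classes and not the number of speakers.

```python
import math
import random

# The two labels named in this thread; speaker identity is not a class here.
CLASSES = ("bonafide", "spoof")


def make_output_layer(embedding_dim, num_classes=len(CLASSES), seed=0):
    """Hypothetical final linear layer: one weight row per class."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 0.1) for _ in range(embedding_dim)]
               for _ in range(num_classes)]
    biases = [0.0] * num_classes
    return weights, biases


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(embedding, layer):
    """Map one utterance embedding to a label and class probabilities."""
    weights, biases = layer
    logits = [sum(w * x for w, x in zip(row, embedding)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs


# Usage: any speaker's utterance goes through the same two-way output.
layer = make_output_layer(embedding_dim=8)
label, probs = classify([0.5] * 8, layer)
print(label, probs)
```

A speaker-identification model would instead set `num_classes` to the number of enrolled speakers (e.g. 1000); the rest of the sketch would stay the same.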