open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
https://openhlt.github.io/amphion/
MIT License

Add Resemblyzer for Speaker Similarity Evaluation & Bug fixes #75

Closed: Merakist closed this pull request 8 months ago.

Merakist commented 8 months ago

Description

This pull request updates the speaker similarity evaluation process in the Amphion project. The previously used RawNet3 model produced counterintuitive results, so this PR adds Resemblyzer as an alternative model for computing speaker similarity. It also includes bug fixes and improved GPU support.

Objective

Testing

Changes

Usage

When speaker similarity is calculated with Amphion/egs/metrics/run.sh, the user is prompted to choose a model (RawNet3 or Resemblyzer). If Resemblyzer is selected, the overall similarity score is printed to the terminal and the per-utterance similarity scores are saved to a .csv file under dump_dir.
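To make the Usage description concrete, here is a minimal sketch of how per-utterance scores and an overall score could be produced once speaker embeddings are available. It is not the code from this PR. It assumes each utterance ID maps to a (reference, generated) pair of embedding vectors, for example vectors returned by Resemblyzer's `VoiceEncoder.embed_utterance`. The function names `cosine_similarity` and `score_speaker_similarity`, the CSV file name, and the choice of the mean as the "overall" score are all hypothetical.

```python
import csv
import math
import os
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two speaker embeddings (hypothetical helper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embedding has zero norm")
    return dot / (norm_a * norm_b)


def score_speaker_similarity(
    pairs: Dict[str, Tuple[Sequence[float], Sequence[float]]],
    dump_dir: str,
    csv_name: str = "speaker_similarity.csv",  # hypothetical file name
) -> Tuple[float, List[Tuple[str, float]]]:
    """Score each (reference, generated) pair, write a per-utterance CSV
    under dump_dir, and return (mean similarity, per-utterance rows).

    Hypothetical sketch: averaging is an assumed definition of "overall".
    """
    if not pairs:
        raise ValueError("no utterance pairs to score")
    # One row per utterance, sorted by ID for a stable CSV order.
    rows = [
        (utt_id, cosine_similarity(ref, gen))
        for utt_id, (ref, gen) in sorted(pairs.items())
    ]
    os.makedirs(dump_dir, exist_ok=True)
    with open(os.path.join(dump_dir, csv_name), "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["utterance", "similarity"])
        writer.writerows(rows)
    overall = sum(score for _, score in rows) / len(rows)
    return overall, rows
```

For example, with toy pairs where one generated embedding points in the same direction as its reference and another is orthogonal to it, the per-utterance scores are 1.0 and 0.0 and the overall score is 0.5. If the embeddings are already L2-normalized, the cosine similarity reduces to a plain dot product. The explicit normalization here keeps the helper correct for arbitrary vectors.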

Request

Requesting a review of the proposed changes and, if approved, a merge into the main branch.