Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
This pull request updates the speaker similarity evaluation in the Amphion project. The existing RawNet3-based calculation produced counterintuitive results, so this PR adds a Resemblyzer-based calculation alongside it. Additional updates include bug fixes and enhancements for GPU support.
Objective
These changes aim to improve the accuracy of speaker similarity evaluation by adding Resemblyzer as a second reference alongside the current RawNet3 model.
The current speaker_similarity.py compares the average characteristics of all files in one directory against the average characteristics of all files in the other directory.
The new resemblyzer_similarity.py performs detailed comparisons between individual files across the two directories using Resemblyzer before calculating the average, yielding more accurate results.
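The difference between the two aggregation strategies can be illustrated with a minimal sketch. This is not the code from either script: the function names are hypothetical, the embeddings are toy 2-D vectors rather than real speaker embeddings, and it assumes "individual files across the two directories" means every reference file is compared with every generated file (pairing files by name would be an equally plausible reading).

```python
import math
from itertools import product


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def directory_level_similarity(ref_embs, gen_embs):
    """Average each directory first, then compare the two averages."""
    return cosine(mean_vector(ref_embs), mean_vector(gen_embs))


def pairwise_similarity(ref_embs, gen_embs):
    """Compare every reference file with every generated file, then average."""
    scores = [cosine(r, g) for r, g in product(ref_embs, gen_embs)]
    return sum(scores) / len(scores), scores


# Toy embeddings (illustrative only): two very different "voices" per directory.
ref = [[1.0, 0.0], [0.0, 1.0]]
gen = [[1.0, 0.0], [0.0, 1.0]]

print(directory_level_similarity(ref, gen))  # close to 1.0: averaging hides per-file differences
overall, per_pair = pairwise_similarity(ref, gen)
print(overall)  # 0.5: half of the file pairs do not match at all
```

In this toy case the directory-level score is near its maximum even though half of the file pairs are orthogonal, which is the kind of effect a per-file comparison is meant to expose.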
Testing
The Resemblyzer similarity calculations have been tested and validated for satisfactory results. For a comparison between the RawNet3 and Resemblyzer models, see Similarity Evaluation - Resemblyzer - RawNet3.
Changes
Amphion/bins/calc_metrics.py:
Fixed missing "fs" argument in line 160.
Added functionality to select between RawNet3 and Resemblyzer models for speaker similarity calculations.
Amphion/egs/metrics/run.sh:
Added support for automatic GPU allocation for calculating metrics. The script now detects a free GPU and allocates it for model processing.
Amphion/evaluation/metrics/similarity/resemblyzer_similarity.py:
New script added for computing speaker similarity using the Resemblyzer model.
Amphion/env.sh:
Included Resemblyzer as a new environment dependency.
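The automatic GPU allocation listed above for run.sh is implemented in shell and is not shown here. As a rough sketch of one way to pick a free GPU, the Python below parses `nvidia-smi` memory usage and returns the least-loaded device. The function names and the "least memory in use" criterion are assumptions for illustration, not the logic used in the PR.

```python
import subprocess


def parse_gpu_memory(nvidia_smi_csv):
    """Parse 'index, memory.used' CSV rows into (index, used_mib) tuples."""
    gpus = []
    for line in nvidia_smi_csv.strip().splitlines():
        index, used = (field.strip() for field in line.split(","))
        gpus.append((int(index), int(used)))
    return gpus


def pick_free_gpu(nvidia_smi_csv):
    """Return the index of the GPU with the least memory in use, or None."""
    gpus = parse_gpu_memory(nvidia_smi_csv)
    if not gpus:
        return None
    return min(gpus, key=lambda gpu: gpu[1])[0]


def query_nvidia_smi():
    """Ask nvidia-smi for per-GPU memory usage (requires an NVIDIA driver)."""
    return subprocess.run(
        ["nvidia-smi", "--query-gpu=index,memory.used",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
```

On a machine with NVIDIA GPUs, the result of `pick_free_gpu(query_nvidia_smi())` could be exported as `CUDA_VISIBLE_DEVICES` before the metrics are computed.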
Usage
When calculating speaker similarity with Amphion/egs/metrics/run.sh, the user will be prompted to select a model (RawNet3/Resemblyzer). If Resemblyzer is selected, an overall similarity result will be printed in the terminal and per-utterance similarity results will be saved in a .csv file under the dump_dir.
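The output behaviour described above (an overall score in the terminal plus a per-utterance .csv under dump_dir) might look roughly like the following sketch. The function name, the file name `similarity.csv`, the column names, and the example scores are all hypothetical; the PR does not specify them here.

```python
import csv
import os
import tempfile


def save_similarity_results(dump_dir, per_utterance, filename="similarity.csv"):
    """Write per-utterance scores to a CSV file and return (path, overall mean)."""
    os.makedirs(dump_dir, exist_ok=True)
    path = os.path.join(dump_dir, filename)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["reference", "generated", "similarity"])
        for ref_name, gen_name, score in per_utterance:
            writer.writerow([ref_name, gen_name, f"{score:.4f}"])
    overall = sum(score for _, _, score in per_utterance) / len(per_utterance)
    return path, overall


# Made-up scores, for illustration only.
results = [
    ("ref_001.wav", "gen_001.wav", 0.82),
    ("ref_002.wav", "gen_002.wav", 0.74),
]
dump_dir = tempfile.mkdtemp()  # stand-in for the real dump_dir
csv_path, overall = save_similarity_results(dump_dir, results)
print(f"Overall similarity: {overall:.4f}")
print(f"Per-utterance results saved to {csv_path}")
```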
Request
Requesting a review for the proposed changes and subsequent merge into the main branch.