Open mustass opened 1 year ago
About sample rate sensitivity:
- Do we still want to do this for both datasets?
Nope, just for English for now.
- With which models do we want to test this: only the pretrained, or some fine-tuned models too?
We don't fine-tune on the different SRs for now. Just the pretrained.
Can we find another English speech dataset (not LJ)? @sandorfoldi @panosapos
Experiments
We have two tracks:
1. Language sensitivity [DA-ENG]
Setup: We begin with a pretrained model from the DiffWave reproduction repo. This is a conditional model, so generation is conditional.
Experiment: Using the pretrained model we generate: (1) English speech from `LJ dataset`; (2) Danish speech from `nst_data`. This is our baseline. We evaluate the SNR and STOI metrics.
Then we want to see how fine-tuning influences performance, so we run generation for each dataset after each epoch of fine-tuning, i.e. generating `2 x n_epochs` times. We evaluate the SNR and STOI metrics for each of those runs as well.
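The per-epoch evaluation can be sketched as follows. This is a minimal sketch, not the repo's actual evaluation code: `snr_db` is a hypothetical helper computing SNR directly in NumPy, and in practice STOI would come from a library such as `pystoi` (its `stoi(clean, degraded, fs)` call), which is omitted here to keep the snippet self-contained.

```python
import numpy as np

def snr_db(reference: np.ndarray, generated: np.ndarray) -> float:
    """Signal-to-noise ratio in dB, treating the difference between the
    reference waveform and the generated waveform as noise.
    Both arrays must be the same length and time-aligned."""
    noise = reference - generated
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(noise ** 2))

# Toy usage: a reference waveform plus 1% additive noise should score
# in the vicinity of 40 dB.
rng = np.random.default_rng(0)
ref = rng.standard_normal(16000)
gen = ref + 0.01 * rng.standard_normal(16000)
print(f"{snr_db(ref, gen):.1f} dB")
```

In the real experiment this function would be applied to each `2 x n_epochs` batch of generated audio, logging one SNR (and STOI) value per dataset per fine-tuning epoch.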
2. Sample Rate sensitivity
Setup: Same as in [1].
Experiment: Using the pretrained model we generate English speech from `LJ dataset`. This is our baseline. We evaluate the SNR and STOI metrics.
Then we want to see how the sampling rate influences performance, so we run preprocessing again for each of the sampling rates `[16, 20, 25, 44.1]` (kHz). We then generate English speech from `LJ dataset` at each sampling rate and evaluate the SNR and STOI metrics for each of those runs as well.
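The resampling step of that preprocessing could look like the sketch below. This is an assumption about the pipeline, not the repo's code: it uses SciPy's polyphase resampler (`scipy.signal.resample_poly`), and maps the kHz rates from the issue to Hz.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

# Target rates from the issue, converted from kHz to Hz.
TARGET_RATES = [16_000, 20_000, 25_000, 44_100]

def resample(wav: np.ndarray, orig_sr: int, target_sr: int) -> np.ndarray:
    """Resample a 1-D waveform from orig_sr to target_sr using
    polyphase filtering (up/down factors reduced by their gcd)."""
    g = gcd(orig_sr, target_sr)
    return resample_poly(wav, target_sr // g, orig_sr // g)

# One second of (silent) audio at LJSpeech's native 22050 Hz becomes
# one second at each target rate.
one_second = np.zeros(22050, dtype=np.float32)
for sr in TARGET_RATES:
    print(sr, len(resample(one_second, 22050, sr)))
```

Each resampled copy of the dataset would then be written under its own sampling-rate-specific root folder, as described in the tasks below.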
Tasks and responsibilities:
- Ensure the splits `[0,1]` are exactly the same as in the original annotated.csv. The way we can do that is to maintain the same splits from ONE annotation.csv; the only thing that should change is the root folder, i.e. `data_path/sr_22.5/normal_structure`, then `data_path/sr_20/normal_structure`, etc. @sandorfoldi main
- Furthermore, a notebook with plots that can be rerun. @panosapos