mustass commented 1 year ago

Experiments

We have two tracks:

1. Language sensitivity [DA-ENG]

Setup: We start from a pretrained model from the DiffWave reproduction repo. The model is conditional, so generation is conditional as well.

Experiment: With the pretrained model, we generate: (1) English speech from the LJ dataset; (2) Danish speech from nst_data. This is our baseline.

We evaluate the SNR and STOI metrics.

Then we want to see how fine-tuning influences the performance.

Thus, we run generation for each dataset after each epoch of fine-tuning, i.e. we generate 2 x n_epochs times.

We evaluate the SNR and STOI metrics for each of those as well.
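A minimal sketch of the run grid for this track. It assumes that epoch 0 stands for the pretrained baseline and that the dataset labels are just placeholders; `language_runs` is a hypothetical helper, not something from the repo:

```python
from itertools import product


def language_runs(n_epochs, datasets=("LJ", "nst_data")):
    """List every (dataset, epoch) generation run for the language track.

    Epoch 0 is assumed to be the pretrained baseline; epochs 1..n_epochs
    are the fine-tuned checkpoints, giving 2 x n_epochs fine-tuned runs.
    """
    return [
        {"dataset": dataset, "epoch": epoch}
        for epoch, dataset in product(range(n_epochs + 1), datasets)
    ]


runs = language_runs(6)
fine_tuned = [run for run in runs if run["epoch"] > 0]
print(len(runs), len(fine_tuned))
```

With 6 epochs this gives the 2 baseline runs plus 2 x 6 fine-tuned runs, each of which would then be scored with SNR and STOI.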

2. Sample Rate sensitivity

Setup: Same as in [1].

Experiment: With the pretrained model, we generate English speech from the LJ dataset. This is our baseline. We evaluate the SNR and STOI metrics.

Then we want to see how the sampling rate influences the performance.

So we run preprocessing again for each of the sampling rates (in kHz):

[16, 20, 25, 44.1]

Then we generate English speech from the LJ dataset at each of the sampling rates.

We evaluate the SNR and STOI metrics for each of those as well.
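For reference, a dependency-free sketch of one common SNR definition: the power of the reference signal over the power of the difference between reference and generated signal, in dB. The thread does not say which SNR variant the evaluation uses, so this definition and the name `snr_db` are assumptions. STOI needs a dedicated implementation (for example the pystoi package) and is not shown here.

```python
import math


def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB between two equal-length waveforms.

    The 'noise' is taken to be reference - estimate, which assumes the
    two waveforms are already time-aligned and at the same sample rate.
    """
    if len(reference) != len(estimate):
        raise ValueError("waveforms must have the same length")
    signal_power = sum(r * r for r in reference)
    noise_power = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if noise_power == 0:
        return math.inf  # identical signals
    if signal_power == 0:
        return -math.inf  # silent reference, non-silent estimate
    return 10.0 * math.log10(signal_power / noise_power)


reference = [1.0, -1.0, 1.0, -1.0]
estimate = [0.9, -0.9, 0.9, -0.9]
print(round(snr_db(reference, estimate), 2))  # about 20 dB
```

Note that the reference and the generated audio have to share a sample rate before they are compared, so for the sample rate track one of the two would need to be resampled first.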

Tasks and responsibilities:

[ ] Run preprocessing for each sample rate. Make sure the splits [0,1] are exactly the same as in the original annotated.csv. We can do that by keeping the splits from ONE annotation.csv, so that the only thing that changes is the root folder, i.e. data_path/sr_22.5/normal_structure, then data_path/sr_20/normal_structure, etc. @sandorfoldi
[ ] Fine-tune on the Danish dataset for 6 epochs in total. @mustass
[ ] Train the Danish model from scratch and see how it compares to the fine-tuned and pretrained models, for both Danish and English. Basically Experiment 1 with this model. @mustass
[ ] Evaluation should work and be merged into main. We also need a notebook with plots that can be rerun. @panosapos
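
For the preprocessing task, a rough sketch of how one annotation file could be reused across sample rates so that only the root folder changes. The column names `path` and `split`, the helper `reroot_annotations`, and the example file names are all hypothetical; the real annotation file may look different.

```python
import csv
import io
from pathlib import PurePosixPath


def reroot_annotations(annotation_csv_text, data_path, sr_label):
    """Point every annotated file at data_path/sr_<label>/..., keeping its split."""
    root = PurePosixPath(data_path) / f"sr_{sr_label}"
    reader = csv.DictReader(io.StringIO(annotation_csv_text))
    return [
        {"path": str(root / row["path"]), "split": row["split"]}
        for row in reader
    ]


# Toy annotation content; in practice this would be read from the one CSV file.
annotations = "path,split\nwavs/a.wav,0\nwavs/b.wav,1\n"

per_rate = {
    sr: reroot_annotations(annotations, "data_path", sr)
    for sr in ("16", "20", "25", "44.1")
}
print(per_rate["20"][0]["path"])
```

Because every sample rate is derived from the same rows, the split column is identical across all of them by construction.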

sandorfoldi commented 1 year ago

About sample rate sensitivity:

Do we still want to do this for both datasets?
With which models do we want to test this: only the pretrained one, or some fine-tuned models too?

mustass commented 1 year ago

> About sample rate sensitivity:
>
> Do we still want to do this for both datasets?

Nope, just for English for now.

> With which models do we want to test this: only the pretrained one, or some fine-tuned models too?

We don't fine-tune on the different sample rates for now. Just the pretrained model.

mustass commented 1 year ago

Can we find another English speech dataset, other than LJ? @sandorfoldi @panosapos

mustass / diffusion_models_for_speech

Experiments & Tasks #35
