mustass / diffusion_models_for_speech

Deep Learning course project repository.
https://kurser.dtu.dk/course/02456

Experiments & Tasks #35

Open mustass opened 1 year ago

mustass commented 1 year ago

Experiments

We have two tracks:

1. Language sensitivity [DA-ENG]

Setup: We start from a pretrained model from the DiffWave reproduction repo. This is a conditional model, so the generation is conditional.

Experiment: Using the pretrained model, we generate (1) English speech from the LJ dataset and (2) Danish speech from nst_data. This is our baseline.

We evaluate the SNR and STOI metrics.

Then we want to see how fine-tuning influences the performance.

Thus, we run generation on each dataset after each epoch of fine-tuning, i.e. we generate 2 x n_epochs times.
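To make the 2 x n_epochs count concrete, here is a minimal sketch that enumerates the generation passes. The dataset names come from the plan above; `N_EPOCHS` and `generation_runs` are hypothetical, since the number of fine-tuning epochs is not fixed in this thread.

```python
from itertools import product

DATASETS = ["LJ", "nst_data"]  # English and Danish sets named in the plan
N_EPOCHS = 5  # hypothetical value; the thread does not fix the epoch count


def generation_runs(datasets, n_epochs):
    """Return one (dataset, epoch) pair per generation pass."""
    # Epochs are numbered from 1; the pretrained baseline is handled separately.
    return list(product(datasets, range(1, n_epochs + 1)))


runs = generation_runs(DATASETS, N_EPOCHS)
for dataset, epoch in runs:
    print(f"generate on {dataset} with the checkpoint from epoch {epoch}")
```

The baseline run with the untouched pretrained model comes on top of these passes.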

We evaluate the SNR and STOI metrics for each of those as well.
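For reference, a minimal sketch of the SNR computation, assuming the usual definition (reference power over the power of the difference to the generated signal, in dB). The name `snr_db` and the truncation to the shorter signal are assumptions, not something taken from the repo. It uses only the standard library so it runs anywhere; on real audio it would normally be vectorized.

```python
import math


def snr_db(reference, generated):
    """Signal-to-noise ratio in dB between a reference and a generated waveform."""
    # Assumption: if the lengths differ, compare only the overlapping part.
    n = min(len(reference), len(generated))
    if n == 0:
        raise ValueError("both signals must contain at least one sample")
    signal_power = sum(r * r for r in reference[:n])
    noise_power = sum((r - g) ** 2 for r, g in zip(reference[:n], generated[:n]))
    if noise_power == 0:
        return math.inf  # identical signals
    if signal_power == 0:
        return -math.inf  # silent reference, non-silent output
    return 10.0 * math.log10(signal_power / noise_power)


reference = [1.0, 0.0, -1.0, 0.0]
generated = [0.9, 0.0, -0.9, 0.0]
print(round(snr_db(reference, generated), 2))
```

STOI is more involved; an existing implementation such as the `pystoi` package would be the usual route, though this thread does not say which one we use.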

2. Sample Rate sensitivity

Setup: Same as in [1].

Experiment: Using the pretrained model, we generate English speech from the LJ dataset. This is our baseline. We evaluate the SNR and STOI metrics.

Then we want to see how sampling rate influences the performance.

So, we run preprocessing again for each of the sampling rates:

[16, 20, 25, 44.1] kHz

Then we generate English speech from the LJ dataset for each of the sampling rates.

We evaluate the SNR and STOI metrics for each of those as well.
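As a rough sketch of what the repeated preprocessing involves, the snippet below works out the integer up/down factors needed to resample to each target rate. It assumes the source audio is at LJ Speech's native 22.05 kHz; `SOURCE_RATE_HZ` and `resample_factors` are illustrative names, not part of the repo.

```python
from fractions import Fraction

SOURCE_RATE_HZ = 22050  # assumption: LJ Speech's native sampling rate
TARGET_RATES_KHZ = [16, 20, 25, 44.1]  # rates listed in the plan


def resample_factors(target_hz, source_hz):
    """Return the reduced (up, down) integer ratio for rational resampling."""
    ratio = Fraction(target_hz, source_hz)
    return ratio.numerator, ratio.denominator


for khz in TARGET_RATES_KHZ:
    target_hz = round(khz * 1000)  # kHz to an integer number of Hz
    up, down = resample_factors(target_hz, SOURCE_RATE_HZ)
    print(f"{khz} kHz -> {target_hz} Hz: up={up}, down={down}")
```

These factors are the kind of arguments a polyphase resampler takes, for example `scipy.signal.resample_poly(x, up, down)`, if that is the tool used for preprocessing.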

Tasks and responsibilities:

mustass commented 1 year ago

20221201_114937.jpg

sandorfoldi commented 1 year ago

About sample rate sensitivity:

mustass commented 1 year ago

About sample rate sensitivity:

  • Do we still want to do this for both datasets?

Nope, just for English for now.

  • With which models do we want to test this? Only the pretrained one, or some fine-tuned models too?

We don't fine-tune on the different sampling rates for now. Just the pretrained model.

mustass commented 1 year ago

Can we find another English speech dataset, other than LJ? @sandorfoldi @panosapos