mustass / diffusion_models_for_speech

Deep Learning course project repository.
https://kurser.dtu.dk/course/02456

Improved preprocessing and collator #13

Closed sandorfoldi closed 1 year ago

sandorfoldi commented 1 year ago

During preprocessing, spectrograms are now saved as torch tensors. The lengths of the audio files are also saved to a csv file.

The following two collators are implemented and can be selected from cfg:

The audio_lengths csv file is used when collator == DeleteShorts: only the paths of audio files that are long enough are stored in the dataset, so the batch size stays constant.
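
To illustrate the DeleteShorts idea, here is a minimal sketch of the filtering step, not the code in this PR. The column names `path` and `length`, the helper `filter_long_enough`, and the `min_length` threshold are all assumptions made up for the example; the real csv layout and cfg keys may differ.

```python
import csv
import io


def filter_long_enough(lengths_csv, min_length):
    """Return the paths whose recorded length is at least min_length samples.

    lengths_csv is any open text file (or file-like object) with a header row.
    The column names "path" and "length" are assumed for this sketch.
    """
    reader = csv.DictReader(lengths_csv)
    return [row["path"] for row in reader if int(row["length"]) >= min_length]


# Hypothetical contents of the audio_lengths csv file.
example_csv = io.StringIO(
    "path,length\n"
    "a.wav,16000\n"
    "b.wav,4000\n"
    "c.wav,32000\n"
)

# Only files with at least 16000 samples are kept, so every item the dataset
# serves can be cropped to the same length and batches keep a fixed size.
kept_paths = filter_long_enough(example_csv, min_length=16000)
print(kept_paths)
```

Filtering once at dataset construction, rather than dropping items inside the collator, is what keeps the number of items per batch fixed.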

sandorfoldi commented 1 year ago

I tested this on my local machine and on voltash, and it throws no errors. I also tried to submit it as a job; so far I have received no notifications.

mustass commented 1 year ago

@panosapos how will this extraction of audio lengths work when we have a mixture of datasets? Have you discussed that?

panosapos commented 1 year ago

> @panosapos how will this extraction of audio lengths work when we have a mixture of datasets? Have you discussed that?

You think that the solution you suggested yesterday would not work in this case?

mustass commented 1 year ago

> > @panosapos how will this extraction of audio lengths work when we have a mixture of datasets? Have you discussed that?
>
> You think that the solution you suggested yesterday would not work in this case?

No no, it will. I just want to be reassured that you guys are talking to each other :)