mustass / diffusion_models_for_speech

Deep Learning course project repository.
https://kurser.dtu.dk/course/02456

remove duplicates from csv #32

Closed sandorfoldi closed 1 year ago

sandorfoldi commented 1 year ago

Feel free to explore this version and the previous one in pandas and check whether I have missed anything. Unfortunately, there was some overlap between the train and inference splits, though not much: less than 5%, if I'm not mistaken. In those cases I removed the overlapping entries from the inference split, so testing should be safe now.
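For anyone who wants to reproduce the overlap check, here is a minimal pandas sketch of the idea described above. The column names `path` and `split`, the split labels `train` and `inference`, and the helper `drop_train_overlap` are assumptions for illustration; the actual csv in this repository may use a different layout.

```python
import pandas as pd


def drop_train_overlap(df, path_col="path", split_col="split"):
    """Drop inference rows whose path also appears in the train split.

    Column names and split labels are hypothetical; adjust to the real csv.
    Returns the cleaned frame and the number of rows removed.
    """
    train_paths = set(df.loc[df[split_col] == "train", path_col])
    is_inference = df[split_col] == "inference"
    # A row leaks if it is in the inference split but its path is also in train.
    leaked = is_inference & df[path_col].isin(train_paths)
    return df.loc[~leaked].reset_index(drop=True), int(leaked.sum())


# Toy data standing in for the real csv (not taken from the repository).
df = pd.DataFrame({
    "path": ["a/1.wav", "a/2.wav", "b/3.wav", "a/2.wav", "b/4.wav"],
    "split": ["train", "train", "inference", "inference", "inference"],
})
clean, n_removed = drop_train_overlap(df)
print(f"removed {n_removed} overlapping inference rows, {len(clean)} rows left")
```

Removing the overlap from the inference side, rather than from train, keeps the training set unchanged while making sure nothing evaluated at test time was seen during training.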

panosapos commented 1 year ago

I believe there is still an issue here. The new .csv file has ~20k duplicate paths and ~124k duplicate filenames.
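One way such counts could be obtained is sketched below. This is an assumption about the method, not necessarily how the numbers above were computed: the column name `path` and the helper `duplicate_counts` are hypothetical, and each count is the number of rows that repeat an earlier row (pandas `duplicated()` with its default `keep="first"`).

```python
import os

import pandas as pd


def duplicate_counts(df, path_col="path"):
    """Count rows that repeat an earlier full path or an earlier bare filename.

    The column name is hypothetical; adjust to the real csv.
    """
    filenames = df[path_col].map(os.path.basename)
    return {
        "duplicate_paths": int(df[path_col].duplicated().sum()),
        "duplicate_filenames": int(filenames.duplicated().sum()),
    }


# Toy data standing in for the real csv (not taken from the repository).
df = pd.DataFrame({"path": ["a/1.wav", "a/1.wav", "b/1.wav", "b/2.wav"]})
counts = duplicate_counts(df)
print(counts)
```

Note that a repeated filename is not necessarily a duplicate recording: the same basename can appear under different directories, so the filename count is best read as an upper bound that needs manual inspection.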