We propose replacing the FSDD dataset with the FSDKaggle18 dataset for the following reasons:
Increased Size: The FSDD dataset comprises 3000 audio samples over 10 classes, while the FSDKaggle18 dataset comprises 11,073 audio samples over 41 classes. To create a balanced dataset, we subset FSDKaggle18 so that it contains only the classes with 300 samples each, which gives us 5400 samples over 18 classes.
Increased Complexity: FSDD contains only spoken digits, so its audio has very little inherent complexity apart from variation between speakers. The FSDKaggle18 dataset contains audio files from a variety of sources, making it a more diverse and general-purpose audio dataset for benchmarking.
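The balanced subsetting described under Increased Size can be sketched as follows. This is a minimal illustration, not the procedure actually used. It assumes the metadata is available as a list of records with a `label` field. The function name `balanced_subset`, the `fname` field and the toy rows are hypothetical. It also assumes that a class qualifies when it has at least the target number of samples, and that any extras are dropped.

```python
from collections import Counter


def balanced_subset(rows, per_class=300, label_key="label"):
    """Keep only classes with at least `per_class` samples, capped at `per_class` each.

    `rows` is a list of dicts (one per audio file); the field names are assumptions.
    """
    counts = Counter(row[label_key] for row in rows)
    kept = Counter()
    subset = []
    for row in rows:
        label = row[label_key]
        # Skip classes that are too small, and stop adding once a class is full.
        if counts[label] >= per_class and kept[label] < per_class:
            subset.append(row)
            kept[label] += 1
    return subset


# Toy metadata (hypothetical counts): three classes with 3, 2 and 4 files.
toy_rows = [
    {"fname": f"{label}_{i}.wav", "label": label}
    for label, n in [("Bark", 3), ("Flute", 2), ("Cough", 4)]
    for i in range(n)
]
toy_subset = balanced_subset(toy_rows, per_class=3)
toy_counts = Counter(row["label"] for row in toy_subset)
```

With the default `per_class=300`, a subset of 18 classes would hold 18 × 300 = 5400 samples, matching the figure quoted above.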