Questions About Data Augmentation Methods

Hello!

First of all, thanks for sharing your awesome research. I’ve been following your paper and trying to replicate the experiments, but I keep running into overfitting. The paper mentions data augmentation techniques such as noise perturbation, time shifting, and speed perturbation. I think these could help me, but I wanted to clarify a few things:

1. Were the augmentation parameters fixed, or did you add randomness for diversity (e.g., varying noise levels or shift ranges)?
2. Did you use a specific library like audiomentations for this, or did you implement these augmentations yourself?
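
To make the question about fixed versus random parameters concrete, here is a rough NumPy-only sketch of what I mean by "randomized" augmentation. The function names, the SNR range, the shift fraction, and the speed factors are all placeholder guesses of mine, not values from your paper, and the speed change is a plain linear-interpolation resample (so it changes pitch along with duration), which may differ from what you did.

```python
import numpy as np

def add_noise(wave, rng, snr_db_range=(10.0, 30.0)):
    """Add white Gaussian noise at an SNR drawn uniformly from snr_db_range (assumed range)."""
    snr_db = rng.uniform(*snr_db_range)
    signal_power = np.mean(wave ** 2) + 1e-12  # avoid a zero power for silent clips
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=wave.shape)
    return wave + noise

def time_shift(wave, rng, max_shift_frac=0.1):
    """Shift by a random number of samples, zero-padding the vacated region."""
    max_shift = int(len(wave) * max_shift_frac)
    shift = int(rng.integers(-max_shift, max_shift + 1))
    out = np.zeros_like(wave)
    if shift > 0:
        out[shift:] = wave[:-shift]   # delay: silence at the start
    elif shift < 0:
        out[:shift] = wave[-shift:]   # advance: silence at the end
    else:
        out[:] = wave
    return out

def speed_perturb(wave, rng, rates=(0.9, 1.0, 1.1)):
    """Resample with linear interpolation so the clip plays `rate` times faster."""
    rate = rng.choice(rates)
    new_len = max(1, int(round(len(wave) / rate)))
    old_idx = np.arange(len(wave))
    new_idx = np.linspace(0, len(wave) - 1, new_len)
    return np.interp(new_idx, old_idx, wave)

def augment(wave, rng):
    """Apply all three augmentations, each with freshly sampled parameters."""
    wave = speed_perturb(wave, rng)
    wave = time_shift(wave, rng)
    return add_noise(wave, rng)

# Example: a new random variant every time a float waveform is drawn for training.
rng = np.random.default_rng(0)
wave = np.sin(np.linspace(0.0, 100.0, 16000))
augmented = augment(wave, rng)
```

In this version every call samples new parameters, so the model never sees exactly the same waveform twice. The "fixed" alternative would be to pick one SNR, one shift, and one rate up front and reuse them for every clip.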

I’m using the CREMA-D dataset just like you, and I’ve kept most settings the same as described in the paper. But the overfitting issue persists, and I think the way the augmentations were applied might make a big difference.

If you could share any tips or details about how you used these augmentations, it would be a huge help!

Thanks in advance, and I appreciate the work you’ve done :)

kjy7567 / speech_emotion_recognition_from_log_Mel_spectrogram_using_vertically_long_patch
