MusicCap Dataset Testing: Overfitting Issues in Model Predictions

oyzh888 commented 10 months ago

Hello,

I appreciate your excellent work and have a question regarding the testing process, specifically on how to ensure proper testing without falling into the trap of overfitting.

We ran a test on the MusicCaps dataset (https://huggingface.co/datasets/google/MusicCaps), which contains approximately 5.52K samples, somewhat akin to the 60K mentioned in your paper.

However, some samples appear to be "overfitted". Is this normal? For instance, we observed cases where the model's prediction exactly matches the label in the MusicCaps dataset.

One example is the YouTube video with the ID -FFx68qSAuY (https://www.youtube.com/watch?v=-FFx68qSAuY). Audio file (unzip it before use): -FFx68qSAuY.wav.zip

The model predicted:

{
  'text': 'This is a punk rock music piece. There are male vocals singing in a grunt-like manner. The melody is being played by an electric guitar while a bass guitar plays in the background. The rhythm consists of a slightly fast-paced rock acoustic drum beat. The piece has an aggressive atmosphere. It could be used in the soundtrack of an action-filled video game.',
  'time': '0:00-10:00'
}

For our tests, we used the following code: https://github.com/seungheondoh/lp-music-caps/blob/main/lpmc/music_captioning/captioning.py#L52, and executed the command:

python3 captioning.py --audio_path ../music_cap/lp-music-caps/lpmc/music_captioning/workspace/audio/-FFx68qSAuY.wav

Additionally, we noticed similar overfitting issues in approximately 10% of the samples, including these YouTube links:

https://www.youtube.com/watch?v=PpJKo-JPVU0
https://www.youtube.com/watch?v=p0oRrGDrQQw

Could you provide insights or guidance on this matter?
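For anyone who wants to reproduce a figure like the roughly 10% above, here is a minimal sketch of how the share of exact matches could be measured. It is not part of the lp-music-caps repository: the function names, the `near_threshold` parameter, and the dict-of-captions input format are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return re.sub(r"\s+", " ", text.strip().lower())


def match_report(predictions, references, near_threshold=0.9):
    """Compare predicted captions with reference captions.

    predictions, references: dicts mapping a YouTube ID to a caption string
    (a hypothetical input format, not one defined by the repository).
    Only IDs present in both dicts are compared.
    """
    shared = sorted(set(predictions) & set(references))
    exact, near = [], []
    for ytid in shared:
        pred = normalize(predictions[ytid])
        ref = normalize(references[ytid])
        if pred == ref:
            exact.append(ytid)
        elif SequenceMatcher(None, pred, ref).ratio() >= near_threshold:
            # Not identical, but close enough to be worth inspecting by hand.
            near.append(ytid)
    total = len(shared)
    return {
        "total": total,
        "exact": exact,
        "near": near,
        "exact_rate": len(exact) / total if total else 0.0,
    }
```

Calling `match_report(predictions, references)` returns the IDs whose captions match exactly, the IDs that are only near matches, and the exact-match rate over the IDs common to both inputs.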

Thank you.

diggerdu commented 9 months ago

@seungheondoh Hello, I really appreciate your excellent work. However, the train-test split is not documented, which makes it hard to follow up.

seungheondoh commented 7 months ago

@oyzh888 Sorry for the late reply. My model was trained on the MusicCaps training split. Therefore, if you run inference on MusicCaps samples, the predictions may closely match the labels. Is the uploaded audio from the MusicCaps test split?

seungheondoh commented 7 months ago

@diggerdu please check https://github.com/seungheondoh/lp-music-caps/issues/4

seungheondoh / lp-music-caps, issue #9