GTZAN and CREMA-D - Githubissues

nttcslab / m2d

Masked Modeling Duo: Towards a Universal Audio Pre-training Framework

https://ieeexplore.ieee.org/document/10502167

GTZAN and CREMA-D #2

Closed: yunzqq closed this issue 1 year ago

yunzqq commented 1 year ago

Hi, could you please provide the split settings for these two datasets? Thank you!

daisukelab commented 1 year ago

Hi, thanks for your interest.

You can find them in the evaluation package EVAR. Please follow: https://github.com/nttcslab/eval-audio-repr/blob/main/Preparing-datasets.md

Data splits can be found in evar/metadata/*.csv

Hope it helps.

yunzqq commented 1 year ago

Many thanks!

daisukelab commented 1 year ago

Please let us know if you publish your paper in the future!

yunzqq commented 1 year ago

hhh, OK!

yunzqq commented 1 year ago

May I ask another question? For long audio recordings, how long are the clips used for training? And at inference time, is the recording split into clips, with the averaged logits used for the final classification result?

Best, Qiquan

daisukelab commented 1 year ago

Thank you for your question! The short answer is that you are correct.

During pre-training, we randomly crop a fixed-duration segment from each training sample.
At inference time, we do the following:

And the runtime implementation is as follows:

https://github.com/nttcslab/m2d/blob/4cdffb04b3fb311fc64a87021ccc3cacf7a01ceb/m2d/runtime_audio.py#L173-L225
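For readers who want the idea in a few lines, here is a minimal sketch of the two steps discussed above: a random fixed-length crop for training, and clip-wise prediction with averaged logits for long recordings at inference. It is not the code at the link above. The function names, the zero-padding of short or trailing clips, the non-overlapping split, and the `model` callable (anything that maps a clip to a list of logits) are all assumptions made for illustration.

```python
import random


def random_crop(samples, crop_len, rng=random):
    """Return a random contiguous window of crop_len samples.

    Assumption: recordings shorter than crop_len are zero-padded at the end.
    """
    if len(samples) <= crop_len:
        return list(samples) + [0.0] * (crop_len - len(samples))
    start = rng.randrange(len(samples) - crop_len + 1)
    return list(samples[start:start + crop_len])


def split_into_clips(samples, clip_len):
    """Split a recording into consecutive, non-overlapping clips.

    Assumption: the last clip is zero-padded to clip_len.
    """
    clips = []
    for start in range(0, len(samples), clip_len):
        clip = list(samples[start:start + clip_len])
        clip += [0.0] * (clip_len - len(clip))
        clips.append(clip)
    return clips


def predict_long_recording(samples, clip_len, model):
    """Average the per-clip logits and return (predicted class, mean logits).

    `model` is a hypothetical callable: clip -> list of per-class logits.
    """
    clips = split_into_clips(samples, clip_len)
    if not clips:
        raise ValueError("empty recording")
    per_clip = [model(clip) for clip in clips]
    n_classes = len(per_clip[0])
    mean_logits = [
        sum(logits[k] for logits in per_clip) / len(per_clip)
        for k in range(n_classes)
    ]
    label = max(range(n_classes), key=mean_logits.__getitem__)
    return label, mean_logits
```

In practice the clip length would match the fixed duration used for cropping during training, so that the model sees inputs of the same size in both phases.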

Please let me know if you have any more questions.

yunzqq commented 1 year ago

Many thanks for your help! Are you in Greece for ICASSP 2023? If so, may I talk with you there? hahaha

daisukelab commented 1 year ago

Yes, I'm presenting this paper:

AASP-P4: Anomaly Detection and Representation Learning for Audio Classification
Room: Poster Area 2 - Garden
Type: Poster
Time: 03:35 PM to 5:05 PM
1773 (AASP-P4.3): MASKED MODELING DUO: LEARNING REPRESENTATIONS BY ENCOURAGING BOTH NETWORKS TO MODEL THE INPUT

See you there! :)

yunzqq commented 1 year ago

Many thanks! See you there! Best,

Qiquan Zhang