question about the metadata

holehole5566 commented 7 months ago

Hi there! Firstly, thank you for providing the code for training AudioLDM.

I have a question regarding the AudioCaps dataset. In the *_label.json files, each data entry contains a "seg_label" key. The README mentions that pre-segmentation of audio files isn't necessary, but I'm curious about the purpose of this "seg_label" key.

Could you clarify whether the "seg_label" field is simply a path for saving preprocessed .npy files during training, or whether it points to preprocessed data that requires specific steps before use? If the latter is true, could you guide me on how to process the WAV files into .npy format?

Thank you very much for your help!

haoheliu commented 7 months ago

@holehole5566 Sorry for the confusion, and thank you for bringing that up. The "seg_label" key is not used in the code, so please ignore it. I'll update the dataset tar file to remove this key in the future.
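
For anyone who would rather drop the key from a local copy of the metadata in the meantime, here is a minimal sketch. It is not part of the repository: the helper name `strip_key` and the example entry (including the "data", "wav", and "caption" fields) are assumptions for illustration, and the actual layout of the *_label.json files may differ. The function walks whatever structure `json.load` returns and removes the key wherever it appears, so it does not depend on that layout.

```python
import json
from typing import Any


def strip_key(node: Any, key: str = "seg_label") -> Any:
    """Return a copy of a parsed-JSON structure with every `key` entry removed."""
    if isinstance(node, dict):
        # Drop the unwanted key and recurse into the remaining values.
        return {k: strip_key(v, key) for k, v in node.items() if k != key}
    if isinstance(node, list):
        return [strip_key(item, key) for item in node]
    # Strings, numbers, booleans, and None are returned unchanged.
    return node


# Hypothetical entry: the real field names in the *_label.json files may differ.
example = {
    "data": [
        {"wav": "clip.wav", "seg_label": "clip.npy", "caption": "a dog barks"}
    ]
}

cleaned = strip_key(example)
print(json.dumps(cleaned, indent=2))
```

To apply it to a file, load the file with `json.load`, pass the result through `strip_key`, and write it back with `json.dump`. Since the key is ignored by the training code, this step is optional.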

holehole5566 commented 7 months ago

OK! Thanks for your reply.

haoheliu / AudioLDM-training-finetuning

question about the metadata #18