Closed: nicolaus625 closed this issue 11 months ago
The dataset `audio.zip` file on your Hugging Face page does not include all the audio files referenced in `finetune.json` and `evaluation.json`. Is that correct?
The `audios.zip` file does contain all the music files required by the `PretrainMusicQA.json`, `FinetuneMusicQA.json`, and `EvalMusicQA.json` files. I have also added a Python script, `check.py`, here, to verify that all audio files are present.
Besides, in your paper you mention that you used MPT-7B to build the dataset, but your JSON contains QA pairs between "human" and "gpt". Did you use GPT-3.5, GPT-4, or MPT-7B?
We followed the same structure as LLaMA Adapter to load the dataset, hence the JSON contains the keys "human" and "gpt". The QA pairs were generated using MPT-7B and then organized into the JSON file following LLaMA Adapter's structure, with "human" and "gpt" referring to the question and the answer respectively.
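For reference, an entry in that layout looks roughly like the sketch below. The exact field names other than "human" and "gpt" are assumptions based on this thread (the `audio_name` key is taken from the `check.py` discussion):

```python
# Hypothetical entry illustrating the LLaMA Adapter-style JSON layout.
# "human" holds the question and "gpt" holds the MPT-7B-generated answer.
entry = {
    "audio_name": "song_001.wav",  # assumed key, see the check.py discussion
    "conversation": [
        {"from": "human", "value": "What instruments are playing in this clip?"},
        {"from": "gpt", "value": "A piano accompanied by a soft string section."},
    ],
}

print(entry["conversation"][0]["from"], entry["conversation"][1]["from"])
```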
I hope this answers your questions 😄
In `check.py`, your code has `pretrain = json.load(open("PretrainMusicQA.json"))`. But you actually have multiple QA pairs for the same `audio_name`, so it might be better to replace `pretrain` with `set(row["audio_name"] for row in pretrain)` after loading.
The old `audio.zip` file only contained ~10k audio files, but the updated version is fine. Thank you very much!