wazenmai / MIDI-BERT

This is the official repository for the paper, MidiBERT-Piano: Large-scale Pre-training for Symbolic Music Understanding.

About preprocessing my own pretraining data #8

Open fourthbrother opened 2 years ago

fourthbrother commented 2 years ago

(screenshot of the error message)

Can you tell me where I went wrong and what I need to change?

sophia1488 commented 2 years ago

Hi @fourthbrother, is this your own dataset (i.e. not one of the 5 datasets listed in the README)? Given the limited info, I can only guess from the error message. Maybe items is empty? You could check by printing out note_items at line 33 of model.py.
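
For reference, a minimal way to print it out (a sketch; it only assumes utils.read_items returns (note_items, tempo_items), as it does in model.py):

# model.py, right after the utils.read_items call in extract_events
note_items, tempo_items = utils.read_items(input_path)
print(input_path, len(note_items))   # 0 means the parsed MIDI contains no notes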

In general, just inspect the values right before the point where it breaks. Hope it helps!

fourthbrother commented 2 years ago

Yes, it's my own dataset, which I downloaded from https://github.com/bytedance/GiantMIDI-Piano, and then I ran python3 main.py --input_dir=my_directory.

sophia1488 commented 2 years ago

Hello, I downloaded surname_checked_midis_v1.2 and ran into the same problem when preprocessing the dataset, and it really is because note_items is empty. For example, Stahl, William C., Golden Bell Waltz, gmrKI53VUVQ.mid contains no note data (I also listened to it to make sure).

So I added a check to ensure note_items is not empty. Please update your copy of the repo, or modify model.py like the following.

# model.py
def extract_events(self, input_path, task):
    note_items, tempo_items = utils.read_items(input_path)
    if len(note_items) == 0:             # 1. add this
        return None
    ...

def prepare_data(self, midi_paths, task, max_len):
    ...
    for path in tqdm(midi_paths):
        events = self.extract_events(path, task)
        if not events:                   # 2. add this
            print(f'skip {path} because it is empty')
            continue
        ...
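
If you want to filter such files out before running main.py, a standalone pre-check is another option. This is just a sketch: it assumes miditoolkit is installed (the repo's MIDI parsing relies on it, as far as I know), and my_directory and the script name are only examples.

# check_empty_midis.py  (hypothetical helper, not part of the repo)
import glob
from miditoolkit.midi import parser as mid_parser

for path in glob.glob('my_directory/**/*.mid', recursive=True):
    midi = mid_parser.MidiFile(path)
    n_notes = sum(len(inst.notes) for inst in midi.instruments)
    if n_notes == 0:
        print(f'{path} contains no notes')
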
fourthbrother commented 2 years ago

When I fine-tune the model with python3 finetune.py --task=melody --name=default, I run into the following problem (screenshot of the error). I guess the reason is that my version is not the same as yours; can you help me solve this problem?

sophia1488 commented 2 years ago

Please refer to issue #6. I think the reason is that I trained the model on multiple GPUs (a single GPU could not handle my training), while you two are using a single GPU (I assume?).

Also, here is some info I found online: https://discuss.pytorch.org/t/using-a-single-gpu-for-a-model-trained-with-multiple-gpus/1607/5
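
If the error is the usual key mismatch (Missing/Unexpected key(s) with a module. prefix, which nn.DataParallel adds when the checkpoint is saved), stripping that prefix before loading is a common workaround. A sketch only, not the exact code in finetune.py; how the state dict is stored in the checkpoint may differ:

import torch

def load_single_gpu_state_dict(model, ckpt_path):
    """Load a checkpoint saved under nn.DataParallel into a plain (single-GPU) model."""
    ckpt = torch.load(ckpt_path, map_location='cpu')
    state_dict = ckpt['state_dict'] if 'state_dict' in ckpt else ckpt   # depends on how it was saved
    # drop the 'module.' prefix that nn.DataParallel prepends to parameter names
    state_dict = {k[len('module.'):] if k.startswith('module.') else k: v
                  for k, v in state_dict.items()}
    model.load_state_dict(state_dict)
    return model

Alternatively, wrapping the model in torch.nn.DataParallel before calling load_state_dict makes the keys match as well.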

fourthbrother commented 2 years ago

I found that the parameter max_seq_len cannot be changed. When I set max_seq_len=256, the following error appears (screenshot of the error).

sophia1488 commented 2 years ago

Hi @fourthbrother, I'm sorry I missed this issue... Since our focus is CP (its performance is also better), I would suggest you use the CP version. But I will try to find some time to look into it.