jongwook / onsets-and-frames

A PyTorch implementation of Onsets and Frames (Hawthorne et al., 2018)
MIT License

Checkpoint and possible fine-tuning on a custom dataset #23

Open adrienchaton opened 4 years ago

adrienchaton commented 4 years ago

Hello Jong Wook,

I would like to experiment fine-tuning Onsets and Frames on a custom dataset with your PyTorch implementation.

For that, may I ask: is there a pretrained model checkpoint available for your implementation?

Then I would format the custom dataset I would like to fine-tune on like the MAPS example:

- one folder of .flac audio inputs, 16 kHz mono
- one folder of matched .tsv annotation targets (col 1: onset sec. / col 2: offset sec. / col 3: note / col 4: velocity)

so that it can be read with PianoRollAudioDataset and used to continue training from a previous checkpoint.
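For illustration, a minimal sketch of writing one annotation file in that four-column layout; the note values and the file name example.tsv are made up, and the header row is an assumption based on the format described here:

```python
import numpy as np

# Each row: onset (sec), offset (sec), MIDI pitch, MIDI velocity.
# These two notes are made-up values for illustration only.
notes = np.array([[0.50, 1.25, 60, 80],   # middle C, held for 0.75 s
                  [1.30, 2.00, 64, 72]])

# Tab-separated with a header row; np.savetxt prefixes the header with '# ',
# which np.loadtxt treats as a comment line when reading it back.
np.savetxt('example.tsv', notes, fmt='%.6f', delimiter='\t',
           header='onset,offset,note,velocity')
```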

One more thing I would like to confirm regarding the annotation files: the 3rd (note) column should be MIDI pitch (in the range of the 88 piano keys), and the 4th (velocity) column should be scaled to which range? (It doesn't seem to go up to 127 like a MIDI velocity.)

Thanks for porting the model to PyTorch!

jongwook commented 4 years ago

Hello, try this checkpoint, which was trained for 500,000 iterations on the MAESTRO dataset.

I haven't tried fine-tuning on these models, but it should be theoretically possible. I expect that it'd require a careful hyperparameter search (epochs, learning rate and betas of Adam, etc.)

Regarding velocity, I believe it's the same as the MIDI scale; it's just that most MIDI files in the dataset don't have very high velocities. AFAIK the original paper tried some heuristic for normalizing the velocity, but I didn't reimplement that. You can check out the MIDI preprocessing code here.
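For what it's worth, here is a minimal fine-tuning sketch against this checkpoint. It assumes the checkpoint stores the whole pickled model (as this repo's train.py saves it) and uses the repo's MAPS dataset class as a stand-in for a custom one; the batch size, sequence length, and learning rate are placeholders rather than tuned values:

```python
import torch
from torch.utils.data import DataLoader
from onsets_and_frames.dataset import MAPS  # stand-in; a custom subclass works the same way

# The checkpoint is the full pickled model, not a state_dict.
# (Add map_location='cpu' when loading on a CPU-only machine.)
model = torch.load('model-500000.pt')
model.train()

# A fixed sequence length keeps all batch tensors the same size.
loader = DataLoader(MAPS(sequence_length=327680), batch_size=8, shuffle=True)

# Fine-tuning usually starts from a smaller learning rate than training
# from scratch; 1e-5 is only a starting point for a proper search.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

for batch in loader:
    predictions, losses = model.run_on_batch(batch)  # as in the repo's train.py
    loss = sum(losses.values())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```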

adrienchaton commented 4 years ago

Thank you very much. The checkpoint is helpful, as I want to compare retraining from scratch against fine-tuning from the MAESTRO checkpoint! And indeed, your MIDI code seems to follow the standard MIDI range. Your data loader is nicely done too; I could easily adapt it to another dataset based on the MAPS class.

zappos23 commented 3 years ago

> Hello, try this checkpoint, which was trained for 500,000 iterations on the MAESTRO dataset.
>
> I haven't tried fine-tuning on these models, but it should be theoretically possible. I expect that it'd require a careful hyperparameter search (epochs, learning rate and betas of Adam, etc.)
>
> Regarding velocity, I believe it's the same as the MIDI scale; it's just that most MIDI files in the dataset don't have very high velocities. AFAIK the original paper tried some heuristic for normalizing the velocity, but I didn't reimplement that. You can check out the MIDI preprocessing code here.

Hi Jongwook, may I know whether there is any requirement on the torch version to use this checkpoint? I ran into an issue when trying it out:

torch.nn.modules.module.ModuleAttributeError: 'LSTM' object has no attribute '_flat_weights'

My torch version is 1.7.1

Thanks

adrienchaton commented 3 years ago

I am running the code with torch==1.3.0. There would be some (minor) things to update for running on higher versions.

zappos23 commented 3 years ago

> I am running the code with torch==1.3.0. There would be some (minor) things to update for running on higher versions.

Thanks. I think torch==1.3.0 is not available for Python 3.7, and I got many more errors with Python 3.6 and torch==1.3.0. May I know what the minor things are that need updating for running on a higher version?

adrienchaton commented 3 years ago

You'd have to update it error by error, but it should not be a lot, since PyTorch has not changed much in the basic functions used here. One more complicated issue you may have is loading the pretrained model weights into the updated class.

I could install both 1.2.0 and 1.3.0 for Python 3.7.9; I think pip still lets you install older versions, or you'd have to compile it yourself.
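One hedged way to handle the weight-loading part: unpickle the full model once under the old torch, keep only the state_dict, and rebuild the model under the new version. This is a sketch; the OnsetsAndFrames constructor arguments below assume the repo's defaults (229 mel bins, 88 pitches, model complexity 48):

```python
import torch

# Step 1 -- run under torch==1.3.0: unpickle the full model, keep the weights.
model = torch.load('model-500000.pt', map_location='cpu')
torch.save(model.state_dict(), 'model-500000-state.pt')

# Step 2 -- run under the newer torch: rebuild the class, then load the weights.
from onsets_and_frames import OnsetsAndFrames

model = OnsetsAndFrames(229, 88, 48)  # input features, output pitches, complexity
model.load_state_dict(torch.load('model-500000-state.pt'))
model.eval()
```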

zappos23 commented 3 years ago

> You'd have to update it error by error, but it should not be a lot, since PyTorch has not changed much in the basic functions used here. One more complicated issue you may have is loading the pretrained model weights into the updated class.
>
> I could install both 1.2.0 and 1.3.0 for Python 3.7.9; I think pip still lets you install older versions, or you'd have to compile it yourself.

Noted. Thanks for the info!

zappos23 commented 3 years ago

Have you come across this error while loading the MAPS dataset during evaluation?

RuntimeError: data/MAPS/flac/MAPS_MUS-bk_xmas1_ENSTDkAm.pt is a zip archive (did you mean to use torch.jit.load()?)
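As a side note on this specific RuntimeError: it typically appears when a .pt file written by torch >= 1.6 (which uses zip-based serialization) is read back with torch < 1.6. These .pt files appear to be the dataset class's cached feature tensors, so deleting them and letting the loader regenerate them should also work; alternatively, a hedged sketch of re-saving one in the legacy format:

```python
import torch

# Run under torch >= 1.6, which understands the zip-based format;
# the re-saved file then loads under older torch versions as well.
path = 'data/MAPS/flac/MAPS_MUS-bk_xmas1_ENSTDkAm.pt'
data = torch.load(path)
torch.save(data, path, _use_new_zipfile_serialization=False)
```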

adrienchaton commented 3 years ago

No, I did not use either MAESTRO or MAPS.

I wanted to fine-tune on a custom dataset, so I made a class similar to MAPS(PianoRollAudioDataset) which reads flac audio and tsv annotations put in the same format as the MAPS dataset.

The tsv annotations are in the column format 'onset,offset,note,velocity', so it's easy to convert any dataset you get into a form that trains with that dataset class.

If you cannot load the prepared MAPS data (which I did not use), you could download it from elsewhere, format it as flac/tsv, and load it with the dataset class.
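For illustration, a minimal sketch of such a class, modeled on the repo's MAPS class; the folder layout under data/Custom, the group names, and the CustomDataset name are all assumptions:

```python
import os
from glob import glob

from onsets_and_frames.dataset import PianoRollAudioDataset

class CustomDataset(PianoRollAudioDataset):
    def __init__(self, path='data/Custom', groups=None,
                 sequence_length=None, seed=42, device='cpu'):
        super().__init__(path, groups if groups is not None else ['train'],
                         sequence_length, seed, device)

    @classmethod
    def available_groups(cls):
        return ['train', 'validation', 'test']

    def files(self, group):
        # Expects data/Custom/flac/<group>/*.flac with matching .tsv
        # annotations under data/Custom/tsv/<group>/.
        flacs = sorted(glob(os.path.join(self.path, 'flac', group, '*.flac')))
        tsvs = [f.replace(os.path.join(self.path, 'flac'),
                          os.path.join(self.path, 'tsv')).replace('.flac', '.tsv')
                for f in flacs]
        assert all(os.path.isfile(t) for t in tsvs), 'missing .tsv annotations'
        return list(zip(flacs, tsvs))
```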

zappos23 commented 3 years ago

Thanks, @adrienchaton.

Ashwin-Ramesh2607 commented 3 years ago

@jongwook Could you provide the checkpoint after 1 million steps as well?

xk-wang commented 3 years ago

The metrics of this model are not very good:

    note precision                            : 0.809 ± 0.111
    note recall                               : 0.760 ± 0.110
    note f1                                   : 0.782 ± 0.106
    note overlap                              : 0.554 ± 0.105
    note-with-offsets precision               : 0.378 ± 0.135
    note-with-offsets recall                  : 0.356 ± 0.133
    note-with-offsets f1                      : 0.366 ± 0.133
    note-with-offsets overlap                 : 0.817 ± 0.084
    note-with-velocity precision              : 0.738 ± 0.112
    note-with-velocity recall                 : 0.694 ± 0.113
    note-with-velocity f1                     : 0.714 ± 0.109
    note-with-velocity overlap                : 0.557 ± 0.107
    note-with-offsets-and-velocity precision  : 0.350 ± 0.130
    note-with-offsets-and-velocity recall     : 0.329 ± 0.129
    note-with-offsets-and-velocity f1         : 0.339 ± 0.129
    note-with-offsets-and-velocity overlap    : 0.816 ± 0.084
    frame f1                                  : 0.651 ± 0.112
    frame precision                           : 0.639 ± 0.166
    frame recall                              : 0.694 ± 0.087
    frame accuracy                            : 0.492 ± 0.121
    frame substitution_error                  : 0.102 ± 0.052
    frame miss_error                          : 0.204 ± 0.085
    frame false_alarm_error                   : 0.393 ± 0.386
    frame total_error                         : 0.698 ± 0.386
    frame chroma_precision                    : 0.673 ± 0.162
    frame chroma_recall                       : 0.735 ± 0.083
    frame chroma_accuracy                     : 0.532 ± 0.112
    frame chroma_substitution_error           : 0.061 ± 0.029
    frame chroma_miss_error                   : 0.204 ± 0.085
    frame chroma_false_alarm_error            : 0.393 ± 0.386