habla-liaa / encodecmae

Codebase for the paper 'EncodecMAE: Leveraging neural codecs for universal audio representation learning'

Can't find run_pretraining.sh #2

DEDSEC-Roger opened this issue 2 weeks ago

DEDSEC-Roger commented 2 weeks ago

Hi! I can't find the "scripts" folder, so I can't find the "run_pretraining.sh" file. How should I pretrain the model?

mrpep commented 1 week ago

Hi! The scripts are available for the previous version of the paper: https://github.com/habla-liaa/encodecmae/tree/v.1.0.0. I'm still working on updating the training scripts for the new version (models going from mel spectrogram to EnCodec).
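To get those scripts locally, you can check out that tag directly:

```bash
# Clone the repo and switch to the v1.0.0 tag, where
# scripts/run_pretraining.sh exists:
git clone https://github.com/habla-liaa/encodecmae
cd encodecmae
git checkout v.1.0.0
```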

DEDSEC-Roger commented 1 week ago

Are there any big changes between the v1 and v2 pretraining scripts? For now I changed "configs/features/wav_only.gin" to "configs/features/mel.gin" in run_pretraining.sh, as sketched below; are there any other changes I should make?
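Concretely, my edit is equivalent to this one-liner (a sketch; the script's exact layout may differ):

```bash
# Swap the wav feature config for the mel one in the v1 pretraining script:
sed -i 's|configs/features/wav_only.gin|configs/features/mel.gin|' scripts/run_pretraining.sh
```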

mrpep commented 1 week ago

That should do it! By the way, checkpoints for all the paper models are available to download, e.g. by calling load_model('mel256-ec-base'). All available models are listed here: https://huggingface.co/lpepino/encodecmae-v2/tree/main
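A minimal usage sketch, assuming the load_model / extract_features_from_file API from the repo README ('audio.wav' is a placeholder path):

```python
from encodecmae import load_model

# Downloads the 'mel256-ec-base' checkpoint from the Hugging Face repo above:
model = load_model('mel256-ec-base', device='cuda:0')

# Extract frame-level features from a local audio file:
features = model.extract_features_from_file('audio.wav')
```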

DEDSEC-Roger commented 1 week ago

For the "mel256-ec-base-fma.pt" model, did you train only on this data: "fma_large.zip: 106,574 tracks of 30s, 161 unbalanced genres (93 GiB)"?

DEDSEC-Roger commented 1 week ago

I am trying my best to reproduce the "mel256-ec-base-fma.pt" model. I just replaced "configs/features/wav_only.gin" with "configs/features/mel.gin" and have trained the model for 64.7k steps, but its performance is still inferior to your released model. I wonder what I have missed. Should I remove "QUANTIZER_WEIGHTS"? I did not find it in the config_str of "mel256-ec-base-fma.pt".

mrpep commented 6 days ago

Yes, it was trained only with FMA-large. Don't remove the quantizer weights. I think the problem is that our model was trained for 500k steps; training longer should make a noticeable difference in downstream performance.
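If you want to match the released setup rather than stopping at 64.7k steps, the step budget is set in the training gin configs; the macro name below is hypothetical, so check the actual configs in the repo:

```
# Hypothetical gin macro for the total step budget; the real binding name
# in the repo's training configs may differ:
TOTAL_TRAINING_STEPS = 500000
```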