karchkha / MelSpec_VQVAE

VQVAE compression for MelSpectrograms
8 stars · 2 forks

Pretraining model? #1

Open a897456 opened 4 months ago

a897456 commented 4 months ago

Do you have a pre-trained model? I want to save time on training. Also, how many hours did training take with epoch=100?

karchkha commented 4 months ago

Hi, thank you for your interest.

Unfortunately, I don't have a pre-trained model as I've stopped working on this project and shifted focus to diffusion models recently. However, I recommend checking out the original repository where they provide code along with pre-trained checkpoints: https://github.com/v-iashin/SpecVQGAN

It should help save you time on training.

a897456 commented 4 months ago

> Hi, thank you for your interest.
>
> Unfortunately, I don't have a pre-trained model as I've stopped working on this project and shifted focus to diffusion models recently. However, I recommend checking out the original repository where they provide code along with pre-trained checkpoints: https://github.com/v-iashin/SpecVQGAN

I completed pre-training to epoch=60 and got the ".ckpt" file. I want to test the results. What I plan to do is:

  1. load a ".npy" file containing the mel spectrogram into the pre-trained model (the ".ckpt" file, I think),
  2. then an output file is generated: a new ".npy" file that has passed through the VQVAE,
  3. the new ".npy" file is fed into a vocoder, MelGAN or similar, and a ".wav" file is generated.
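The round trip in steps 1 and 2 can be sketched with numpy alone. The codebook and `vqvae_roundtrip` below are hypothetical stand-ins for the trained model (in practice the model is loaded from the ".ckpt" file); the sketch only illustrates the nearest-codebook lookup that the "VQ" in VQVAE performs:

```python
import numpy as np

# Hypothetical stand-in for the trained VQVAE: quantise each time frame
# against a small codebook and reconstruct from the nearest code vector.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 80))           # 16 codes, 80 mel bins (toy sizes)

def vqvae_roundtrip(mel):                      # mel: (frames, 80)
    # nearest-codebook lookup per frame, then reconstruct from the codes
    dists = ((mel[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    codes = dists.argmin(axis=1)
    return codebook[codes]

mel = rng.normal(size=(100, 80)).astype(np.float32)
np.save("input_mel.npy", mel)                  # step 1: the input ".npy"

recon = vqvae_roundtrip(np.load("input_mel.npy"))
np.save("output_mel.npy", recon)               # step 2: the new ".npy"
# step 3 would feed "output_mel.npy" to a vocoder such as MelGAN
```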

For the first step, I don't know how to load the ".npy" file into the ".ckpt" model. I just use Trainer.validate(), but I don't see any file output. Can you help me? Thanks.

a897456 commented 4 months ago

(screenshot) This is the result obtained by using trainer.validate(). I don't know what these numbers mean. I just want a ".npy" file, but I don't get one. What should I do to get this file?

karchkha commented 4 months ago

Okay, these numbers indicate that the codebook is not being used properly. You can see that most of the codebook items are used 0 times. I can say for sure that 60 epochs are not enough.
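The "used 0 times" diagnosis can be reproduced by histogramming the code indices the quantiser assigns during validation. The `indices` array below is made up to simulate a collapsed codebook; in practice it would be collected from the quantiser output:

```python
import numpy as np

codebook_size = 1024
rng = np.random.default_rng(1)
# simulate a collapsed codebook: almost all frames map to a handful of codes
indices = rng.choice([3, 17, 42], size=50_000)

# count how often each codebook entry was used
usage = np.bincount(indices, minlength=codebook_size)
dead = int((usage == 0).sum())
print(f"{dead}/{codebook_size} codebook entries never used")
```

A healthy run should show usage spread over most of the codebook rather than concentrated in a few entries.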

karchkha commented 4 months ago

> For the first step, I don't know how to load the ".npy" file into the ".ckpt" model. I just use Trainer.validate(), but I don't see any file output. Can you help me? Thanks.

Trainer.validate() should do all of that automatically. It will load the checkpoint and process the mel spectrograms that are given as .npy files.
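If Trainer.validate() only prints metrics and writes nothing, one workaround is to run the reconstruction loop yourself and save each output. The sketch below uses a placeholder `reconstruct` function (an assumption; in practice it would be the model loaded from the ".ckpt"):

```python
import numpy as np
from pathlib import Path

def reconstruct(mel):
    # placeholder for the loaded VQVAE's forward pass; identity here
    return mel

in_dir, out_dir = Path("mels"), Path("recons")
in_dir.mkdir(exist_ok=True)
out_dir.mkdir(exist_ok=True)

# one dummy input so the loop has something to process
np.save(in_dir / "example.npy", np.zeros((100, 80), dtype=np.float32))

for npy in sorted(in_dir.glob("*.npy")):
    recon = reconstruct(np.load(npy))
    np.save(out_dir / npy.name, recon)   # the ".npy" output the thread asks for
```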

a897456 commented 4 months ago
> 2. then an output file is generated: a new ".npy" file that has passed through the VQVAE

> Trainer.validate() should do all of that automatically. It will load the checkpoint and process the mel spectrograms that are given as .npy files.

Thank you for your reply. My wording wasn't accurate; my mistake. My question is: why doesn't Trainer.validate() write the mel spectrograms processed by the VQVAE out as .npy files? I can't find any newly generated .npy files, only the results in the screenshots below. (screenshots)

karchkha commented 4 months ago

It definitely should! It has been more than a year since I last worked with this code, but the VQVAE was outputting mel-specs. Once again, this repo is abandoned and probably full of bugs.

Once again, I would suggest you go ahead and use the original one: https://github.com/v-iashin/SpecVQGAN. They have pre-trained models and everything intact.

a897456 commented 4 months ago

> It definitely should! It has been more than a year since I last worked with this code, but the VQVAE was outputting mel-specs. Once again, this repo is abandoned and probably full of bugs.

The program did have several bugs, and it took me a day to get through them, judging by the time between my first and second questions. But I think the program itself is a very good idea; that's what I've been thinking lately. Thanks for all the work you have done.

> Once again, I would suggest you go ahead and use the original one: https://github.com/v-iashin/SpecVQGAN. They have pre-trained models and everything intact.

Your program was the first thing I found, so I started working with it. I had looked at the original one, https://github.com/v-iashin/SpecVQGAN, and it's complicated, not as clean as your program, but I'll get through it.

karchkha commented 4 months ago

Yeah, that is exactly what I did. I kind of simplified the original one. I removed the video part because I was trying a different approach. I wanted to find a VAE model that would process VQVAE outputs via transformers and find embeddings for general audio. However, this didn't really work, so I switched to other research.

Maybe I gave up too early :)) I don't know!

Anyway, thanks for your interest!

a897456 commented 3 months ago

(screenshot) Sorry to bother you. I changed the dataset and trained for more epochs, then wanted to check the training results, but a checkpoint-loading error occurred. Can you point out where the error is?

karchkha commented 3 months ago

Seems like the checkpoint and the initialised model have different dimensions. Are you sure you trained exactly the same size model?
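One way to pin down which layer mismatches is to compare parameter shapes between the checkpoint and the freshly built model. The shape dictionaries below are hypothetical; in practice they would come from something like `{k: tuple(v.shape) for k, v in torch.load(path)["state_dict"].items()}` and the same over `model.state_dict()`:

```python
# Hypothetical parameter shapes standing in for the two state_dicts.
ckpt_shapes = {"encoder.conv.weight": (128, 1, 3, 3),
               "discriminator.main.0.weight": (64, 3, 4, 4)}
model_shapes = {"encoder.conv.weight": (128, 1, 3, 3),
                "discriminator.main.0.weight": (64, 1, 4, 4)}

# list every parameter whose shape differs between checkpoint and model
mismatched = [k for k in ckpt_shapes
              if k in model_shapes and ckpt_shapes[k] != model_shapes[k]]
print(mismatched)
```

Any name this prints points at the layer whose config changed between training and reloading.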

a897456 commented 3 months ago

(screenshot) I may have changed the parameters marked in the figure, but I can't remember clearly. I adjusted the model parameters around the time I asked you the second question, and yesterday I only increased the epochs and swapped in a new dataset, so I can't be sure. What does this parameter do?

karchkha commented 3 months ago

Yes, it seems that's a problem. I don't remember exactly, but I believe that parameter defines the shape of the discriminator. We use the discriminator in conjunction with the VAE.

a897456 commented 3 months ago

https://github.com/karchkha/MelSpec_VQVAE/blob/1622649eb4e2abb62e2f104ab13e9a68f21d830b/datasets/vas.py#L53 Hi @karchkha How should spec_crop_len be set here? Because the input samples' spec lengths vary, many samples have a mel length shorter than spec_crop_len, and then the program reports an error. How do you solve this? PS: The author of the original code doesn't seem to give a solution.
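A common workaround for this (an assumption on my part, not something the repo itself does) is to crop spectrograms longer than `spec_crop_len` and pad shorter ones, so every sample reaches the dataset at a fixed length:

```python
import numpy as np

def crop_or_pad(mel, spec_crop_len, pad_value=0.0):
    """Crop mels longer than spec_crop_len; pad shorter ones at the end.

    Assumes mel is shaped (mel_bins, frames).
    """
    n_mels, length = mel.shape
    if length >= spec_crop_len:
        return mel[:, :spec_crop_len]
    pad = np.full((n_mels, spec_crop_len - length), pad_value, mel.dtype)
    return np.concatenate([mel, pad], axis=1)

short = np.ones((80, 120), dtype=np.float32)
long_ = np.ones((80, 1000), dtype=np.float32)
print(crop_or_pad(short, 848).shape, crop_or_pad(long_, 848).shape)
```

Padding with silence-level values (e.g. the spectrogram's minimum) rather than 0.0 may be more appropriate depending on how the mels are normalised.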

a897456 commented 3 months ago

> Seems like the checkpoint and the initialised model have different dimensions. Are you sure you trained exactly the same size model?

I found the problem. This parameter should be 3: https://github.com/karchkha/MelSpec_VQVAE/blob/1622649eb4e2abb62e2f104ab13e9a68f21d830b/train.py#L56 https://github.com/karchkha/MelSpec_VQVAE/blob/1622649eb4e2abb62e2f104ab13e9a68f21d830b/train.py#L113 https://github.com/karchkha/MelSpec_VQVAE/blob/1622649eb4e2abb62e2f104ab13e9a68f21d830b/models/big_model_attn_gan.py#L539