zhvng / open-musiclm

Implementation of MusicLM, a text to music model published by Google Research, with a few modifications.
https://arxiv.org/abs/2301.11325
MIT License
522 stars 59 forks

The question about training CLAP RVQ #27

Open yangdongchao opened 1 year ago

yangdongchao commented 1 year ago

It seems that in the ClapRVQTrainer code, you do not call backward on any gradients. How should I understand this?

zhvng commented 1 year ago

Hey, thanks for your interest in this repo! The RVQ parameters are updated via EMA (exponential moving average), not gradient descent. See this relevant thread
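For context, an EMA-updated codebook needs no backward pass: each code is pulled toward the running mean of the vectors assigned to it. Below is a minimal NumPy sketch of one such update step (a single VQ stage, simplified; the real RVQ chains several stages and the function names here are illustrative, not this repo's API):

```python
import numpy as np

def ema_codebook_update(codebook, cluster_size, embed_avg, x,
                        decay=0.99, eps=1e-5):
    """One EMA update step for a VQ codebook (no gradients involved).

    codebook: (K, D) code vectors
    cluster_size: (K,) EMA of per-code assignment counts
    embed_avg: (K, D) EMA of summed vectors assigned to each code
    x: (N, D) batch of embeddings to quantize
    """
    K = codebook.shape[0]

    # assign each input vector to its nearest code
    dists = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (N, K)
    codes = dists.argmin(axis=1)
    onehot = np.eye(K)[codes]  # (N, K)

    # exponential moving averages of usage counts and assigned-vector sums
    cluster_size = decay * cluster_size + (1 - decay) * onehot.sum(0)
    embed_avg = decay * embed_avg + (1 - decay) * (onehot.T @ x)

    # Laplace-smoothed normalization, then move each code to its running mean
    n = cluster_size.sum()
    smoothed = (cluster_size + eps) / (n + K * eps) * n
    codebook = embed_avg / smoothed[:, None]
    return codebook, cluster_size, embed_avg, codes
```

Because the update is a pure moving-average statistic over the batch assignments, the trainer only needs forward passes, which is why no `loss.backward()` appears for the codebook parameters.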

yangdongchao commented 1 year ago

> Hey, thanks for your interest in this repo! The RVQ parameters are updated via ema. See this relevant thread

Thanks for your reply. I want to ask: how do we judge whether training has succeeded? When can we stop training?

zhvng commented 1 year ago

I stopped when the eval loss stopped going down; that seemed to work well enough.
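That stopping rule can be made concrete as patience-based early stopping: halt once the eval loss has failed to improve for some number of consecutive evaluations. A minimal sketch (the function name and `patience` default are illustrative, not from this repo):

```python
def should_stop(eval_losses, patience=3):
    """Return True once the last `patience` eval losses show no improvement
    over the best loss seen before them."""
    if len(eval_losses) <= patience:
        return False  # not enough history to judge yet
    best_so_far = min(eval_losses[:-patience])
    # stop if none of the recent evals beat the earlier best
    return min(eval_losses[-patience:]) >= best_so_far
```

In practice you would call this after each evaluation pass and keep a checkpoint of the best-loss step, since the final step is by construction past the optimum.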

yangdongchao commented 1 year ago

> I stopped when the eval loss stopped going down. seems to work good enough.

Thanks for your reply. Lastly, I want to ask: during training, do you only use audio to extract embeddings and quantize them with RVQ, with the text embeddings used only at inference?
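The asymmetry being asked about works because CLAP maps audio and text into one joint embedding space, so a codebook fit on audio embeddings can quantize text embeddings at inference. A toy illustration of that train/inference split (random vectors stand in for the real CLAP encoders, and a single nearest-neighbor codebook stands in for the full RVQ; nothing here is this repo's actual API):

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 16, 32  # embedding dim and codebook size (arbitrary toy values)

def quantize(codebook, emb):
    """Index of the nearest code vector (one RVQ stage, simplified)."""
    return int(((codebook - emb) ** 2).sum(axis=1).argmin())

# "training": the codebook is built from audio-side embeddings only
audio_embs = rng.normal(size=(256, D))
codebook = audio_embs[rng.choice(256, size=K, replace=False)]

# "inference": a text-side embedding lives in the same joint space,
# so the audio-fit codebook can quantize it directly
text_emb = rng.normal(size=(D,))
token = quantize(codebook, text_emb)
```

The key assumption is the shared space: if audio and text embeddings were not aligned, tokens produced from text at inference would be out of distribution for a codebook trained on audio.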