lucidrains / audiolm-pytorch

Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in PyTorch
MIT License
2.32k stars · 249 forks

Question: Any way to specify validation dataset for SemanticTransformer, CoarseTransformer and FineTransformer? #242

Closed · rgxb2807 closed this 8 months ago

rgxb2807 commented 8 months ago

Forgive me if I'm missing something super obvious. I'm using the full LibriSpeech Corpus, which is already split into training and validation sets, and I've combined all of the data into train and test data directories. Is there a way to control how the data split occurs, or to preprocess the dataset so that explicit training and validation sets are used? It appears that only a random split is possible with SemanticTransformer, CoarseTransformer and FineTransformer.
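For reference, the random split described above is just a fractional split of a single dataset. A minimal stdlib sketch of that behavior (the function name `split_dataset` and the `valid_frac` parameter here are illustrative, not the library's API):

```python
import random

def split_dataset(items, valid_frac=0.05, seed=42):
    """Hold out a random fraction of samples for validation.

    A seeded shuffle keeps the split reproducible across runs --
    this sketches the kind of random split a trainer performs
    when no separate validation set can be passed in.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n_valid = int(len(items) * valid_frac)
    return items[n_valid:], items[:n_valid]

# example: 100 hypothetical file paths, 5% held out for validation
files = [f"clip_{i}.flac" for i in range(100)]
train, valid = split_dataset(files, valid_frac=0.05)
print(len(train), len(valid))  # 95 5
```

The drawback, and the motivation for this issue, is that such a split ignores any curated train/validation partition the corpus already ships with.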

When training SoundStream, you can specify separate training and validation sets by first instantiating training and validation SoundDataset instances and then creating a dataloader for each via the get_dataloader function.

SemanticTransformer, CoarseTransformer and FineTransformer don't allow separate dataloaders to be passed.

For example in CoarseTransformer:

https://github.com/lucidrains/audiolm-pytorch/blob/1b4d80f93a2cc0c9dd4797959afa613aac9d029b/audiolm_pytorch/trainer.py#L933C1-L953C1

And here's the example from SoundStream https://github.com/lucidrains/audiolm-pytorch/blob/1b4d80f93a2cc0c9dd4797959afa613aac9d029b/audiolm_pytorch/trainer.py#L237C1-L244C42

Happy to raise a PR that copies similar logic from SoundStreamTrainer into the other three trainer classes.

lucidrains commented 8 months ago

@rgxb2807 added it quickly for you this morning - I happen to be working on video trainer code

are you using the latest version with residual LFQ btw? i'm curious how well that is working!

rgxb2807 commented 8 months ago

You're incredible, thank you!

I just kicked off training the coarse transformer locally using Meta's EnCodec. So far it's working well.