lucidrains / audiolm-pytorch

Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in Pytorch
MIT License

Saves corrupt audio #161

Open sulkytejas opened 1 year ago

sulkytejas commented 1 year ago

After training on a dataset of 2-second audio clips, the model generates an int64 tensor. `torchaudio.save` does not support that dtype, so I have to cast it to int32. After saving that file, I get an empty file.

Can you guide me on the correct way to save it?

Also, could you let me know how many audio files I should use to train?
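For reference, a minimal sketch of one way to save a generated waveform, not a confirmed fix for this issue. `torchaudio.save` takes a 2D tensor of shape `(channels, frames)`, and floating-point samples are expected to lie in `[-1, 1]`. The helper names `prepare_waveform` and `save_waveform` are hypothetical, and so is the 16000 Hz default. The thread does not establish whether the int64 tensor is a waveform at all, so the sketch assumes an integer tensor is something else (for example token ids) and raises instead of casting it.

```python
import torch


def prepare_waveform(wav: torch.Tensor) -> torch.Tensor:
    """Return a float32 tensor of shape (channels, frames) for torchaudio.save."""
    if not torch.is_floating_point(wav):
        # Assumption: an integer tensor here is probably not audio samples,
        # so refuse it rather than silently casting it to another dtype.
        raise TypeError(f"expected a floating-point waveform, got {wav.dtype}")
    wav = wav.detach().cpu().to(torch.float32)
    if wav.dim() == 1:
        wav = wav.unsqueeze(0)  # (frames,) -> (1, frames)
    return wav.clamp(-1.0, 1.0)  # keep float samples inside [-1, 1]


def save_waveform(path: str, wav: torch.Tensor, sample_rate: int = 16000) -> None:
    # Imported here so prepare_waveform can be used without torchaudio installed.
    import torchaudio

    torchaudio.save(path, prepare_waveform(wav), sample_rate)
```

Usage would look like `save_waveform("out.wav", generated_wav)`, where `generated_wav` stands for whatever float tensor the model returns. If the helper raises `TypeError`, it is worth checking which tensor is being saved before reaching for a cast.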

LWprogramming commented 1 year ago

Did you mean for the sample rate to be 16000, not 1600? I'm not sure if that's behind the int64 thing though.
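To show why the sample rate matters when saving, here is a small arithmetic sketch. It assumes the clips are 2 seconds long at 16000 Hz, as the question and the reply suggest; `playback_seconds` is a hypothetical helper, not part of any library.

```python
def playback_seconds(num_frames: int, sample_rate: int) -> float:
    """Duration a player reports for num_frames samples at sample_rate Hz."""
    return num_frames / sample_rate


frames = 2 * 16000  # a 2-second clip sampled at 16 kHz (assumed)

print(playback_seconds(frames, 16000))  # 2.0
print(playback_seconds(frames, 1600))   # 20.0
```

Writing the same samples with a rate of 1600 would stretch the clip to ten times its length and lower its pitch, which can make valid audio sound broken.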

Re: "how many audio files should I use to train?", the answer is probably a lot :)

sulkytejas commented 1 year ago

@LWprogramming Thank you for the reply. I am pretty new to the field and still unaware of the unknowns; here is my Colab notebook. Can you please help me figure out what I did wrong? https://colab.research.google.com/drive/10uHyvlwbhrnA3puvznJ4rQo0UOIsRPSj?usp=sharing

Also, would you happen to know any open-source dataset I can use for training? I went through and extracted some myself but it was not enough.