Closed by syjunghwang 1 year ago
Hm, are you getting a particular error message for the coarse transformer? As I understand it, by the time we pass a codec to either the coarse or fine transformer, the codec is already pretrained, so the logic with torch.no_grad() is fine.
@LWprogramming One-dimensional waveforms (e.g. shape (96000,)) come out of data.py, but if I feed that input to the current encodec code, it doesn't run. So wav.unsqueeze(0) has to be inserted at the front of the encodec code.
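For context, a minimal sketch of the shape fix being described (the random tensor is a placeholder for a real waveform, not the actual audiolm-pytorch code):

```python
import torch

# data.py yields a 1-D waveform, e.g. shape (96000,)
wav = torch.randn(96000)

# EnCodec-style encoders expect a leading batch (and/or channel) dim,
# so a bare 1-D tensor trips their shape checks. Prepending a dim
# with unsqueeze(0) gives (1, 96000) before handing it to the codec:
if wav.dim() == 1:
    wav = wav.unsqueeze(0)
```

After this, wav.shape is (1, 96000), which is the kind of batched input the codec's forward pass expects.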
I added an assertion in my personal_hacks branch but was still able to train coarse correctly. Here is the demo script I am using (I may insert assertion errors as reminders to myself, but you can ignore those if you want to try using it to replicate), and here is the script to use specifically my personal_hacks branch.
Is it raising an error when you train using the demo script? If so, it's possible that there's some other processing between when the data comes from data.py and when it actually enters encodec that shapes it correctly (in which case I don't think we'd need the unsqueeze).
Like the demo code you showed me, I confirmed there is no error if I pass the data_max_seconds parameter as 320*30 to the CoarseTransformerTrainer.
However, if I instead pass the parameter data_max_length_seconds = 30 to the CoarseTransformerTrainer, the following error occurs:
AssertionError: Expected indices to have shape (batch, num_frames, num_coarse_quantizers + num_fine_quantizers), but got torch.Size([1, 2250, 8])
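As a sanity check on that reported shape, the 2250 frames are consistent with EnCodec's 24 kHz model, which emits 75 codec frames per second (the frame rate is an assumption about which EnCodec model is in use):

```python
# Assumed: EnCodec 24 kHz model, which produces 75 codec frames per second
frames_per_second = 75
seconds = 30          # data_max_length_seconds from the failing run
num_frames = frames_per_second * seconds
print(num_frames)     # 2250, matching the num_frames axis of torch.Size([1, 2250, 8])
```

So the frame count itself looks right; the assertion is presumably tripping on the last axis, i.e. the 8 quantizers emitted by EnCodec not matching the transformer's configured num_coarse_quantizers + num_fine_quantizers.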
The coarse transformer trains when using SoundStream, but not when using EnCodec.