GUUser91 closed this issue 12 months ago.
`max_len = 14` means you are only training with 14 * 300 / 24000 = 0.175 seconds of audio, which is not feasible at all. You will need at least `max_len = 80`, which corresponds to one second of audio, for training to work. Try increasing `max_len` to at least 80 and decreasing the batch size instead; as long as your batch size is greater than 1, you should be fine.
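The conversion above can be sketched as a pair of small helpers. This assumes, from the numbers in the reply, that `max_len` counts mel frames, that each frame advances 300 samples (hop length), and that audio is sampled at 24 kHz. `frames_to_seconds` and `seconds_to_frames` are hypothetical helper names for illustration, not functions from the training code.

```python
import math

# Assumed from the arithmetic in the reply: hop length of 300 samples per
# mel frame, audio sampled at 24 kHz. Change these if your config differs.
HOP_LENGTH = 300
SAMPLE_RATE = 24_000


def frames_to_seconds(max_len: int,
                      hop_length: int = HOP_LENGTH,
                      sample_rate: int = SAMPLE_RATE) -> float:
    """Duration of audio (in seconds) covered by `max_len` mel frames."""
    return max_len * hop_length / sample_rate


def seconds_to_frames(seconds: float,
                      hop_length: int = HOP_LENGTH,
                      sample_rate: int = SAMPLE_RATE) -> int:
    """Smallest number of mel frames that covers at least `seconds` of audio."""
    return math.ceil(seconds * sample_rate / hop_length)


if __name__ == "__main__":
    for n in (14, 80):
        print(f"max_len = {n:>3} -> {frames_to_seconds(n):.3f} s")
    print("frames needed for 1 s:", seconds_to_frames(1.0))
```

With these assumed values, `max_len = 14` comes out to 0.175 s and one second of audio needs 80 frames, matching the figures quoted above.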
I get this error message when I try to fine-tune. I set `batch_size` to 12 and `max_len` to 14. I'm using torch-2.1.1, torchaudio-2.1.1, and torchvision-0.16.1, if that matters.