@lzl1456 the input needs to have a length that is divisible by the cumulative product of the strides
i curtail it otherwise before computing the reconstruction loss https://github.com/lucidrains/audiolm-pytorch/blob/main/audiolm_pytorch/soundstream.py#L587
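For anyone landing here, a minimal sketch of that curtailment, assuming the default SoundStream strides of (2, 4, 5, 8), whose cumulative product is 320; adjust if you configured the strides differently:

```python
import math
import torch

# assuming the default audiolm-pytorch SoundStream strides of (2, 4, 5, 8);
# their cumulative product is the total downsampling factor of the encoder
strides = (2, 4, 5, 8)
downsample_factor = math.prod(strides)  # 320

audio = torch.randn(1, 225360)

# trim the input to the nearest multiple of the downsample factor,
# mirroring the curtailment done before the reconstruction loss
valid_len = (audio.shape[-1] // downsample_factor) * downsample_factor
audio = audio[..., :valid_len]  # (1, 225280)
```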
thanks, about soundstream: i trained on Libri-Light for 50k steps with data_max_length_seconds = 10

```python
soundstream = SoundStream(
    codebook_size = 1024,
    target_sample_hz = 16000,
    rq_num_quantizers = 12,
    attn_window_size = 128,  # local attention receptive field at bottleneck
    attn_depth = 2           # 2 local attention transformer blocks - the soundstream folks were not experts with attention, so i took the liberty to add some. encodec went with lstms, but attention should be better
).cuda()
```
Do you have any advice for a better training setup? At present I train the model, then compress/encode the audio and reconstruct it directly. Compared with the original audio, the reconstruction error is fairly large, and background noise (it sounds like machinery) is mixed in.
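For reference, a minimal round-trip sketch along the lines of the repo's README; the `return_recons_only` flag exists in recent versions of audiolm-pytorch, but treat the exact call as an assumption for your version:

```python
import torch
import torch.nn.functional as F

# encode and reconstruct with a trained soundstream, then measure the error;
# `soundstream` is the model instantiated above
audio = torch.randn(1, 225280).cuda()  # already a multiple of 320, so nothing is trimmed

soundstream.eval()
with torch.no_grad():
    recons = soundstream(audio, return_recons_only = True)  # (1, 1, 225280)

# drop the channel dimension before comparing against the input
mse = F.mse_loss(recons.squeeze(1), audio)
print(mse.item())
```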
@lzl1456 feel free to chat with other practitioners in the discussion boards
```
input shape  = torch.Size([1, 225360])
output shape = torch.Size([1, 1, 225280])
```
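Those shapes line up with the curtailment above: assuming a total downsampling factor of 320, 225360 = 704 × 320 + 80, so the 80 trailing samples are trimmed and the reconstruction comes back at 704 × 320 = 225280 (with an added channel dimension).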