kkoutini / PaSST

Efficient Training of Audio Transformers with Patchout
Apache License 2.0
287 stars 48 forks

Changing tdim for pretrained model #10

Closed ranjith1604 closed 2 years ago

ranjith1604 commented 2 years ago

Thanks for sharing such great work! I want to use the pre-trained model, but changing input_tdim gives an error. My audio clips are relatively short, so I need a smaller input_tdim. How do I do that? The error occurs because the pretrained layer's shape does not match the current model's shape after changing input_tdim.

kkoutini commented 2 years ago

Hi, can you give more details about the error? In general, if the audio is shorter than expected, you can leave input_tdim at its default and it should be handled automatically.

ranjith1604 commented 2 years ago

The error is as follows:

RuntimeError: Error(s) in loading state_dict for PaSST: size mismatch for time_new_pos_embed: copying a param with shape torch.Size([1, 768, 1, 99]) from checkpoint, the shape in current model is torch.Size([1, 768, 1, 7]).

This corresponds to the following call: model = passt.get_model(arch="passt_s_swa_p16_128_ap476", pretrained=True, n_classes=2, in_channels=1, fstride=10, tstride=10, input_fdim=128, input_tdim=78, u_patchout=0, s_patchout_t=40, s_patchout_f=4)
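The shapes in the error explain the mismatch: the checkpoint's time positional embedding covers 99 time patches, while a model built with input_tdim=78 only allocates 7. A minimal sketch of one possible workaround (an assumption on my part, not something from the PaSST codebase) is to crop the checkpoint tensor along the time axis before loading the state_dict:

```python
import numpy as np

# Shapes taken from the RuntimeError above.
# Stand-in for the checkpoint's time_new_pos_embed parameter:
pretrained_tpos = np.zeros((1, 768, 1, 99))

# Time patches the smaller model expects (from the error message):
target_t = 7

# Crop the embedding to the first target_t time patches so the
# shapes match before state_dict loading.
cropped = pretrained_tpos[:, :, :, :target_t]
print(cropped.shape)  # (1, 768, 1, 7)
```

Whether cropping (rather than interpolating) the positional embedding preserves accuracy is untested here; as the maintainer notes below, keeping the default input_tdim avoids the problem entirely.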

kkoutini commented 2 years ago

Hi, you can keep input_tdim at its default value; the model should handle shorter audio clips.
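In other words, the model can be built with the default input_tdim and simply fed the shorter clip. If explicit padding is ever needed, a hedged sketch (the 998-frame default and the (batch, channel, mels, frames) layout are assumptions inferred from the 99-patch checkpoint shape with tstride=10, not confirmed by this thread) would be:

```python
import numpy as np

# A 78-frame mel spectrogram, as in the reported configuration
# (batch, channel, n_mels, frames):
mel = np.zeros((1, 1, 128, 78))

# Assumed default time dimension of the pretrained PaSST model
# (99 time patches * tstride 10 ~ 998 frames):
full_tdim = 998

# Zero-pad along the time axis up to the expected length.
padded = np.pad(mel, ((0, 0), (0, 0), (0, 0), (0, full_tdim - mel.shape[-1])))
print(padded.shape)  # (1, 1, 128, 998)
```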