keyu-tian / SparK

[ICLR'23 Spotlight🔥] The first successful BERT/MAE-style pretraining on any convolutional network; PyTorch implementation of "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling"
https://arxiv.org/abs/2301.03580
MIT License

What should I do when the input shape is a rectangle? #32

Closed: syjabc closed this issue 1 year ago

syjabc commented 1 year ago

Thank you for your excellent work. I want to apply it to an audio classification task, and I found it is impossible to keep the input shape square. What should I do when the input tensor is a rectangle, like (128, 600)?

keyu-tian commented 1 year ago

Well, although the current code does not support rectangular input, I think you only need to change a few things to make it work.

First, refer to /pretrain/README.md for how to plug in your own audio dataset (with the corresponding data preprocessing code) and your own CNN model.

Then make fmap_h and fmap_w different (in this line) so that they represent a rectangular shape, and update everything that depends on them, for example by making input_size in /pretrain/encoder.py and input_size in /pretrain/utils/imagenet.py a tuple.

After that, I think things should work.
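A minimal sketch of the fmap_h / fmap_w change described above, not the repository's actual code. The helper names (`to_hw`, `pad_to_multiple`, `fmap_shape`) are hypothetical, and the downsample ratio of 32 is an assumption (typical for ResNet-style backbones); use whatever ratio your own CNN has. Note that 600 is not a multiple of 32, so this sketch assumes you pad (or crop) the spectrogram first.

```python
from typing import Tuple, Union

SizeLike = Union[int, Tuple[int, int]]


def to_hw(input_size: SizeLike) -> Tuple[int, int]:
    """Accept an int (square input) or an (h, w) tuple (rectangular input)."""
    if isinstance(input_size, int):
        return input_size, input_size
    h, w = input_size
    return int(h), int(w)


def pad_to_multiple(length: int, multiple: int) -> int:
    """Round `length` up to the nearest multiple of `multiple`."""
    return -(-length // multiple) * multiple


def fmap_shape(input_size: SizeLike, downsample_ratio: int) -> Tuple[int, int]:
    """Compute separate feature-map height and width for the mask grid."""
    h, w = to_hw(input_size)
    if h % downsample_ratio or w % downsample_ratio:
        raise ValueError(
            f"input size {(h, w)} must be divisible by {downsample_ratio}; "
            "pad or crop the input first"
        )
    return h // downsample_ratio, w // downsample_ratio


DOWNSAMPLE_RATIO = 32  # assumption: overall stride of the CNN encoder

raw_h, raw_w = 128, 600  # the shape from the question
padded_size = (
    pad_to_multiple(raw_h, DOWNSAMPLE_RATIO),
    pad_to_multiple(raw_w, DOWNSAMPLE_RATIO),
)
fmap_h, fmap_w = fmap_shape(padded_size, DOWNSAMPLE_RATIO)
print(padded_size, fmap_h, fmap_w)
```

Any code that previously assumed a single fmap size (mask generation, positional or mask-token tensors, reshapes) would then need to use `fmap_h` and `fmap_w` separately.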

syjabc commented 1 year ago

Thanks for your thorough and patient responses.