question about shared filters

thjashin / multires-conv

Sequence Modeling with Multiresolution Convolutional Memory (ICML 2023)

MIT License


question about shared filters #1

Closed: kashif closed this issue 1 year ago

kashif commented 1 year ago

Sharing filters across timescales doesn't necessarily sound like an advantage. Do you have any intuition for why it is better? Do you have an ablation on this? Thanks for any insights!

thjashin commented 1 year ago

Hi @kashif, I personally think it is an elegant design because it follows directly from the theory of wavelet-based multiresolution analysis, and we get parameter efficiency for free. On the other hand, I don't think it should affect performance much: I ran some preliminary experiments earlier that relaxed (untied) the filters, and I will paste the results here later.
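To make the "shared filters" idea concrete, here is a minimal NumPy sketch of a causal, à trous (undecimated) wavelet-style decomposition in which one low-pass/high-pass filter pair (h0, h1) is reused at every level, with the dilation doubling from one level to the next. This is an illustration under stated assumptions, not the repository's MultiresLayer: it assumes a single channel, a filter length K and J levels, and the names `dilated_causal_conv`, `multires_tied` and `count_filter_params` are hypothetical. It also leaves out how the real layer turns these coefficients into its output.

```python
import numpy as np

def dilated_causal_conv(x, h, dilation):
    """Causal 1-D convolution: y[t] = sum_i h[i] * x[t - i * dilation], zeros before t = 0."""
    x = np.asarray(x, dtype=float)
    pad = (len(h) - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])  # left-pad so no future samples are used
    y = np.zeros(len(x))
    for i, tap in enumerate(h):
        start = pad - i * dilation  # tap i looks back i * dilation steps
        y += tap * xp[start:start + len(x)]
    return y

def multires_tied(x, h0, h1, levels):
    """Hypothetical tied decomposition: the SAME (h0, h1) pair is applied at every level,
    with dilation 2**j at level j. Returns per-level details and the final approximation."""
    approx = np.asarray(x, dtype=float)
    details = []
    for j in range(levels):
        d = 2 ** j
        details.append(dilated_causal_conv(approx, h1, d))  # high-pass: detail at scale j
        approx = dilated_causal_conv(approx, h0, d)         # low-pass: input to next level
    return details, approx

def count_filter_params(kernel_size, levels, tied):
    """Filter parameters per channel in this sketch: one (h0, h1) pair if tied, one pair per level if untied."""
    return 2 * kernel_size if tied else 2 * kernel_size * levels
```

Because the same two filters serve every level, the filter parameter count in this sketch stays at 2K per channel however many levels are stacked, which is the "parameter efficiency for free" mentioned above; giving each level its own pair multiplies it by J.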

thjashin commented 1 year ago

@kashif FYI, I tested the untied version of the MultiresLayer on the sequential CIFAR and long ListOps experiments. The results are below. Every other setting is kept the same as in the paper; the only change is using different filters for different timescales.

| Task | Untied filters | Tied filters (same as in the paper) |
|---|---|---|
| sCIFAR | 92.16% | 93.15% |
| Long ListOps | 61.85% | 62.75% |
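The untied ablation changes only one thing: each level gets its own filter pair. A minimal sketch of that variant, assuming the same hypothetical single-channel setup with dilation 2**j at level j (the names `causal_dilated` and `multires_untied` are illustrative, not taken from the repository), could look like this:

```python
import numpy as np

def causal_dilated(x, h, dilation):
    """Causal convolution y[t] = sum_i h[i] * x[t - i * dilation], via a zero-dilated kernel."""
    h_dilated = np.zeros((len(h) - 1) * dilation + 1)
    h_dilated[::dilation] = h  # place the K taps dilation steps apart, zeros in between
    return np.convolve(x, h_dilated)[: len(x)]  # keep the causal part of the full convolution

def multires_untied(x, h0_per_level, h1_per_level):
    """Hypothetical untied decomposition: level j uses its own (h0_j, h1_j) pair."""
    approx = np.asarray(x, dtype=float)
    details = []
    for j, (h0, h1) in enumerate(zip(h0_per_level, h1_per_level)):
        details.append(causal_dilated(approx, h1, 2 ** j))  # detail coefficients at level j
        approx = causal_dilated(approx, h0, 2 ** j)         # approximation passed to level j + 1
    return details, approx
```

Passing the same (h0, h1) pair for every level reproduces the tied decomposition, so in this sketch the tied model is a special case of the untied one with J times fewer filter parameters. In the runs reported in the table, the extra freedom did not help: tied filters scored 0.99 points higher on sCIFAR and 0.90 points higher on long ListOps.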