thjashin / multires-conv

Sequence Modeling with Multiresolution Convolutional Memory (ICML 2023)
MIT License

Wavenet #2

Open pranavmalikk opened 1 year ago

pranavmalikk commented 1 year ago

The paper states: "Our model resembles WaveNet (Oord et al., 2016a) in the use of tree-structured dilated convolutions. However, our principle-guided design has distinct skip-connection structures and filter sharing patterns, resulting in significantly better parameter efficiency and performance... Additionally, the link we establish between wavelets and tree-structured dilated causal convolutions offers the first principled justification for the effectiveness of WaveNet in modeling raw audio waveforms, an exemplary case of lengthy sequences with multiscale structure."

Do you have any ablations showing the performance difference against WaveNet on specific tasks or tests? Do you also have any audio samples? Overall, a very interesting paper!

thjashin commented 1 year ago

Hi @pranavmalikk,

We did not include WaveNet results in the paper because we found it very difficult to design a fair setup. There are many choices we would need to make before setting up such a comparison:

  1. First of all, WaveNet has no official open-source implementation, and many details are unclear from the paper. Existing re-implementations all differ to some extent (in initialization, bias, and latent dimension choices). Which one should we use?
  2. WaveNet was originally developed for generation. Should we add a mean-pooling layer to it and then compare on the classification tasks in our paper?
  3. WaveNet does not use mixing layers or normalization. Should we keep those components and only replace MultiresLayer with a WaveNet block? Or do we want to compare the whole MultiresNet against a full WaveNet? In either case, how should we choose the hyper-parameters of WaveNet?

If you can be specific about these questions, I am happy to run an ablation and post the results here.
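
To make question 2 concrete, here is a rough sketch of what "a WaveNet-style stack followed by mean-pooling" could look like. It is a minimal, single-channel, pure-Python illustration written for this thread, not code from this repository or from any WaveNet re-implementation. The function names (`causal_dilated_conv`, `wavenet_block`, `wavenet_features`, `mean_pool`, `receptive_field`) are hypothetical, and the 1x1 convolutions, multiple channels, and output head of the original architecture are omitted for brevity.

```python
import math


def causal_dilated_conv(x, w, dilation):
    """Causal 1-D convolution: y[t] = sum_k w[k] * x[t - k * dilation].

    Positions before the start of the sequence are treated as zeros,
    so y[t] never depends on x[t + 1], x[t + 2], ...
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, wk in enumerate(w):
            idx = t - k * dilation
            if idx >= 0:
                acc += wk * x[idx]
        y.append(acc)
    return y


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def wavenet_block(x, w_filter, w_gate, dilation):
    """Gated activation unit with a residual and a skip output (simplified)."""
    f = causal_dilated_conv(x, w_filter, dilation)
    g = causal_dilated_conv(x, w_gate, dilation)
    z = [math.tanh(a) * sigmoid(b) for a, b in zip(f, g)]
    residual = [xi + zi for xi, zi in zip(x, z)]
    return residual, z  # (input to the next block, skip connection)


def wavenet_features(x, params):
    """Stack blocks with dilations 1, 2, 4, ... and sum their skip outputs."""
    skip_sum = [0.0] * len(x)
    h = x
    for i, (w_filter, w_gate) in enumerate(params):
        h, skip = wavenet_block(h, w_filter, w_gate, dilation=2 ** i)
        skip_sum = [s + k for s, k in zip(skip_sum, skip)]
    return skip_sum


def mean_pool(seq):
    """Average over time, giving one feature for a classification head."""
    return sum(seq) / len(seq)


def receptive_field(num_layers, kernel_size=2):
    """Receptive field of a stack with dilations 1, 2, ..., 2^(num_layers - 1)."""
    return 1 + (kernel_size - 1) * (2 ** num_layers - 1)


if __name__ == "__main__":
    # Hypothetical fixed weights: three blocks, kernel size 2.
    params = [([0.5, -0.3], [0.2, 0.1]) for _ in range(3)]
    x = [math.sin(0.3 * t) for t in range(16)]
    feature = mean_pool(wavenet_features(x, params))
    print("pooled feature:", feature)
    print("receptive field:", receptive_field(len(params)))
```

Even in this toy version, the open choices from the list above are visible: whether the pooled feature comes from the summed skip outputs or from the last residual stream, and how many blocks and which kernel size to use, are all decisions that would affect a comparison.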