jishengpeng / WavTokenizer

SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
MIT License
801 stars 44 forks

some questions about model #49

Closed VJJJJJJ1 closed 1 week ago

VJJJJJJ1 commented 2 weeks ago

Hi, thank you for your great work. I have a few questions:

  1. As you mentioned in the paper, "adding the attention module before the ConvNeXt module appears to be the optimal solution." However, in decoder/models.py the AttnBlock is contained in pos_net, which comes after the ConvNeXt blocks. The order seems to be the opposite of what the paper describes.
  2. I want to do streaming inference with WavTokenizer, so I replaced all the convolution layers in SEANetEncoder, SEANetDecoder, ConvNeXtBlock and pos_net (ResnetBlock + AttnBlock) with causal convolution layers (class SConv1d with causal=True). Unfortunately, the generator loss keeps increasing (see the attached screenshot). Is there anything wrong with the modified model? A sketch of the causal convolution I have in mind follows this list.
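For reference, this is the kind of left-padded causal convolution I have in mind (a minimal sketch; the class and argument names here are illustrative, not the repository's actual SConv1d):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    """Minimal causal 1D convolution: pad only on the left so each output
    frame depends on the current and past input frames only."""

    def __init__(self, in_ch, out_ch, kernel_size, dilation=1):
        super().__init__()
        self.left_pad = dilation * (kernel_size - 1)
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)

    def forward(self, x):                      # x: (B, C, T)
        x = F.pad(x, (self.left_pad, 0))       # pad the past side only
        return self.conv(x)                    # output length stays T

x = torch.randn(1, 64, 100)
y = CausalConv1d(64, 64, kernel_size=7)(x)     # y.shape == (1, 64, 100)
```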

Thank you for your reply! (Screenshot of the increasing generator loss attached.)

jishengpeng commented 2 weeks ago

  1. The code is consistent with the paper, with the attention module placed before the ConvNeXt blocks. Link
  2. We have also experimented with WavTokenizer-Streaming and found the performance to be satisfactory, so the issue you are encountering is most likely a bug or some other problem in your modifications. When making this change, only the encoder's parameters need to be adjusted; the decoder, however, requires detailed changes to its attention and convolution modules (see the rough sketch below).
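As a rough illustration only (this is not the code in this repository, and the names and layer choices are hypothetical), a causal variant of the decoder's attention block could mask out future frames like this; a real streaming implementation may additionally need chunked or cached attention:

```python
import torch
import torch.nn as nn

class CausalAttnBlock(nn.Module):
    """Hypothetical causal self-attention over (B, C, T) features: a
    lower-triangular mask keeps each frame from attending to the future."""

    def __init__(self, channels):
        super().__init__()
        self.norm = nn.GroupNorm(1, channels)
        self.q = nn.Conv1d(channels, channels, 1)
        self.k = nn.Conv1d(channels, channels, 1)
        self.v = nn.Conv1d(channels, channels, 1)
        self.proj = nn.Conv1d(channels, channels, 1)

    def forward(self, x):                                   # x: (B, C, T)
        h = self.norm(x)
        q, k, v = self.q(h), self.k(h), self.v(h)
        scores = torch.einsum("bct,bcs->bts", q, k) / (q.shape[1] ** 0.5)
        future = torch.triu(torch.ones_like(scores, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(future, float("-inf"))  # block future frames
        attn = scores.softmax(dim=-1)
        out = torch.einsum("bts,bcs->bct", attn, v)
        return x + self.proj(out)
```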