Hi, I am trying to implement a streaming WavTokenizer. I set `causal = True` in the encoder without any other modification, and replaced every `nn.Conv1d` in the decoder with `SConv1d`. For example, in `WavTokenizer/decoder/modules.py`, I changed `self.dwconv = nn.Conv1d(dim, dim, kernel_size=7, padding=3, groups=dim)` to `self.dwconv = SConv1d(dim, dim, kernel_size=7, groups=dim, causal=True)`. In the AttenBlock, after multiplying q and k, I add a mask matrix as follows:
Is my modification correct? Unfortunately, in my experiments the generated audio is distorted at the end.
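For reference, a common way to mask attention for streaming is to set the scores of future key frames to negative infinity before the softmax. The sketch below is only an illustration of that idea, not the repository's AttenBlock code: it assumes q, k and v are shaped (batch, channels, frames) and that the scores are laid out as (batch, query frame, key frame). The function names `causal_attention_weights` and `causal_attention` are hypothetical.

```python
import torch


def causal_attention_weights(q: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Return attention weights of shape (batch, frames, frames).

    Assumes q and k are shaped (batch, channels, frames).
    """
    channels = q.shape[1]
    # scores[b, i, j]: similarity between query frame i and key frame j
    scores = torch.bmm(q.transpose(1, 2), k) * channels ** -0.5
    frames = scores.shape[-1]
    # True strictly above the diagonal, i.e. where key frame j is later than query frame i
    future = torch.triu(
        torch.ones(frames, frames, dtype=torch.bool, device=scores.device),
        diagonal=1,
    )
    # exp(-inf) == 0, so masked positions get zero weight after the softmax
    scores = scores.masked_fill(future, float("-inf"))
    return torch.softmax(scores, dim=-1)


def causal_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Apply the causal weights to v; output is shaped (batch, channels, frames)."""
    weights = causal_attention_weights(q, k)
    # out[b, c, i] = sum over j of v[b, c, j] * weights[b, i, j]
    return torch.bmm(v, weights.transpose(1, 2))
```

One cheap sanity check for any causal block is to change only the last input frame and confirm that all earlier output frames stay the same. If they change, some layer is still looking at future frames.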
Thank you for your reply!