Closed shamoons closed 3 years ago
Hi, I can't say for sure what is causing this issue, but you could take a look at the attention maps and the latent vector of the Transformer to see if anything looks suspicious.
This may be unrelated, but I don't understand why you are defining an audio encoder and decoder in addition to the Transformer, which is itself an encoder-decoder architecture. Why not simply use the Transformer as your AudioRecontructor?
Thanks for the input. I think I saw something in the docs about the attention map. The purpose of the encoder/decoder linear layers is to learn a vectorized representation of a raw 1-D audio signal, and then to map the vector output back to raw audio. I suppose I could also feed the unfolded raw audio straight into the Transformer?
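The setup described above could be sketched roughly like this (hypothetical code, not the repo's actual model — the class body, frame length, and model sizes here are made up for illustration):

```python
import torch
import torch.nn as nn

class AudioReconstructor(nn.Module):
    """Hypothetical sketch: linear layers learn a latent representation
    of each raw-audio frame before and after a standard nn.Transformer."""
    def __init__(self, frame_len=256, d_model=64):
        super().__init__()
        self.encode = nn.Linear(frame_len, d_model)   # raw frame -> latent
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=4,
            num_encoder_layers=2, num_decoder_layers=2,
            batch_first=True)
        self.decode = nn.Linear(d_model, frame_len)   # latent -> raw frame

    def forward(self, src_frames, tgt_frames):
        out = self.transformer(self.encode(src_frames),
                               self.encode(tgt_frames))
        return self.decode(out)

# unfold a 1-second raw signal into overlapping 256-sample frames
signal = torch.randn(1, 16000)
frames = signal.unfold(-1, 256, 128)  # (batch, n_frames, 256)
recon = AudioReconstructor()(frames, frames)
print(recon.shape)  # same shape as frames
```

Whether the extra linear layers help is exactly the open question here; this only shows where they would sit relative to the Transformer.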
That's what the attention map of my first layer looks like. It doesn't seem to be learning much of anything.
This is, however, without the linear layers before and after. I'll try adding them back and see if anything changes.
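In case it helps anyone else trying to reproduce this: a minimal way to get an attention map out of PyTorch (illustrative only — the sizes below are made up) is to query an `nn.MultiheadAttention` module directly, which returns the weights alongside the output:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Querying MultiheadAttention directly returns the attention weights
# (averaged over heads by default), which can be plotted as a
# (target position, source position) attention map.
attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
x = torch.randn(1, 10, 64)      # (batch, timesteps, features)
out, weights = attn(x, x, x)    # need_weights defaults to True

print(weights.shape)            # (batch, tgt_len, src_len)
# each row of the map is a softmax distribution over source positions
print(weights.sum(-1))
```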
Hi, I think I ran into the same problem. When I removed the residual block after every sublayer, the output became the same for all timesteps. I looked through your code and found a residual block there as well, so maybe you can try removing it to see if that reproduces the issue. In any case, I still don't understand why the output is the same for all steps without the residual block.
I'm not sure I follow - we don't have options to set residual blocks, do we?
Residual blocks are currently hard-coded, but they should be easy to remove, although I'm not sure why you would want to. Your attention map has a weird look to it, but at least not all values are equal.
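For what it's worth, the collapse without residuals can be reproduced with plain dot-product self-attention: each attention step mixes the timesteps with strictly positive softmax weights (a convex combination), so stacking layers without a skip connection keeps pulling the timesteps toward each other, while the residual path preserves their differences. A toy illustration (not this repo's code):

```python
import torch

torch.manual_seed(0)
x = torch.randn(10, 16)  # (timesteps, features)

def self_attention(h):
    # plain scaled dot-product self-attention with identity projections
    scores = h @ h.t() / h.shape[1] ** 0.5
    return torch.softmax(scores, dim=-1) @ h

no_res, with_res = x.clone(), x.clone()
for _ in range(6):
    no_res = self_attention(no_res)                  # no skip connection
    with_res = with_res + self_attention(with_res)   # residual connection

# mean per-feature std across timesteps: how different the steps are
spread = lambda h: h.std(dim=0).mean().item()
print(spread(x), spread(no_res), spread(with_res))
```

Without the residual path the spread across timesteps shrinks every layer, which matches the "same output at every step" symptom reported above.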
My model is:
If I print the values after `self.transformer(out)`, I get the same values for each time step. Any ideas why that might be?
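One quick sanity check (a hypothetical helper, not from the repo) is to verify programmatically that the output really is constant across timesteps; besides the residual-block issue above, a missing positional encoding is another commonly suggested culprit for this symptom:

```python
import torch

# Hypothetical diagnostic: check whether a (batch, timesteps, features)
# tensor holds the same values at every time step, within tolerance.
def constant_across_timesteps(out, atol=1e-5):
    return torch.allclose(out, out[:, :1].expand_as(out), atol=atol)

varied = torch.randn(2, 5, 8)
collapsed = varied[:, :1].repeat(1, 5, 1)  # step 0 copied to all steps
print(constant_across_timesteps(varied))     # False
print(constant_across_timesteps(collapsed))  # True
```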