maxjcohen / transformer

Implementation of Transformer model (originally from Attention is All You Need) applied to Time Series.
https://timeseriestransformer.readthedocs.io/en/latest/
GNU General Public License v3.0

x + residual size mismatch #15

Closed: gccollect closed this issue 4 years ago

gccollect commented 4 years ago

Hi, thanks for making this implementation available. I am following the tutorial, but I am encountering a size mismatch error when I call net(inputs) on my time series data. My input has shape 1 x K x d_input, but the output of the self-attention layer appears to be truncated to K-5 and thus cannot be added to the residual.

     86         x = self._selfAttention(query=x, key=x, value=x)
     87         x = self._dopout(x)
---> 88         x = self._layerNorm1(x + residual)
     89 
     90         # Feed forward

RuntimeError: The size of tensor a (3864) must match the size of tensor b (3869) at non-singleton dimension 1

I am using the same parameters as in training.ipynb:

    d_model = 100         # Latent dim
    q = 8                 # Query size
    v = 8                 # Value size
    h = 4                 # Number of heads
    N = 4                 # Number of encoder and decoder layers to stack
    attention_size = 24   # Attention window size
    dropout = 0.2         # Dropout rate
    pe = None             # Positional encoding
    chunk_mode = "window"
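
For reference, here is roughly how I am building the model from these parameters, following training.ipynb (the d_input and d_output values below are placeholders for my dataset; the constructor signature is as in the notebook, as far as I can tell):

    import torch
    from tst import Transformer

    d_input = 38    # placeholder: number of input features
    d_output = 8    # placeholder: number of output features

    net = Transformer(d_input, d_model, d_output, q, v, h, N,
                      attention_size=attention_size, dropout=dropout,
                      chunk_mode=chunk_mode, pe=pe)

    inputs = torch.randn(1, 3869, d_input)  # K = 3869, as in the traceback
    outputs = net(inputs)                   # raises the size mismatch above
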
maxjcohen commented 4 years ago

Hi, this may be due to the "window" chunk mode, which was written to split the input along the time dimension into week-long intervals. I have not tested its behavior when the time length is not a multiple of 168 (one week in hours).
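
The numbers in the traceback are consistent with this: flooring K = 3869 to whole 168-step windows keeps 23 windows and drops a 5-step remainder, which matches the K-5 truncation reported above. A quick sanity check (illustrative only, this is not the exact chunking code):

    K = 3869                         # input time length, from the traceback
    window = 168                     # one week in hours
    n_windows = K // window          # 23 full windows
    print(n_windows * window)        # 3864 -> size of tensor a in the error
    print(K - n_windows * window)    # 5    -> the K-5 truncation observed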

You could try switching to chunk_mode = "classic" to see if the problem persists. If it doesn't, you may have to rewrite the Window MHA to be more flexible.
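
If you need to keep chunk_mode = "window" for now, one workaround (a sketch, not something the library handles for you) is to crop the input to a whole number of windows before the forward pass:

    window = 168                                     # window chunk length
    K = inputs.shape[1]
    inputs = inputs[:, :(K // window) * window, :]   # drop the trailing remainder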

shamoons commented 3 years ago

> Hi, this may be due to the "window" chunk mode, which was written to split the input along the time dimension into week-long intervals. I have not tested its behavior when the time length is not a multiple of 168 (one week in hours).
>
> You could try switching to chunk_mode = "classic" to see if the problem persists. If it doesn't, you may have to rewrite the Window MHA to be more flexible.

There is no chunk_mode="classic"; the only options are 'chunk', 'window', or None.

maxjcohen commented 3 years ago

Sorry, I meant chunk_mode=None; I got mixed up in my explanation.
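
Concretely, the instantiation sketched above becomes (assuming the same constructor as in training.ipynb):

    net = Transformer(d_input, d_model, d_output, q, v, h, N,
                      attention_size=attention_size, dropout=dropout,
                      chunk_mode=None, pe=pe)  # chunk_mode=None, not "classic"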