philipperemy / keras-tcn

Keras Temporal Convolutional Network.
MIT License

Masking time steps in order to use TCN for variable length sequences #240

Open fsbashiri opened 1 year ago

fsbashiri commented 1 year ago

Describe the bug In my project I am using TCN for sequence-to-sequence analysis of time-series data with variable lengths. I have defined a subclass of the Sequence class that pads each batch to its maximum sequence length (similar to what is suggested here). As for the model, I use a Masking layer to compute a mask and pass it to the TCN (as suggested in issue #234). Supposedly, layers that support masking automatically propagate the mask to the next layer. In its simplest form, my model is a Masking layer, followed by a TCN, followed by a Dense layer with 1 unit.

Here are two issues that I've got:

  1. When I try to access the propagated mask from the output of the TCN layer, I get an error saying the object has no attribute _keras_mask.
  2. Apparently, it matters whether a sequence is padded at the beginning or at the end (i.e., whether the padding argument of pad_sequences is set to 'pre' or 'post'). If it is padded at the beginning, the output of the TCN at the padded time steps is zero. But you cannot expect zero output when the padding is at the end of the sequence.
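The asymmetry in point 2 follows from causal padding alone. A minimal pure-Python sketch (assuming the masked steps are zeroed, e.g. by the Masking layer, and ignoring biases, dilations, and multiple filters) shows it:

```python
# Minimal causal 1D convolution: output at step t only sees inputs <= t.
def causal_conv1d(seq, kernel):
    k = len(kernel)
    padded = [0.0] * (k - 1) + seq  # left-pad so the conv stays causal
    return [sum(kernel[j] * padded[t + j] for j in range(k))
            for t in range(len(seq))]

signal = [0.5, -1.0, 2.0]
kernel = [0.2, 0.3, 0.5]

pre  = causal_conv1d([0.0, 0.0] + signal, kernel)   # zeros padded at the start
post = causal_conv1d(signal + [0.0, 0.0], kernel)   # zeros padded at the end

print(pre)   # first two outputs are exactly 0: those steps see only zeros
print(post)  # last two outputs are non-zero: padded steps still see real history
```

With pre-padding, the padded positions have no non-zero history, so a bias-free causal stack outputs zero there; with post-padding, the padded positions sit downstream of real data, so their outputs are generally non-zero.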

Paste a snippet Please see the following simple code:

import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Masking, TimeDistributed
from tcn import TCN

mask_value = 1.0
x = np.random.rand(1, 3, 8)  # 1 sample, 3 time steps, 8 features
x_pre = np.append(mask_value * np.ones((1, 2, 8)), x, axis=1)  # append (pre-padding) 2 time steps with mask_value
x_post = np.append(x, mask_value * np.ones((1, 2, 8)), axis=1)  # append (post-padding) 2 time steps with mask_value
print(f"x_pre: \n{x_pre}")

# Sequential modeling
model = Sequential()
model.add(Masking(mask_value=mask_value))
model.add(TCN(nb_filters=64,
              nb_stacks=1,
              kernel_size=3,
              dilations=[1, 2],
              padding='causal',
              return_sequences=True))
model.add(TimeDistributed(Dense(1)))

out_pre = model(x_pre)
out_post = model(x_post)
print(f"out_pre: \n{out_pre}")
print(f"out_post: \n{out_post}")
print(f"out_pre._keras_mask: \n{out_pre._keras_mask}")

The output of the code:

x_pre: 
[[[1.         1.         1.         1.         1.         1.
   1.         1.        ]
  [1.         1.         1.         1.         1.         1.
   1.         1.        ]
  [0.66743025 0.3803879  0.06403598 0.6146936  0.34356068 0.08322509
   0.97064031 0.67479811]
  [0.38705443 0.18246054 0.17536628 0.8973423  0.63538071 0.35077733
   0.33901726 0.35183449]
  [0.10048297 0.33713389 0.61988985 0.74523683 0.48507557 0.21914819
   0.86720421 0.66290713]]]
out_pre: 
[[[ 0.        ]
  [ 0.        ]
  [-0.28437746]
  [-0.29050288]
  [-0.84733   ]]]
out_post: 
[[[-0.28437746]
  [-0.29050288]
  [-0.84733   ]
  [-0.05684387]
  [-0.7985336 ]]]
Traceback (most recent call last):
  File "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pydev/pydevd.py", line 1491, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/fbashiri/Documents/Projects/MGP-AttTCN-master/src/loss_n_eval/Azi_test_loss.py", line 132, in <module>
    print(f"out_pre._keras_mask: \n{out_pre._keras_mask}")
AttributeError: 'tensorflow.python.framework.ops.EagerTensor' object has no attribute '_keras_mask'
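As a side note on the AttributeError: the boolean mask that Masking produces can be reconstructed by hand. The sketch below is pure Python, not the Keras implementation; it assumes Masking's documented rule that a timestep is dropped when all of its features equal mask_value:

```python
# Sketch of the boolean mask computed by keras.layers.Masking:
# a timestep is kept iff ANY feature differs from mask_value.
def compute_mask(batch, mask_value):
    # batch: list of sequences; each sequence is a list of feature vectors
    return [[any(f != mask_value for f in step) for step in seq]
            for seq in batch]

x_pre = [[[1.0] * 8, [1.0] * 8, [0.3] * 8, [0.7] * 8, [0.2] * 8]]
print(compute_mask(x_pre, mask_value=1.0))  # [[False, False, True, True, True]]
```

So even when the output tensor carries no _keras_mask attribute, the mask itself is cheap to recompute (in Keras, the Masking layer's compute_mask method returns it).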

Dependencies I am using: keras 2.4.3, keras-tcn 3.1.1, tensorflow-gpu 2.3.1

philipperemy commented 1 year ago

@fsbashiri thanks for reporting! I propose an explanation. I'm not 100% sure, so feel free to challenge it.

My hunch is that the TCN works a bit like an RNN, even though it has no internal state the way an LSTM does.

The last outputs depend not only on the end of the sequence but also on the beginning.

[image attached]

For your first point about _keras_mask, I guess we should not try to access it directly. But it's strange that it does not exist at all. Does it exist for other Keras layers that support masking? Maybe we need to set it somewhere in the layer, since it's not inherited from the base Layer object. I don't know.

cruyffturn commented 1 year ago

This problem is mentioned in issue #89, where the author states that Keras's Conv1D lacks support for the Masking layer.
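Until Conv1D-style layers propagate masks, one common workaround is to sidestep mask propagation entirely and exclude the padded timesteps from the loss yourself (in Keras this is typically done with sample_weight). A pure-Python sketch of the idea, with hypothetical names:

```python
# Hypothetical masked loss: ignore padded timesteps instead of relying
# on the convolutional layers to propagate a Keras mask.
def masked_mse(y_true, y_pred, mask):
    # mask[t] is True for real timesteps, False for padded ones
    errs = [(t - p) ** 2 for t, p, m in zip(y_true, y_pred, mask) if m]
    return sum(errs) / len(errs)

y_true = [1.0, 2.0, 0.0, 0.0]
y_pred = [1.5, 2.0, 9.0, 9.0]      # garbage predictions on padded steps
mask   = [True, True, False, False]
print(masked_mse(y_true, y_pred, mask))  # 0.125: padded steps are ignored
```

With this approach it no longer matters what the TCN outputs on padded steps, since those positions never contribute to the gradient.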