onnx / onnx

Open standard for machine learning interoperability
https://onnx.ai/
Apache License 2.0

Support masking for LSTM, RNN? #2248

Open jiafatom opened 5 years ago

jiafatom commented 5 years ago

In Keras or TensorFlow, LSTM and RNN layers support masking, which uses a mask value to skip timesteps. This is commonly used in state-of-the-art models such as keras-bert, so we may need to support masking for the ONNX LSTM and RNN ops.
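For reference, a minimal Keras sketch of the pattern in question (my own illustration, assuming tf.keras; the mask value and layer sizes are arbitrary):

import numpy as np
import tensorflow as tf

# Variable-length sequences are padded with 0.0; the Masking layer marks
# those timesteps so the LSTM skips them instead of updating its state.
model = tf.keras.Sequential([
    tf.keras.layers.Masking(mask_value=0.0, input_shape=(None, 8)),
    tf.keras.layers.LSTM(16),
])

X = np.zeros((4, 10, 8), dtype=np.float32)  # batch of padded sequences
X[:, :5, :] = 1.0                           # only the first 5 steps carry data
model(X)                                    # masked steps do not affect the output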

wschin commented 5 years ago

Do you have any equations for describing the expected behavior?

cjermain commented 4 years ago

I think there are 2 types of masking that are desired:

  1. Masking values to handle variable length input in RNN architectures: Given the sequence_lens input, I think this case is already covered by the operators. There are some existing issues in the keras-onnx converter (e.g. onnx/keras-onnx#386), but this is solvable without changing the operators.

  2. Masking values within a sequence (e.g. masking 0 in [1.2, 2.3, 0, 0, 3.1]): This cannot be done with the current LSTM, GRU, or RNN operators. We would need to be able to pass in a boolean mask tensor to identify which values should be skipped, which would allow regions of a sequence to be skipped (see the sketch after this list). The numpy backend for Keras has an implementation that shows how the mask can operate during the steps: https://github.com/keras-team/keras/blob/master/keras/backend/numpy_backend.py#L224-L228
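To make the expected behavior concrete, here is a minimal numpy sketch of how such a mask could operate during the recurrence for a plain RNN cell (my own illustration, loosely following the Keras numpy backend linked above; names and shapes are assumptions):

import numpy as np

def masked_rnn_reference(X, mask, W, R, b, h0):
    # X: (seq_len, input_size), mask: (seq_len,) bool, h0: (hidden_size,)
    # W: (hidden_size, input_size), R: (hidden_size, hidden_size), b: (hidden_size,)
    h = h0
    outputs = []
    for t in range(X.shape[0]):
        candidate = np.tanh(W @ X[t] + R @ h + b)  # ordinary RNN cell update
        h = candidate if mask[t] else h            # masked steps keep the previous state
        outputs.append(h)
    return np.stack(outputs)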

cjermain commented 4 years ago

It is possible to avoid the intra-sequence issue by re-ordering the sequences based on the masking input.

import numpy as np

mask_value = 0
X = np.array([1, 2, 3, 0, 0, 4, 5, 0], dtype=np.float32)
indices = np.arange(len(X))

# Partition the indices into unmasked and masked positions.
unmasked_ids = np.compress(X != mask_value, indices)
masked_ids = np.compress(X == mask_value, indices)

# Unmasked values first, masked values pushed to the end.
contiguous_ids = np.concatenate([unmasked_ids, masked_ids])
X[contiguous_ids]
# array([1., 2., 3., 4., 5., 0., 0., 0.], dtype=float32)

The hidden states need to be re-indexed after the RNN operation to pull out the right state for each original position.

# For each original position, record the index of the timestep whose state it
# should see; masked positions reuse the index of the most recent unmasked step.
state_ids = np.empty_like(X, dtype=np.int64)

for i in range(len(X)):
    if i == 0:
        state_ids[i] = i
    elif X[i] == mask_value:
        state_ids[i] = state_ids[i - 1]
    else:
        state_ids[i] = i

state_ids
# array([0, 1, 2, 2, 2, 5, 6, 6])
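As a sketch of how the re-indexing could then be applied (my own reading, assuming the RNN was run on X[contiguous_ids] with sequence_lens set to the number of unmasked steps, and Y_reordered holds its per-step hidden states):

# Inverse permutation: reordered position of every original index.
inverse_ids = np.empty_like(contiguous_ids)
inverse_ids[contiguous_ids] = np.arange(len(contiguous_ids))

# Placeholder for the per-step hidden states of the RNN run on the
# reordered input, shape (seq_len, hidden_size); illustration only.
Y_reordered = np.arange(len(X), dtype=np.float32)[:, None]

# Hidden state for each original position.
Y_original = Y_reordered[inverse_ids[state_ids]]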

I'm not sure of the best way to represent this latter step in a broadcasted way that maps well to the existing opset. I think it would be easier to update the LSTM, GRU, and RNN operators to take a boolean mask tensor or a mask_value as a new input. This would allow the runtime implementations to skip over timesteps based on the mask, and would likely be more efficient.

@wschin, I'm interested in your thoughts on this.

stale[bot] commented 2 years ago

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

DayDayupupupup commented 2 years ago

LSTM does not currently support masking; will it be supported later?

Andreas-Krebbel commented 2 years ago

What is needed to take this proposal forward? I also ran into a model that uses the mask input for a Keras LSTM op, and I think this is pretty common. There is currently no way to convert the model to ONNX without the LSTM ops being decomposed, which sacrifices a lot of performance.

AlexandreEichenberger commented 2 years ago

Discussion during the Sept 29 ONNX Operator SIG meeting

AndreyOrb commented 1 year ago

Is there any progress with this issue?

AndreyOrb commented 1 month ago

Interesting... After a year, there's still no progress with this issue. Masking in LSTM is essential when working with batches of streams (text, audio, etc.).