jiafatom opened this issue 5 years ago
Do you have any equations for describing the expected behavior?
I think there are 2 types of masking that are desired:

1. Masking values to handle variable-length input in RNN architectures: given the sequence_lens input, I think this case is already covered by the operators. There are some existing issues in the keras-onnx converter (e.g. onnx/keras-onnx#386), but this is solvable without changing the operators (a construction sketch follows this list).

2. Masking values within a sequence (e.g. masking 0 in [1.2, 2.3, 0, 0, 3.1]): this cannot be done with the current LSTM, GRU, or RNN operators. We would need to be able to pass in either a boolean mask Tensor or a mask value to identify which values should be skipped. This would allow regions within a sequence to be skipped. The NumPy backend for Keras has an implementation that shows how the mask can operate during the steps (see the second sketch after this list):
https://github.com/keras-team/keras/blob/master/keras/backend/numpy_backend.py#L224-L228
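For case 1, here is a minimal construction sketch showing that the existing LSTM operator already accepts a sequence_lens input for trailing padding. The dimensions and tensor names below are my own assumptions for illustration, not something from the thread.

import onnx
from onnx import helper, TensorProto

# Assumed toy dimensions, for illustration only.
seq_length, batch_size, input_size, hidden_size = 5, 2, 3, 4

# The standard LSTM already takes an optional sequence_lens input
# (one length per batch entry), which covers trailing padding.
lstm = helper.make_node(
    "LSTM",
    inputs=["X", "W", "R", "B", "sequence_lens"],
    outputs=["Y"],
    hidden_size=hidden_size,
)

graph = helper.make_graph(
    [lstm],
    "lstm_with_sequence_lens",
    inputs=[
        helper.make_tensor_value_info("X", TensorProto.FLOAT, [seq_length, batch_size, input_size]),
        helper.make_tensor_value_info("W", TensorProto.FLOAT, [1, 4 * hidden_size, input_size]),
        helper.make_tensor_value_info("R", TensorProto.FLOAT, [1, 4 * hidden_size, hidden_size]),
        helper.make_tensor_value_info("B", TensorProto.FLOAT, [1, 8 * hidden_size]),
        helper.make_tensor_value_info("sequence_lens", TensorProto.INT32, [batch_size]),
    ],
    outputs=[
        helper.make_tensor_value_info("Y", TensorProto.FLOAT, [seq_length, 1, batch_size, hidden_size]),
    ],
)
onnx.checker.check_model(helper.make_model(graph))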
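And for case 2, a minimal NumPy sketch of the desired per-timestep semantics. This is my paraphrase of what the linked Keras backend does, not a copy of it: wherever the mask is False, the candidate update is discarded and the previous hidden state is carried forward.

import numpy as np

def masked_rnn(X, mask, W, R, b, h0):
    # Plain tanh RNN over X with shape [seq_len, batch, input_size].
    # mask has shape [seq_len, batch]; where it is False the hidden
    # state is carried forward unchanged, i.e. that timestep is skipped.
    h = h0
    outputs = []
    for t in range(X.shape[0]):
        h_new = np.tanh(X[t] @ W + h @ R + b)
        m = mask[t][:, None]          # broadcast the mask over hidden units
        h = np.where(m, h_new, h)     # keep the previous state where masked
        outputs.append(h)
    return np.stack(outputs), h

# Toy usage with the sequence from the example above; 0 marks masked steps.
seq = np.array([1.2, 2.3, 0.0, 0.0, 3.1], dtype=np.float32)
X = seq.reshape(-1, 1, 1)             # [seq_len=5, batch=1, input_size=1]
mask = (seq != 0.0).reshape(-1, 1)    # [seq_len, batch]
rng = np.random.default_rng(0)
W = rng.standard_normal((1, 4))       # input_size x hidden_size
R = rng.standard_normal((4, 4))       # hidden_size x hidden_size
b = np.zeros(4)
h0 = np.zeros((1, 4))
Y, h_final = masked_rnn(X, mask, W, R, b, h0)   # Y has shape [5, 1, 4]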
It is possible to avoid the intra-sequence issue by re-ordering the sequences based on the masking input.
import numpy as np

mask_value = 0
X = np.array([1, 2, 3, 0, 0, 4, 5, 0], dtype=np.float32)
indices = np.arange(len(X))
unmasked_ids = np.compress(X != mask_value, indices)
masked_ids = np.compress(X == mask_value, indices)
contiguous_ids = np.concatenate([unmasked_ids, masked_ids])
X[contiguous_ids]
# array([1., 2., 3., 4., 5., 0., 0., 0.], dtype=float32)
The hidden states then need to be re-indexed after the RNN operation to pull out the values at the right indices.
state_ids = np.empty_like(X, dtype=np.int64)
for i, _ in enumerate(X):
    if i == 0:
        state_ids[i] = i
    elif X[i] == mask_value:
        state_ids[i] = state_ids[i - 1]
    else:
        state_ids[i] = i
state_ids
# array([0, 1, 2, 2, 2, 5, 6, 6])
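As an aside (my own sketch, not part of the original comment), the loop above can be written in a broadcast-friendly way in NumPy, although np.maximum.accumulate has no direct counterpart in the existing opset, which is part of the difficulty mentioned below.

# Vectorized equivalent of the loop above; assumes the first element is
# unmasked (a masked X[0] would yield -1 and need special handling).
idx = np.arange(len(X))
state_ids_vec = np.maximum.accumulate(np.where(X == mask_value, -1, idx))
# array([0, 1, 2, 2, 2, 5, 6, 6])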
I'm not sure of the best way to represent this latter step in a broadcasted way that maps well to the existing opset. I think it would be easier to update the LSTM, GRU, and RNN operators to take a mask boolean Tensor or a mask_value as a new input. This would allow runtime implementations to skip over timesteps based on the mask, and would likely be more efficient.
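As a rough illustration of that proposal (purely hypothetical: the current ONNX LSTM schema has no such input, and the name and position of the mask input are my assumptions), the node could grow an extra boolean input alongside sequence_lens:

from onnx import helper

# Hypothetical node: an LSTM with a boolean "mask" input of shape
# [seq_length, batch_size]. This does NOT validate against today's
# schema; it only sketches the proposed signature. Timesteps where
# mask is False would be skipped and the hidden/cell state carried
# forward, as in the NumPy sketch earlier in the thread.
lstm_with_mask = helper.make_node(
    "LSTM",
    inputs=["X", "W", "R", "B", "sequence_lens", "initial_h", "initial_c", "P", "mask"],
    outputs=["Y", "Y_h", "Y_c"],
    hidden_size=4,
)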
@wschin, I'm interested in your thoughts on this.
LSTM does not currently support masking; will it be supported later?
What is needed to take this proposal forward? I also ran into a model that uses the mask input of a Keras LSTM op, and I think this is pretty common. There is currently no way to convert the model to ONNX without the LSTM ops being decomposed, which sacrifices a lot of performance.
Discussion during the Sept 29 ONNX Operator SIG meeting
Is there any progress with this issue?
Interesting... After a year, there's still no progress with this issue. Masking in LSTM is essential when working with batches of streams (text, audio, etc.).
In Keras and TensorFlow, LSTM and RNN layers support masking, which uses a mask value to skip timesteps. This is commonly used in some state-of-the-art models such as keras-bert. So we may need to support masking for the ONNX LSTM and RNN ops.