webmachinelearning / webnn

🧠 Web Neural Network API
https://www.w3.org/TR/webnn/

Consider passing each activation and each direction's weights & biases separately for lstm and gru #751

Open philloooo opened 3 months ago

philloooo commented 3 months ago

This is feedback from trying to implement gru/lstm on CoreML, driven by https://github.com/webmachinelearning/webnn/issues/689. Currently, the weights and biases are stacked together for the forward and backward directions when the operator is bidirectional, and similarly the activations are passed as a single array rather than as distinct parameters.

I think it's more explicit and cleaner to follow CoreML's design, which takes each direction's weights and biases as separate operands and each activation as a distinct parameter.

What do you think?

This would also unblock the lstm/gru implementation on CoreML from depending on the outcome of the MLConstantOperand discussion.

@fdwr @huningxin
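For context, here is a rough sketch of the contrast being proposed, based on my reading of the lstm() signature and shapes in the spec at the time (and assuming the WebNN IDL types MLGraphBuilder/MLOperand are visible to TypeScript); the per-direction option names in the second half are purely illustrative of this issue's idea, not spec text:

```ts
// Today (direction: "both"): one stacked operand per weight/bias, with a
// leading numDirections axis, plus a single activations array.
function buildStackedLstm(
    builder: MLGraphBuilder, input: MLOperand,
    weight: MLOperand,           // [2, 4 * hiddenSize, inputSize]
    recurrentWeight: MLOperand,  // [2, 4 * hiddenSize, hiddenSize]
    bias: MLOperand,             // [2, 4 * hiddenSize]
    recurrentBias: MLOperand,    // [2, 4 * hiddenSize]
    steps: number, hiddenSize: number): MLOperand[] {
  return builder.lstm(input, weight, recurrentWeight, steps, hiddenSize, {
    direction: 'both',
    bias,
    recurrentBias,
    // One array covering all gates (spelled as MLActivation objects in
    // older drafts).
    activations: ['sigmoid', 'tanh', 'tanh'],
  });
}

// A CoreML-style shape would instead look roughly like this: hypothetical
// option names, per-direction operands without the leading direction axis,
// and each activation passed on its own.
//
//   builder.lstm(input, forwardWeight, forwardRecurrentWeight, steps,
//       hiddenSize, {
//         direction: 'both',
//         backwardWeight, backwardRecurrentWeight,
//         forwardBias, backwardBias,
//         recurrentActivation: 'sigmoid',
//         cellActivation: 'tanh',
//         outputActivation: 'tanh',
//       });
```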

fdwr commented 2 months ago

Thank you for the implementation findings. Let me digest those fields again (ironically, I have a short-term memory when it comes to LSTM's details 😅).

@shiyi9801, do you think this would have also helped your implementation? I recall you doing some concatenation manipulation here https://chromium-review.googlesource.com/c/chromium/src/+/5320673 and here https://chromium-review.googlesource.com/c/chromium/src/+/5339174.

shiyi9801 commented 2 months ago

> @shiyi9801, do you think this would have also helped your implementation? I recall you doing some concatenation manipulation here https://chromium-review.googlesource.com/c/chromium/src/+/5320673 and here https://chromium-review.googlesource.com/c/chromium/src/+/5339174.

No, it would actually do the opposite for the DML backend. If the weights/biases are passed separately for each direction, then the DML backend has to do another concatenation to combine the forward and backward weights/biases. (The previous concatenation combines the bias and recurrent bias.) It looks like CoreML prefers separate operands for each direction, while DML prefers them as a whole...

https://learn.microsoft.com/en-us/windows/win32/api/directml/ns-directml-dml_lstm_operator_desc

/cc @miaobin for Gru
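To make that cost concrete, here is a minimal hypothetical sketch (not Chromium code) of the extra step a DML-style backend would need if the API handed it per-direction weights; it assumes the forward/backward weights arrive as row-major [4 * hiddenSize, inputSize] tensors that must be restitched into a single combined tensor with a leading direction axis, as the DML operator linked above takes combined inputs:

```ts
// Hypothetical sketch: stitch separate forward/backward weights back into
// one tensor with a leading direction axis before building the DML desc.
interface HostTensor {
  shape: number[];     // e.g. [4 * hiddenSize, inputSize]
  data: Float32Array;  // row-major
}

function stackDirections(forward: HostTensor, backward: HostTensor): HostTensor {
  // [4*h, inputSize] + [4*h, inputSize] -> [2, 4*h, inputSize].
  // With row-major layout the two blocks can simply be appended.
  const data = new Float32Array(forward.data.length + backward.data.length);
  data.set(forward.data, 0);
  data.set(backward.data, forward.data.length);
  return { shape: [2, ...forward.shape], data };
}
```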

huningxin commented 2 months ago

Separate weights & biases might help simplify the emulation code, e.g. the Chromium TFLite backend, which needs to slice each tensor out of the combined one when using it.

/cc @fujunwei
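For illustration, this is roughly the slicing an emulating backend has to do today with the combined layout, written with WebNN-style slice()/reshape() rather than the actual Chromium/TFLite internals; the helper name and the [numDirections, 4 * hiddenSize, inputSize] shape are my assumptions:

```ts
// Hypothetical helper: pull one direction's weights back out of the stacked
// [numDirections, 4 * hiddenSize, inputSize] operand before lowering it to a
// per-direction kernel.
function sliceDirection(
    builder: MLGraphBuilder, stackedWeight: MLOperand,
    direction: number, hiddenSize: number, inputSize: number): MLOperand {
  // Take the [1, 4 * hiddenSize, inputSize] plane for this direction...
  const plane = builder.slice(
      stackedWeight, [direction, 0, 0], [1, 4 * hiddenSize, inputSize]);
  // ...then drop the leading direction axis.
  return builder.reshape(plane, [4 * hiddenSize, inputSize]);
}
```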

philloooo commented 1 month ago

Hi @huningxin @fdwr, what do you think about this now? This would unblock the CoreML implementation for lstm and gru by avoiding the need for constant folding or decomposition.