Generalize scatter/gather ops by using the SIMD assembler. This simplifies the kernels and makes them general enough to be used with any supported type. Also add OOV (out-of-vocabulary) support for scatter ops, and support the transformation of multiple accumulating scatter ops on the same embedding matrix.
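For readers unfamiliar with these ops, the following is a minimal reference sketch of the semantics in plain Python, not the SIMD kernels from this change. All names (`gather`, `scatter_add`, `fused_scatter_add`, `oov_row`) are hypothetical, and the OOV convention is an assumption: an out-of-range index is treated as OOV and is either redirected to a designated row or skipped. The fused variant illustrates one plausible way several accumulating scatters on the same matrix could be combined into a single pass.

```python
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]


def gather(embedding: Matrix, indices: Sequence[int],
           oov: Optional[Sequence[float]] = None) -> Matrix:
    """Look up one embedding row per index.

    Assumption: an out-of-range index is OOV and yields the `oov` vector,
    or a zero vector when no `oov` vector is given.
    """
    dim = len(embedding[0])
    fallback = list(oov) if oov is not None else [0.0] * dim
    return [list(embedding[i]) if 0 <= i < len(embedding) else list(fallback)
            for i in indices]


def scatter_add(target: Matrix, indices: Sequence[int], updates: Matrix,
                oov_row: Optional[int] = None) -> Matrix:
    """Accumulate updates[k] into target[indices[k]] in place.

    Assumption: an out-of-range index is OOV; it is redirected to
    `oov_row` when one is given and skipped otherwise.
    """
    for idx, update in zip(indices, updates):
        if not 0 <= idx < len(target):
            if oov_row is None:
                continue  # drop the update for an OOV index
            idx = oov_row
        row = target[idx]
        for j, value in enumerate(update):
            row[j] += value
    return target


def fused_scatter_add(target: Matrix,
                      ops: Sequence[Tuple[Sequence[int], Matrix]],
                      oov_row: Optional[int] = None) -> Matrix:
    """Apply several accumulating scatters to the same matrix in one pass.

    Because addition is order-independent here, concatenating the
    (indices, updates) pairs gives the same result as applying each
    scatter separately.
    """
    indices = [i for op_indices, _ in ops for i in op_indices]
    updates = [u for _, op_updates in ops for u in op_updates]
    return scatter_add(target, indices, updates, oov_row)
```

Since the element type only appears in the inner accumulation loop, the same structure applies to any numeric type, which is the kind of generality the change aims for at the kernel level.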