Instead of serializing the m x n x N cond_prob matrix to C++, we can serialize the m and n dimensions separately (without taking the outer product).
We could then either compute the outer product once, up front in C++, or compute it "lazily" on every EM step. The lazy option does more computation per step, but it could still be faster overall because it saves a lot of memory bandwidth: each step reads the two smaller factors instead of the full m x n x N matrix.
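A minimal C++ sketch of the two options, assuming (this is not stated above) that the full matrix factors as cond_prob(i, j, k) = a(i, k) * b(j, k), where a carries the m dimension and b the n dimension. The names `FactoredCondProb`, `at` and `materialize`, and the row-major layout, are hypothetical and only illustrate the idea.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical layout: the m x n x N cond_prob matrix is assumed to factor as
// cond_prob(i, j, k) = a(i, k) * b(j, k), so only a (m x N) and b (n x N)
// need to be serialized to C++.
struct FactoredCondProb {
    std::size_t m, n, N;
    std::vector<double> a;  // m * N values, row-major: a[i * N + k]
    std::vector<double> b;  // n * N values, row-major: b[j * N + k]

    // Lazy option: recompute the product on every read instead of storing it.
    double at(std::size_t i, std::size_t j, std::size_t k) const {
        return a[i * N + k] * b[j * N + k];
    }

    // Eager option: build the full m x n x N matrix once, up front in C++.
    std::vector<double> materialize() const {
        std::vector<double> full(m * n * N);
        for (std::size_t i = 0; i < m; ++i)
            for (std::size_t j = 0; j < n; ++j)
                for (std::size_t k = 0; k < N; ++k)
                    full[(i * n + j) * N + k] = at(i, j, k);
        return full;
    }
};
```

For illustration, with m = n = 100 the factored form holds (100 + 100) * N = 200 * N values, versus 100 * 100 * N = 10,000 * N for the materialized matrix, so a bandwidth-bound EM step would stream 50 times less data at the cost of one extra multiply per element read.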