Training a model with 32 mixtures currently takes around 1.5 seconds. Since the networks are entirely independent, it should be possible to train them in parallel on multiple threads (C++11 ThreadPool implementation: https://github.com/en4bz/ThreadPool).
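A minimal sketch of the parallel training idea, assuming each network can be trained without touching shared state. It uses `std::async` from the standard library rather than the linked ThreadPool, whose API is not shown here. `Network`, `train` and `train_all` are hypothetical names for illustration, not the project's actual types.

```cpp
#include <cassert>
#include <future>
#include <vector>

// Hypothetical stand-in for one independent network (not the project's type).
struct Network {
    bool trained = false;
};

// Placeholder for the real per-network training routine (assumption).
void train(Network& net) {
    assert(!net.trained);  // in this sketch each network is trained exactly once
    net.trained = true;
}

// Launch one asynchronous task per network, then wait for all of them.
void train_all(std::vector<Network>& nets) {
    std::vector<std::future<void>> pending;
    pending.reserve(nets.size());
    for (Network& net : nets) {
        // std::launch::async requests a separate thread for each task.
        pending.push_back(std::async(std::launch::async, [&net] { train(net); }));
    }
    for (std::future<void>& f : pending) {
        f.get();  // blocks until the task is done; rethrows any exception from train()
    }
}
```

One task per network is only reasonable for a small number of networks; with many networks, a fixed-size pool such as the linked ThreadPool would avoid oversubscribing the CPU. On some platforms this needs to be built with `-pthread`.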
Also, once fitted, a large number of mixture components have very low weight (those with mean=0, prec=1, weight<~1e-5), yet they still decrease performance. They could easily be removed after pre-training, or even online somewhere in vmp::MoGArray.
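The pruning step could look roughly like the sketch below: drop every component whose weight falls under a threshold, then renormalise the remaining weights so they sum to 1. The `Component` struct and the `prune` function are hypothetical; the real layout of `vmp::MoGArray` is not shown here, and the default threshold of 1e-5 is taken from the observation above.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical component layout (assumption, not vmp::MoGArray's real fields).
struct Component {
    double mean;
    double prec;
    double weight;
};

// Remove components with weight below min_weight, then renormalise the
// surviving weights so that they sum to 1.
void prune(std::vector<Component>& mix, double min_weight = 1e-5) {
    mix.erase(std::remove_if(mix.begin(), mix.end(),
                             [min_weight](const Component& c) {
                                 return c.weight < min_weight;
                             }),
              mix.end());

    double total = 0.0;
    for (const Component& c : mix) {
        total += c.weight;
    }
    assert(total > 0.0);  // at least one component must survive pruning

    for (Component& c : mix) {
        c.weight /= total;
    }
}
```

Renormalising keeps the mixture a valid distribution; because the removed weight is tiny, the surviving weights barely change.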