Yes, indeed, the l2sq cost function is just a squared pixel-wise L2 cost, i.e. a mean squared error loss.
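To make the relation concrete, here is a minimal sketch in plain Python. The function names `l2sq_cost` and `mse` are hypothetical, and I am assuming the squared differences are summed over pixels and then averaged over the batch; the exact reduction and any scaling constant used in the repository are not reproduced here. Under that assumption the two quantities differ only by a constant factor (the number of pixels), so they have the same minimizer.

```python
def l2sq_cost(batch_x, batch_recon):
    """Squared L2 distance per example (summed over pixels), averaged over the batch.

    Each example is a flat list of pixel values. The sum-then-mean reduction
    is an assumption made for illustration.
    """
    per_example = [
        sum((a - b) ** 2 for a, b in zip(x, r))
        for x, r in zip(batch_x, batch_recon)
    ]
    return sum(per_example) / len(per_example)


def mse(batch_x, batch_recon):
    """Mean squared error: the same quantity divided by the number of pixels."""
    n_pixels = len(batch_x[0])
    return l2sq_cost(batch_x, batch_recon) / n_pixels


# Two "images" of two pixels each, and their reconstructions.
x = [[0.0, 1.0], [1.0, 1.0]]
recon = [[0.0, 0.0], [1.0, 0.0]]

cost = l2sq_cost(x, recon)
error = mse(x, recon)
```

Since the ratio between the two is constant, switching from one to the other only rescales the reconstruction term relative to the penalty term.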
Make sure you are not confusing the individual posterior distributions Q(Z|X) with the aggregated posterior distribution Qz, which is the average of the point-wise posterior distributions over the data. If you know for a fact that Qz precisely matches the prior Pz (perhaps by construction?), then indeed you can ignore the penalty part. However, to be honest, off the top of my head I can't come up with any reasonable approach that leads to this type of situation...
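A toy illustration of the difference, under assumptions I am making up for the example: a one-dimensional latent, Gaussian posteriors Q(Z|X=x_i) = N(mu_i, sigma^2) with a shared sigma, and a standard normal prior. The helper `aggregated_moments` is hypothetical. The aggregated posterior is then an equal-weight mixture of these Gaussians, and its variance follows from the law of total variance.

```python
import statistics


def aggregated_moments(mus, sigma):
    """Mean and variance of Qz = (1/n) * sum_i N(mu_i, sigma^2).

    Law of total variance: Var[Z] = Var[mu] + sigma^2, where Var[mu] is the
    population variance of the posterior means over the data points.
    """
    mean = statistics.fmean(mus)
    var = statistics.pvariance(mus) + sigma ** 2
    return mean, var


# Hypothetical posterior means for two data points, shared posterior std.
mus = [-0.8, 0.8]
sigma = 0.6

# Each individual posterior has variance sigma^2 = 0.36, far from the prior's 1,
# and means of -0.8 and 0.8, far from the prior's 0.
individual_var = sigma ** 2

# The aggregate nevertheless has mean 0 and variance 0.64 + 0.36 = 1.
agg_mean, agg_var = aggregated_moments(mus, sigma)
```

So none of the individual Q(Z|X) looks like the prior, while the first two moments of Qz do match it. Note that matching two moments is still weaker than Qz being equal to the prior (this mixture is bimodal), which is why the penalty term is there in the first place.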
Hi, thank you so much for this nice implementation. However, I have two questions about the objective function of WAE:
Best wishes,
zhangyiyang