Generalized train function.
2.1. It can use noise generated independently for each batch.
2.2. It can take an optimizer (optim) defined outside the training function, which is useful for additional training.
2.3. Fixed train score.
2.4. Added validation score.
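
The four points above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the actual implementation: the `SGD` class, `mse`, `grads`, the linear model, and all parameter names (`noise_std`, `batch_size`, `seed`) are hypothetical stand-ins, and how the real function computes its train and validation scores is not shown in these notes.

```python
import random


class SGD:
    """Tiny stand-in optimizer (hypothetical). Its state outlives a train() call."""

    def __init__(self, lr=0.1):
        self.lr = lr
        self.steps = 0  # keeps counting across repeated train() calls

    def step(self, params, grads):
        self.steps += 1
        return [p - self.lr * g for p, g in zip(params, grads)]


def mse(params, data):
    """Mean squared error of the linear model y = w * x + b."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def grads(params, batch):
    """Gradient of mse with respect to [w, b]."""
    w, b = params
    n = len(batch)
    gw = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    gb = sum(2 * (w * x + b - y) for x, y in batch) / n
    return [gw, gb]


def train(params, train_data, val_data, optim=None,
          epochs=10, batch_size=4, noise_std=0.0, seed=0):
    rng = random.Random(seed)
    if optim is None:
        optim = SGD()  # fall back to a fresh optimizer if none is passed in
    history = []
    for _ in range(epochs):
        for i in range(0, len(train_data), batch_size):
            batch = train_data[i:i + batch_size]
            # Fresh noise is drawn for every batch, independently of the others.
            noisy = [(x + rng.gauss(0.0, noise_std), y) for x, y in batch]
            params = optim.step(params, grads(params, noisy))
        # Assumption: both scores are computed on clean data once per epoch.
        history.append((mse(params, train_data), mse(params, val_data)))
    return params, history


# Usage: the optimizer is created outside train() and reused for a second run.
data = [(x / 10, 2 * x / 10 + 1) for x in range(20)]
train_data, val_data = data[:16], data[16:]
optim = SGD(lr=0.1)
params, hist = train([0.0, 0.0], train_data, val_data,
                     optim=optim, epochs=50, noise_std=0.01)
params, hist2 = train(params, train_data, val_data, optim=optim, epochs=50)
```

Passing the same `optim` object into the second call is what makes additional training possible here: the optimizer's state (in this sketch, only a step counter) carries over instead of being reset.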
Improved the batch_loss interface, which now takes a feed-forward batch and a reference batch.
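
A minimal sketch of such an interface, reading "feed-forward batch" as the model's outputs for the batch. The parameter names and the choice of mean squared error are assumptions made for illustration; the notes do not say which loss the real batch_loss computes.

```python
def batch_loss(output_batch, reference_batch):
    """Mean squared error between model outputs and reference values.

    Both arguments are lists of equal length, one list of floats per sample.
    The MSE here is an assumed example loss, not necessarily the real one.
    """
    if len(output_batch) != len(reference_batch):
        raise ValueError("output and reference batches must have the same length")
    total = 0.0
    for out, ref in zip(output_batch, reference_batch):
        # Per-sample mean of squared differences.
        total += sum((o - r) ** 2 for o, r in zip(out, ref)) / len(out)
    return total / len(output_batch)


outputs = [[0.0, 1.0], [2.0, 2.0]]    # feed-forward batch (model outputs)
reference = [[0.0, 0.0], [2.0, 4.0]]  # reference batch (targets)
loss = batch_loss(outputs, reference)
```

Keeping the forward pass outside the loss function means the same batch_loss can score outputs from any model, as long as the two batches line up sample by sample.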