Closed by vlad17 6 years ago
This is especially important for verifying that the learner is converging appropriately. In addition to the raw iteration-to-iteration losses for the learner, it would be important to report the learner's loss on the VALIDATION dataset; that is, how well the learner actually predicted the MPC's behavior.
(solved by #66)
For dynamics and controller neural networks, set up (optional, flag-controlled) logging through logz.py of each network's average gradient magnitude (for every on-policy aggregation iteration). In addition, record the network's corresponding objective loss before and after training in that iteration.
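A rough sketch of the diagnostics requested above, under several assumptions: a NumPy linear model with a mean-squared-error objective stands in for the dynamics/controller networks, and every name here (`train_one_iteration`, `log_diagnostics`, the `LossBefore` / `LossAfter` / `AvgGradNorm` / `ValidationLoss` keys) is hypothetical rather than taken from the repository. "Average gradient magnitude" is interpreted as the mean L2 norm of the gradient over the update steps of one aggregation iteration, which is only one possible reading. In the real code the collected values would presumably be handed to logz.py instead of printed.

```python
import numpy as np


def mse_loss(w, X, y):
    """Mean squared error of the linear model X @ w against targets y."""
    residual = X @ w - y
    return float(np.mean(residual ** 2))


def mse_grad(w, X, y):
    """Gradient of mse_loss with respect to w."""
    return 2.0 * X.T @ (X @ w - y) / len(y)


def train_one_iteration(w, X, y, X_val, y_val, lr=0.1, steps=50,
                        log_diagnostics=True):
    """Run `steps` gradient-descent updates for one aggregation iteration.

    When `log_diagnostics` is True (the hypothetical flag), also return the
    loss before/after training, the mean gradient norm over the steps, and
    the loss on a held-out validation set.
    """
    stats = {}
    if log_diagnostics:
        stats["LossBefore"] = mse_loss(w, X, y)

    grad_norms = []
    for _ in range(steps):
        g = mse_grad(w, X, y)
        grad_norms.append(np.linalg.norm(g))
        w = w - lr * g

    if log_diagnostics:
        stats["LossAfter"] = mse_loss(w, X, y)
        stats["AvgGradNorm"] = float(np.mean(grad_norms))
        stats["ValidationLoss"] = mse_loss(w, X_val, y_val)
    return w, stats


# Synthetic data standing in for (state, action) pairs labelled by the MPC.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(200, 3))
y = X @ true_w
X_val = rng.normal(size=(50, 3))
y_val = X_val @ true_w

w, stats = train_one_iteration(np.zeros(3), X, y, X_val, y_val)
for key, value in stats.items():
    print(key, value)
```

Keeping the diagnostics behind a single flag means the extra loss evaluations (two on the training set, one on the validation set) are skipped entirely when logging is off.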