numenta / nupic.embodied

GNU Affero General Public License v3.0

Makes auxiliary task independent of dynamic models #32

Closed by lucasosouza 3 years ago

lucasosouza commented 3 years ago

I made a recent change to try to reduce how many times update_auxiliary_task is called, based on the comments you left on the code.

Now there are 4 places in the code where update_auxiliary_task is called:

If the last statement is incorrect, and update_auxiliary_task needs to be called only once per step/epoch/batch, then we can move it out of the loss functions and back into the learn function, so that both the PPO loss and the backprop loss share that procedure.
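For illustration, here is a minimal sketch of what I mean. The names here (PPOAgent, ppo_loss, backprop_loss, the optimizer setup) are placeholders rather than the actual classes in the repo; the point is just that update_auxiliary_task is called once at the top of learn, and neither loss function touches it:

```python
import torch

class PPOAgent:
    """Placeholder sketch; the real class and method names may differ."""

    def __init__(self, policy: torch.nn.Module, lr: float = 3e-4):
        self.policy = policy
        self.optimizer = torch.optim.Adam(policy.parameters(), lr=lr)

    def update_auxiliary_task(self, batch):
        # Placeholder: refresh the auxiliary task once per batch.
        pass

    def ppo_loss(self, batch) -> torch.Tensor:
        # Placeholder PPO surrogate loss; no auxiliary-task update here.
        return self.policy(batch["obs"]).mean()

    def backprop_loss(self, batch) -> torch.Tensor:
        # Placeholder backprop loss; no auxiliary-task update here either.
        return self.policy(batch["obs"]).pow(2).mean()

    def learn(self, batch):
        # Single shared call, instead of one call inside each loss function.
        self.update_auxiliary_task(batch)

        loss = self.ppo_loss(batch) + self.backprop_loss(batch)
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
```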

At some point we can also look into update_auxiliary_task itself to see if it can be further simplified.

Edit: I renamed ppo.update to ppo.learn to make it clear that that is where the gradient steps happen. There are lots of update functions in the code, and I wanted to distinguish the only one that actually performs a gradient update. Let me know if that is not OK.
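Continuing the sketch above (the policy and batch contents are just placeholders), the rename only changes the call site, not the behavior:

```python
agent = PPOAgent(policy=torch.nn.Linear(4, 1))
batch = {"obs": torch.randn(8, 4)}

# Before the rename the call site would have been ppo.update(batch);
# with the rename, the one entry point that takes gradient steps is explicit:
agent.learn(batch)
```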