computation of PD gains and save-state

'_latest_full_state' is used to compute the PD feedback, but it is updated in several different places in the code, so the controller may read a state that is stale or only partly updated.

For a more consistent data flow, it would be better to update the state only once, higher up the call stack.

I suggest updating it once at the beginning of 'do_simulation' (as a copy of the current state). See also the other issue about the end-effector position.
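To make the suggestion concrete, here is a minimal sketch of the intended data flow. It is not the actual CausalWorld code: the class 'ToyRobot', the helpers '_pd_torques' and '_step_physics', the gains, and the dictionary layout of the state are all assumptions for illustration. Only the names '_latest_full_state' and 'do_simulation' come from the code base.

```python
import copy


class ToyRobot:
    """Hypothetical stand-in for the robot class (not CausalWorld's code)."""

    def __init__(self, kp=2.0, kd=0.1):
        self.kp = kp  # assumed proportional gain
        self.kd = kd  # assumed derivative gain
        # Assumed layout of the "true" simulator state.
        self._state = {"positions": [0.0, 0.0], "velocities": [0.0, 0.0]}
        self._latest_full_state = copy.deepcopy(self._state)

    def _pd_torques(self, target_positions):
        # The PD feedback reads only the snapshot and never refreshes it.
        snapshot = self._latest_full_state
        return [
            self.kp * (target - q) - self.kd * v
            for target, q, v in zip(
                target_positions, snapshot["positions"], snapshot["velocities"]
            )
        ]

    def _step_physics(self, torques, dt):
        # Toy unit-mass integrator standing in for the physics engine.
        for i, tau in enumerate(torques):
            self._state["velocities"][i] += tau * dt
            self._state["positions"][i] += self._state["velocities"][i] * dt

    def do_simulation(self, target_positions, dt=0.01):
        # The single place where the snapshot is refreshed, as a copy.
        self._latest_full_state = copy.deepcopy(self._state)
        torques = self._pd_torques(target_positions)
        self._step_physics(torques, dt)
        return torques
```

With this layout, everything called below 'do_simulation' sees the same copy of the state for the whole step, and the copy cannot be changed by the physics update.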

Also, '_latest_full_state' should be saved in 'save_state' and restored in 'restore_state' of 'task'. Otherwise, after a restoration the PD controller still holds the state from before the restoration, which no longer matches the restored simulation.
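A rough sketch of what saving and restoring the snapshot together with the rest of the state could look like. Again, this is only an illustration: 'ToyTask', the '_saved' attribute, and the dictionary layout are assumptions, and the real 'save_state' and 'restore_state' in 'task' may have different signatures and store much more.

```python
import copy


class ToyTask:
    """Hypothetical stand-in for the task (not CausalWorld's code)."""

    def __init__(self):
        # Assumed layout of the simulator state and of the PD snapshot.
        self._state = {"positions": [0.0, 0.0], "velocities": [0.0, 0.0]}
        self._latest_full_state = copy.deepcopy(self._state)
        self._saved = None  # hypothetical storage for the checkpoint

    def save_state(self):
        # Store the PD snapshot next to the simulator state.
        self._saved = {
            "state": copy.deepcopy(self._state),
            "latest_full_state": copy.deepcopy(self._latest_full_state),
        }

    def restore_state(self):
        if self._saved is None:
            raise RuntimeError("save_state() must be called before restore_state()")
        # Restore both, so the PD controller matches the restored simulation.
        self._state = copy.deepcopy(self._saved["state"])
        self._latest_full_state = copy.deepcopy(self._saved["latest_full_state"])
```

Copying on both save and restore keeps the checkpoint independent of later in-place changes to either object.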

rr-learning / CausalWorld

computation of PD gains and save-state #81