In your implementation, Critic and Actor Network are not sharing any parameters. This is different from the implementation in the original Pensieve.
This fundamentally changes how the critic network gets trained. This may explain why you found that the critic network is not very useful during the training.
In your implementation, Critic and Actor Network are not sharing any parameters. This is different from the implementation in the original Pensieve.
This fundamentally changes how the critic network gets trained. This may explain why you found that the critic network is not very useful during the training.