I have compared the implementation with the book "RL: An Introduction". It seems the MSE loss and cross-entropy loss cannot recover the Actor-Critic update rules, which are w = w + alpha * I * delta * grad(v_hat(s, w)) for the value function and theta = theta + alpha * I * delta * grad(ln pi(a|s, theta)) for the policy. In particular, for the value function, differentiating the MSE loss seems to produce an extra factor of v_hat.
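To make the comparison concrete, here is a minimal numpy sketch of what I mean (all names are illustrative, and it assumes a linear value function with the return G treated as a fixed target, i.e. the semi-gradient setting):

```python
import numpy as np

# Illustrative setup: linear value function v_hat(s) = w @ x with feature
# vector x for state s. The return G is held fixed (semi-gradient), and the
# importance factor I from the book is taken as 1 for simplicity.
rng = np.random.default_rng(0)
x = rng.normal(size=4)      # feature vector for state s
w = rng.normal(size=4)      # value-function weights
G = 1.5                     # return, treated as a constant target
alpha = 0.1

v_hat = w @ x               # current value estimate
delta = G - v_hat           # error term delta

# Gradient of the MSE loss 0.5 * (G - w @ x)**2 w.r.t. w,
# with G held fixed: -delta * x
grad_loss = -delta * x

w_sgd = w - alpha * grad_loss      # one gradient-descent step on the MSE loss
w_book = w + alpha * delta * x     # the book's rule: w + alpha * delta * grad(v_hat)

print(np.allclose(w_sgd, w_book))
```

Under these assumptions the two steps coincide, so any extra v_hat factor would have to come from not holding the target fixed when differentiating.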