Open · moizuet opened this issue 1 year ago
I would first suggest moving to stable-baselines3: it is more refined and still maintained. This version is no longer maintained.
To answer your question: there is no convenience function for this, but you can check how SAC does the value prediction in SB3 here, and try to replicate it yourself.
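Since there is no convenience function, the critic evaluation has to be replicated by hand. The sketch below (hypothetical, not SB3 source code) shows the pattern SAC's value prediction follows: each of the two critics is an MLP over the concatenated observation-action vector, and the conservative estimate used by SAC is the element-wise minimum of the two. All names here (`mlp`, `q_values`, `critic_params`) are illustrative, not library APIs.

```python
import numpy as np

# Hypothetical sketch of SAC-style twin-critic evaluation, NOT SB3 code:
# each critic is a ReLU MLP over [obs; action], and SAC uses min(Q1, Q2).

def mlp(x, weights, biases):
    # Forward pass through a ReLU MLP; the last layer is linear.
    for W, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(0.0, x @ W + b)
    return x @ weights[-1] + biases[-1]

def q_values(obs, action, critic_params):
    # Concatenate state and action, as SAC's critics do,
    # and return both Q estimates plus their minimum.
    sa = np.concatenate([obs, action], axis=-1)
    q1 = mlp(sa, *critic_params["qf1"])
    q2 = mlp(sa, *critic_params["qf2"])
    return q1, q2, np.minimum(q1, q2)

# Toy random parameters just to make the sketch runnable.
rng = np.random.default_rng(0)
obs_dim, act_dim, hidden = 3, 2, 16

def init(sizes):
    Ws = [rng.standard_normal((i, o)) * 0.1 for i, o in zip(sizes[:-1], sizes[1:])]
    bs = [np.zeros(o) for o in sizes[1:]]
    return Ws, bs

params = {"qf1": init([obs_dim + act_dim, hidden, 1]),
          "qf2": init([obs_dim + act_dim, hidden, 1])}
q1, q2, q_min = q_values(rng.standard_normal(obs_dim),
                         rng.standard_normal(act_dim), params)
```

With a trained SB3 model you would of course reuse the model's own critic networks rather than fresh weights; this only illustrates the input layout and the min-over-critics step to replicate.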
Unfortunately, I have implemented the rest of my RL algorithms, layers, and optimizers in the TensorFlow / stable-baselines2 ecosystem. I cannot switch right now, but I will consider using stable-baselines3, and especially RLlib, in the future.
Also, it will be a great coding exercise for me to implement this Q-value evaluation method myself.
Cheers.
I am implementing a Soft Actor-Critic (SAC) agent and need to evaluate the Q-value network inside my custom environment (to implement a special algorithm, Wolpertinger's algorithm, for handling large discrete action spaces). I have tried to get the Q-values from the SAC class object, but failed. A method or function like the one in stable-baselines' PPO implementation (namely, `.value`) would be very helpful.
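For context, this is roughly how the Q-network would be used once it is accessible. The sketch below shows Wolpertinger-style action selection (Dulac-Arnold et al.): the actor emits a continuous proto-action, we take its k nearest discrete-action embeddings, and the critic picks the best candidate. Everything here is hypothetical (`wolpertinger_select`, the toy `q_fn`) and stands in for the real SAC critic under discussion.

```python
import numpy as np

# Hypothetical sketch of Wolpertinger action selection: the continuous
# proto-action is mapped to its k nearest discrete-action embeddings,
# and the critic (here a toy stand-in for the SAC Q-network) scores them.

def wolpertinger_select(proto_action, action_embeddings, q_fn, obs, k=5):
    # k nearest discrete actions by Euclidean distance to the proto-action
    dists = np.linalg.norm(action_embeddings - proto_action, axis=1)
    candidates = np.argsort(dists)[:k]
    # score each candidate with the critic and keep the highest-valued one
    scores = [q_fn(obs, action_embeddings[i]) for i in candidates]
    return int(candidates[int(np.argmax(scores))])

rng = np.random.default_rng(1)
embeddings = rng.standard_normal((100, 4))   # 100 discrete actions, dim 4
obs = rng.standard_normal(8)
proto = rng.standard_normal(4)
toy_q = lambda s, a: float(a.sum())          # toy critic for illustration only
best = wolpertinger_select(proto, embeddings, toy_q, obs, k=5)
```

This is exactly why direct access to the critic matters here: `q_fn` must be callable on arbitrary (state, candidate-action) pairs inside the environment loop, not just on the actor's own action.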