Open · moizuet opened this issue 1 year ago
I would first suggest moving to stable-baselines3: it is more refined and still maintained. This version is no longer maintained.
To answer your question: there is no convenience function for this, but you can check how SAC does the value prediction in SB3 here, and try to replicate it yourself.
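Since there is no convenience function, the critic evaluation has to be replicated by hand. The sketch below (hypothetical, not SB3 source code) shows the pattern SAC's value prediction follows: each of the two critics is an MLP over the concatenated observation-action vector, and the conservative estimate used by SAC is the element-wise minimum of the two. All names here (`mlp`, `q_values`, `critic_params`) are illustrative, not library APIs.

```python
import numpy as np

# Hypothetical sketch of SAC-style twin-critic evaluation, NOT SB3 code:
# each critic is a ReLU MLP over [obs; action], and SAC uses min(Q1, Q2).

def mlp(x, weights, biases):
    # Forward pass through a ReLU MLP; the last layer is linear.
    for W, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(0.0, x @ W + b)
    return x @ weights[-1] + biases[-1]

def q_values(obs, action, critic_params):
    # Concatenate state and action, as SAC's critics do,
    # and return both Q estimates plus their minimum.
    sa = np.concatenate([obs, action], axis=-1)
    q1 = mlp(sa, *critic_params["qf1"])
    q2 = mlp(sa, *critic_params["qf2"])
    return q1, q2, np.minimum(q1, q2)

# Toy random parameters just to make the sketch runnable.
rng = np.random.default_rng(0)
obs_dim, act_dim, hidden = 3, 2, 16

def init(sizes):
    Ws = [rng.standard_normal((i, o)) * 0.1 for i, o in zip(sizes[:-1], sizes[1:])]
    bs = [np.zeros(o) for o in sizes[1:]]
    return Ws, bs

params = {"qf1": init([obs_dim + act_dim, hidden, 1]),
          "qf2": init([obs_dim + act_dim, hidden, 1])}
q1, q2, q_min = q_values(rng.standard_normal(obs_dim),
                         rng.standard_normal(act_dim), params)
```

With a trained SB3 model you would of course reuse the model's own critic networks rather than fresh weights; this only illustrates the input layout and the min-over-critics step to replicate.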
Unfortunately, I have implemented the rest of my RL algorithms, layers, and optimizers in the TensorFlow / stable-baselines2 ecosystem. I cannot switch right now, but I will consider using stable-baselines3, and especially RLlib, in the future.
Also, it will be a great coding exercise for me to implement this Q-value evaluation method myself.
Cheers.
I am implementing a Soft Actor-Critic (SAC) agent and need to evaluate the Q-value network inside my custom environment (to implement a special algorithm, Wolpertinger's algorithm, for handling large discrete action spaces). I have tried to get the Q-values from the SAC class object, but failed. A method or function like the one in stable-baselines' PPO implementation (namely, `.value`) would be very helpful.
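For context, this is roughly how the Q-network would be used once it is accessible. The sketch below shows Wolpertinger-style action selection (Dulac-Arnold et al.): the actor emits a continuous proto-action, we take its k nearest discrete-action embeddings, and the critic picks the best candidate. Everything here is hypothetical (`wolpertinger_select`, the toy `q_fn`) and stands in for the real SAC critic under discussion.

```python
import numpy as np

# Hypothetical sketch of Wolpertinger action selection: the continuous
# proto-action is mapped to its k nearest discrete-action embeddings,
# and the critic (here a toy stand-in for the SAC Q-network) scores them.

def wolpertinger_select(proto_action, action_embeddings, q_fn, obs, k=5):
    # k nearest discrete actions by Euclidean distance to the proto-action
    dists = np.linalg.norm(action_embeddings - proto_action, axis=1)
    candidates = np.argsort(dists)[:k]
    # score each candidate with the critic and keep the highest-valued one
    scores = [q_fn(obs, action_embeddings[i]) for i in candidates]
    return int(candidates[int(np.argmax(scores))])

rng = np.random.default_rng(1)
embeddings = rng.standard_normal((100, 4))   # 100 discrete actions, dim 4
obs = rng.standard_normal(8)
proto = rng.standard_normal(4)
toy_q = lambda s, a: float(a.sum())          # toy critic for illustration only
best = wolpertinger_select(proto, embeddings, toy_q, obs, k=5)
```

This is exactly why direct access to the critic matters here: `q_fn` must be callable on arbitrary (state, candidate-action) pairs inside the environment loop, not just on the actor's own action.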