Closed: hifazibm closed this issue 3 years ago
Hello,
What do you mean exactly by decoupling agent and environment information? Decoupling data collection from policy training?
In that case, I recommend checking Stable-Baselines3, which implements `collect_rollouts()` and `train()` (cf. the SB3 doc) for each algorithm.
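For illustration, the collect/train split can be sketched generically like this (a toy sketch with placeholder classes; `ToyAgent`, `ToyEnv`, and this `collect_rollouts` are illustrative, not SB3's actual signatures):

```python
import random

class ToyEnv:
    """Stand-in environment: state counts up, episode ends after 5 steps."""
    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 5, {}

class ToyAgent:
    """Stand-in agent: random policy plus a counter of training phases."""
    def __init__(self, n_actions):
        self.n_actions = n_actions
        self.updates = 0

    def predict(self, state):
        return random.randrange(self.n_actions)

    def train(self, rollout):
        # A real agent would run gradient steps on the collected batch here.
        self.updates += 1

def collect_rollouts(env, agent, n_steps):
    """Collection phase: step the env for n_steps and store transitions."""
    rollout = []
    state = env.reset()
    for _ in range(n_steps):
        action = agent.predict(state)
        next_state, reward, done, info = env.step(action)
        rollout.append((state, action, reward, next_state, done))
        state = env.reset() if done else next_state
    return rollout

# learn() alternates these two phases internally; a platform that needs
# them decoupled could drive each phase separately.
env, agent = ToyEnv(), ToyAgent(n_actions=2)
for _ in range(3):
    rollout = collect_rollouts(env, agent, n_steps=4)
    agent.train(rollout)
```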
related to #381 probably
Hi,
Yes, correct. Here is a sample code snippet of what I need during training of the agent:
```python
# I don't pass the env object, only the observation/action space definitions
agent = DQNAgent(obs_space, action_space, hyperparams)

state = environment.reset()
done = False
while not done:
    # Episode timestep
    action = agent.predict(state=state)
    next_state, reward, done, info = environment.step(action)
    agent.learn(state, next_state, action, reward)
    state = next_state
```
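A runnable stub version of this decoupled loop might look as follows (all classes here are placeholders invented for illustration, not Stable-Baselines code; note the transition must be stored before `state` is advanced):

```python
class StubEnv:
    """Toy environment: state counts up, episode ends after 10 steps."""
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state += 1
        return self.state, float(action), self.state >= 10, {}

class StubDQNAgent:
    """Placeholder agent built from space definitions only, no env object."""
    def __init__(self, obs_space, action_space, hyperparams=None):
        self.action_space = action_space
        self.transitions = []

    def predict(self, state):
        return state % self.action_space  # deterministic toy policy

    def learn(self, state, next_state, action, reward):
        self.transitions.append((state, next_state, action, reward))

environment = StubEnv()
agent = StubDQNAgent(obs_space=1, action_space=2)

state = environment.reset()
done = False
while not done:
    action = agent.predict(state)
    next_state, reward, done, info = environment.step(action)
    agent.learn(state, next_state, action, reward)
    state = next_state  # advance only after learn() has seen both states

print(len(agent.transitions))  # 10 transitions for the 10-step episode
```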
> Yes correct. Sample code snippet which I need during training of agent.

It looks like you are re-creating the `learn()` method. Are you aware you can do multiple calls to `.learn()` (cf. doc)?
Yes, but I don't want to step the env inside the learn method. Is it OK to clone the learn methods for all algorithms and make the changes to decouple the env out of learn? I need to interface with external environments/applications; that's why I have this requirement.
> I need to interface with external environments/applications
Why can't you do the interfacing in the environment? As an example (see https://github.com/hill-a/stable-baselines/issues/341): for a previous project, I interfaced SB2 (Python 3) with ROS (Python 2) using a socket bridge: https://github.com/araffin/robotics-rl-srl
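A minimal sketch of that socket-bridge idea (a toy line-delimited JSON protocol invented here for illustration; the real bridge in that repo is more involved): the env's `step()` sends the action over a socket and reads back the observation, so the agent never touches the external process directly.

```python
import json
import socket
import threading

def toy_external_app(server_sock):
    """Stands in for the external application: applies actions to a counter."""
    conn, _ = server_sock.accept()
    state = 0
    with conn:
        reader = conn.makefile()
        while True:
            msg = json.loads(reader.readline())
            state += msg["action"]
            reply = {"obs": state, "reward": 1.0, "done": state >= 3}
            conn.sendall((json.dumps(reply) + "\n").encode())
            if reply["done"]:
                break

class SocketEnv:
    """Gym-style env whose step() is just a request/response over a socket."""
    def __init__(self, host, port):
        self.sock = socket.create_connection((host, port))
        self.reader = self.sock.makefile()

    def step(self, action):
        self.sock.sendall((json.dumps({"action": action}) + "\n").encode())
        msg = json.loads(self.reader.readline())
        return msg["obs"], msg["reward"], msg["done"], {}

# Run the fake external app in a background thread on a free local port.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
threading.Thread(target=toy_external_app, args=(server,), daemon=True).start()

env = SocketEnv("127.0.0.1", server.getsockname()[1])
obs, done = 0, False
while not done:
    obs, reward, done, info = env.step(1)
print(obs)  # the external counter reached its terminal value of 3
```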
Thanks for the link. To elaborate, I need to train my agent from offline data, something similar to this: https://docs.ray.io/en/latest/rllib-offline.html . Does Stable Baselines support this, or is there any workaround? Thanks.
If you want to do offline learning, I would recommend giving https://github.com/takuseno/d3rlpy a try ;) (SB can handle offline learning, but none of its algorithms are really made for that; you can read several papers on the topic, starting from the ones in the d3rlpy repo.)
OK... what about DQN and DDPG, which are off-policy algorithms?
SB does not support that out of the box (nor am I aware of libraries that specifically support it). With some modifications, you could fill the replay buffer with the offline data and then run update steps, or alternatively skip the sampling of the environment. Note that offline training of DQN might not yield good results.
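The "fill the buffer, then run update steps" workaround might look like this in miniature (a self-contained tabular Q-learning sketch on a made-up 2-state, 2-action dataset, not Stable-Baselines code):

```python
import random

# Offline dataset of (state, action, reward, next_state, done) transitions,
# e.g. logged from a previous policy. This tiny MDP is invented for the demo.
offline_data = [
    (0, 1, 1.0, 1, False),
    (1, 0, 0.0, 0, False),
    (0, 0, 0.0, 0, False),
    (1, 1, 1.0, 1, True),
]

replay_buffer = list(offline_data)   # step 1: fill the buffer, no env stepping

Q = [[0.0, 0.0], [0.0, 0.0]]         # Q[state][action]
alpha, gamma = 0.5, 0.9

random.seed(0)
for _ in range(500):                 # step 2: run update steps only
    s, a, r, s2, done = random.choice(replay_buffer)
    target = r if done else r + gamma * max(Q[s2])
    Q[s][a] += alpha * (target - Q[s][a])

# Greedy policy extracted from the learned Q-values:
policy = [max(range(2), key=lambda a: Q[s][a]) for s in range(2)]
print(policy)
```

The caveat above applies here too: the updates only ever see the logged transitions, which is exactly why purely offline DQN-style training can behave poorly on states the dataset does not cover.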
Closing as the original question was answered and topic is drifting away from stable-baselines territory.
Thanks
Hi,
I am trying to wrap Stable Baselines into an Enterprise RL platform.
One of the design constraints I am currently facing: we need the environment and agent interaction decoupled, while Stable Baselines couples them inside the learn method.
Do you have any suggestions? Should I clone the learn method for modification? I am afraid I might break the core code and run into issues during version upgrades. Thanks.