hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

Decouple Agent and Environment Interactions #1011

Closed: hifazibm closed this issue 3 years ago

hifazibm commented 3 years ago

Hi,

I am trying to wrap Stable Baselines into an enterprise RL platform.

One of the design constraints I am currently facing: we need the environment and agent interaction decoupled, while Stable Baselines couples them inside the learn() method.

Do you have any suggestions? Should I clone the learn() method and modify it? I am afraid I might break the core code and run into issues during version upgrades. Thanks.

araffin commented 3 years ago

Hello,

What do you mean exactly by decoupling the agent and environment interaction? Decoupling data collection from policy training? In that case, I recommend you check Stable-Baselines3, which implements collect_rollouts() and train() (cf. the SB3 docs) for each algorithm.

related to #381 probably

hifazibm commented 3 years ago

Hi,

Yes, correct. Here is a sample code snippet of what I need during training of the agent.

    agent = DQNAgent(obs_space, action_space, hyperparams)  # I don't pass the env obj, only the obs and action space definitions

    state = environment.reset()
    done = False
    while not done:
        # Episode timestep
        action = agent.predict(state=state)
        next_state, reward, done, info = environment.step(action)
        agent.learn(state, action, reward, next_state, done)
        state = next_state

araffin commented 3 years ago

> Yes, correct. Here is a sample code snippet of what I need during training of the agent.

It looks like you are re-creating the learn() method. Are you aware that you can make multiple calls to .learn()? (cf. the docs)
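
For instance, a rough sketch of repeated .learn() calls with SB2's DQN (the env id and timestep counts are just placeholders; reset_num_timesteps=False keeps the internal timestep counter across calls):

    from stable_baselines import DQN

    model = DQN("MlpPolicy", "CartPole-v1", verbose=1)

    # train in several short chunks instead of one long learn() call,
    # so you can evaluate, checkpoint, or talk to an external system in between
    for _ in range(10):
        model.learn(total_timesteps=1000, reset_num_timesteps=False)
        model.save("dqn_checkpoint")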

hifazibm commented 3 years ago

Yes, but I don't want to step the env inside the learn() method. Is it OK to clone the learn() methods for all the algorithms and change them so that the env is decoupled from learn()? I need to interface with external environments/applications; that's why I have this requirement.

araffin commented 3 years ago

> I need to interface with external environments/applications

Why can't you do the interfacing in the environment? As an example (see https://github.com/hill-a/stable-baselines/issues/341), for a previous project I interfaced SB2 (Python 3) with ROS (Python 2) using a socket bridge: https://github.com/araffin/robotics-rl-srl
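
A minimal sketch of that idea, assuming a hypothetical ExternalAppClient bridge object that proxies reset()/step() calls to the external application (the observation/action spaces below are placeholders):

    import gym
    import numpy as np
    from gym import spaces

    class ExternalAppEnv(gym.Env):
        """Gym env whose reset()/step() forward to an external application
        through `client`, a hypothetical bridge (e.g. a socket or REST client)."""

        def __init__(self, client):
            super().__init__()
            self.client = client
            # placeholder spaces: replace with the real ones of your application
            self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(4,), dtype=np.float32)
            self.action_space = spaces.Discrete(2)

        def reset(self):
            return np.asarray(self.client.reset(), dtype=np.float32)

        def step(self, action):
            obs, reward, done, info = self.client.step(int(action))
            return np.asarray(obs, dtype=np.float32), float(reward), bool(done), info

The agent then never needs to know the environment is remote: calling model.learn() on an ExternalAppEnv drives the external application through the bridge.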

hifazibm commented 3 years ago

Thanks for the link. To elaborate, I need to train my agent from offline data (something similar to https://docs.ray.io/en/latest/rllib-offline.html). Does Stable Baselines support this, or is there a workaround? Thanks.

araffin commented 3 years ago

If you want to do offline learning, I would recommend you give https://github.com/takuseno/d3rlpy a try ;) (SB can handle offline learning, but none of the algorithms are really made for that; you can read several papers about it, starting from the ones in the d3rlpy repo ;))

hifazibm commented 3 years ago

OK... what about DQN and DDPG, which are off-policy algorithms?

Miffyli commented 3 years ago

SB does not support that out of the box (nor am I aware of libraries that specifically support it). With some modifications, you could fill the replay buffer with the offline data and then run the update steps, or alternatively skip the sampling from the environment. Note that offline training of DQN might not yield good results.
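
A rough sketch of the first half of that idea (filling a replay buffer with logged transitions); wiring it into DQN's update loop still requires modifying learn(), as noted. The import path and the offline_transitions variable are assumptions:

    from stable_baselines.common.buffers import ReplayBuffer  # path assumed for recent SB2 versions

    # offline_transitions: assumed list of logged (obs, action, reward, next_obs, done) tuples
    buffer = ReplayBuffer(100000)
    for obs, action, reward, next_obs, done in offline_transitions:
        buffer.add(obs, action, reward, next_obs, float(done))

    # a modified learn() would sample from this pre-filled buffer instead of stepping the env
    obses, actions, rewards, next_obses, dones = buffer.sample(32)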

Closing, as the original question was answered and the topic is drifting away from stable-baselines territory.

hifazibm commented 3 years ago

Thanks