pytorch / rl

A modular, primitive-first, python-first PyTorch library for Reinforcement Learning.
https://pytorch.org/rl
MIT License

[Feature Request] Please provide the generic and low-level functionality rather than a high-level interface like agent.train() #90

Open walkacross opened 2 years ago

walkacross commented 2 years ago

hi, it's really great that facebookresearch is considering providing a library for reinforcement learning research.

it would be very helpful if the library provided low-level functionality rather than a high-level interface like agent.train(), which would turn it into yet another library like stable-baselines3 (https://github.com/DLR-RM/stable-baselines3).

would you mind referring to the philosophy and design of another rl library, cherry, which only provides general-purpose low-level functionality? https://github.com/learnables/cherry

have a good day.

vmoens commented 2 years ago

Thanks for this suggestion @walkacross, and glad you feel it's a good idea for facebookresearch to consider this!

Indeed, having low-level functionality for the agent would be an improvement! That would be more aligned with our philosophy, too. We'd like to avoid magic one-size-fits-all functions at all costs!

To be open about this, the Agent class is currently a bit rusty and it mostly serves the purpose of displaying what the other components of TorchRL do. A bunch of its functionalities should be refactored, or even placed in examples / tutorials / doc. I'll keep the issue open to keep track of the progress. If you have any concrete suggestion about the API you'd like to see for the agent such that it is general / abstract enough to cover more than one use case, we'd be delighted to hear about it!

walkacross commented 2 years ago

hi @vmoens, thanks for your quick feedback.

Every part of TorchRL, like objectives and td_module, is awesome and aligned with your philosophy: do rl in a highly modular way.

one thing that makes torchrl another rl library rather than an rl framework is the existence of the Agent class (I see that it currently mostly serves the purpose of displaying what the other components of TorchRL do). The more general question behind that is: should the concrete training process be provided by torchrl? Or, "can we implement deep reinforcement learning in the same way that we do deep learning in pytorch?"

maybe pytorch would answer no: leave the concrete training process to the user; the responsibility of the framework is to abstract the process and then provide the low-level tools.

from my view, the code snippets in a user's main.py might look like this:

import gym
import torch
import torch.nn as nn
from torchrl.collectors.collectors import DataCollector
from torchrl.env import env1Wrapper, LogWrapper
from torchrl.data import ReplayBuffer
from torchrl.modules.td_module import ActorWrapper1, CriticWrapper1, ActorCriticWrapper1  
from torchrl.modules.exploration import ExplorationWrapper1
from torchrl.objectives.costs.common import LossModule
from torchrl.low_level_tools import normalize_reward, GAE
# ===============================================
# it's the user's responsibility to design custom model arch
# ===============================================
class Custom_actor(nn.Module):
    pass

class Custom_critic(nn.Module):
    pass

env = gym.make("env_name")
env = env1Wrapper(env)
env = LogWrapper(env)

actor_net = Custom_actor()
actor_net = ActorWrapper1(actor_net)
actor_net = ExplorationWrapper1(actor_net)

critic_net = Custom_critic()
critic_net = CriticWrapper1(critic_net)
actor_critic = ActorCriticWrapper1(actor_net, critic_net)

collector = DataCollector(env, actor_net, total_frames=100)
replay_buffer = ReplayBuffer()
batch_size = 256
loss_module = LossModule(actor_net, critic_net, gamma=0.99)
optimizer = torch.optim.Adam(loss_module.parameters(), lr=3e-4)

def custom_step(replay_buffer, batch_size, loss_module, optimizer):
    batch = replay_buffer.sample(batch_size)
    # maybe perform model-based algorithms

    # maybe perform model-free algorithms
    loss_td = loss_module(batch)
    optimizer.zero_grad()
    loss_td.backward()
    optimizer.step()

for i, batch in enumerate(collector):
    replay_buffer.extend(batch)
    custom_step(replay_buffer, batch_size, loss_module, optimizer)
    collector.update_policy_weights_()

Hope my suggestion helps.

vmoens commented 2 years ago

Broadly speaking I agree with you. I was reluctant to code an agent class at first; then I realized that the training loop of all the examples was just a copy-paste of one another, so I thought it would make sense to put it in a single class to facilitate the passing of arguments (such as the number of optimization steps per data collection, etc.) and to avoid obvious bugs caused by missing a simple line of code (syncing the policy across processes, for instance).

That being said, yes, the question of whether an agent class has a place in torchrl is valid. Perhaps it's something that should belong to lightning, or a similar library? That's something we could investigate. For now a "fix" to this issue could be to move the entire agent directory to the examples, thereby removing it from the library itself.

Thanks for keeping the conversation going! These design decisions are extremely important and will shape what the library looks like.
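To make the earlier point concrete, the loop that was being copy-pasted across examples is roughly the following (a minimal sketch, not the actual Agent code; names such as collector, replay_buffer, loss_module, optimizer and batch_size reuse the snippet above, and optim_steps_per_batch is a made-up placeholder for the argument the Agent class passes around):

optim_steps_per_batch = 8  # hypothetical: number of optimization steps per data collection

for i, batch in enumerate(collector):            # data collection
    replay_buffer.extend(batch)                  # store the newly collected frames
    for _ in range(optim_steps_per_batch):       # several optimization steps per collected batch
        sample = replay_buffer.sample(batch_size)
        loss_td = loss_module(sample)            # assumed to return a tensordict of loss terms
        loss = sum(value for key, value in loss_td.items() if key.startswith("loss_"))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    # the easy-to-miss line: sync the policy weights across collector processes
    collector.update_policy_weights_()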

walkacross commented 2 years ago

1. the question of whether an agent class has a place in torchrl is valid. Perhaps it's something that should belong to lightning (or pytorch-ignite)

2. a "fix" to this issue could be to move the entire agent directory to the examples, thereby removing it from the library itself.

yes, I totally agree with you.

from my perspective as a user, I hope I can implement deep reinforcement learning in TorchRL in the same way that we do deep learning in pytorch, regardless of the category of rl algorithm (model-based/model-free, on-policy/off-policy, etc.), which requires torchrl to help users sort out the generic workflow and then provide tools at a reasonable level of abstraction.

have a good day and hope everything is going well.

smorad commented 2 years ago

> Broadly speaking I agree with you. I was reluctant to code an agent class at first; then I realized that the training loop of all the examples was just a copy-paste of one another, so I thought it would make sense to put it in a single class to facilitate the passing of arguments (such as the number of optimization steps per data collection, etc.) and to avoid obvious bugs caused by missing a simple line of code (syncing the policy across processes, for instance).

I think it depends on whether the main torchrl audience is researchers or industry. Many researchers will need to modify these loops to plug in modules like curiosity, model-based RL, recurrent networks, etc. IMO there is an overabundance of libraries that let you train.py --alg=PPO --env=cartpole, but very few that do modularity right and make it easy to try out new ideas using existing infrastructure.
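For instance, with low-level tools, adding a curiosity bonus becomes a couple of lines inside your own update step rather than a fork of an agent.train() method. A rough sketch (curiosity_module and intrinsic_weight are made-up names, not TorchRL API):

sample = replay_buffer.sample(batch_size)
# hypothetical: add an intrinsic (curiosity-style) bonus to the extrinsic reward
intrinsic_reward = curiosity_module(sample)
sample["reward"] = sample["reward"] + intrinsic_weight * intrinsic_reward
loss_td = loss_module(sample)  # the rest of the update step is unchanged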

vmoens commented 2 years ago

@smorad thanks for keeping the conversation going!

I agree with you; the audience being researchers, what we want is exactly that: low-level tools you can use across algorithms and reuse for your own research.

Hopefully with the tutorials we're working on this will become clearer!

vmoens commented 1 year ago

Update on this

We're currently working on making a script version of each of the example algorithms. We'll get there in the near future!

We already have an ipynb detailing how to code and train DDPG.

walkacross commented 1 year ago

the code in tutorials like coding_ddpg.ipynb is clean and readable. it's great to see torchrl moving in the direction we all hoped for. thanks for your time.

vmoens commented 1 year ago

@walkacross you may also want to have a look at #381; the DDPG notebook does not contain results atm.