Open · anirjoshi opened 4 months ago
@anirjoshi Basically, you just set "use_lstm": True in the model dictionary in AlgorithmConfig.training(). See here for a more elaborate example: https://github.com/ray-project/ray/blob/master/rllib/examples/custom_recurrent_rnn_tokenizer.py
You don't need the custom tokenizer shown there; just set "use_lstm": True.
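For instance, a minimal sketch could look like this (PPO and CartPole-v1 are illustrative assumptions, not requirements):

```python
from ray.rllib.algorithms.ppo import PPOConfig

# Minimal sketch: enable RLlib's auto-LSTM wrapper via the model dict.
# PPO and CartPole-v1 are illustrative choices only.
config = (
    PPOConfig()
    .environment("CartPole-v1")
    .training(model={"use_lstm": True})
)
algo = config.build()
```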
@simonsays1980 Thank you for your quick reply. That example seems to be in TensorFlow; it would be great to see a PyTorch example. Also, I am not sure about the terminology. I am not very familiar with RNNs; I just have an MDP defined in which a state is represented by a variable-size input and an action is represented by a size-3 output of the RNN. How easy would it be to use RLlib for this? I was hoping to see an example of this.
@simonsays1980 In particular, I have constructed the following environment as an example. Note that this environment has variable-size observations!
```python
import gymnasium as gym
from gymnasium.spaces import Discrete, MultiDiscrete, Sequence


class ModuloComputationEnv(gym.Env):
    """Environment in which an agent must learn to output mod 2, 3, 4 of the
    sum of all observations seen so far.

    Observations are sequences of integers, e.g. (1, 3, 4, 5).
    The action space is a vector of 3 values: the running sum % 2, % 3,
    and % 4.
    Rewards are r = -abs(self.ac1 - action[0]) - abs(self.ac2 - action[1])
    - abs(self.ac3 - action[2]) for all steps.
    """

    def __init__(self, config):
        # The input sequence can contain any integer from 0 to 99.
        self.observation_space = Sequence(Discrete(100), seed=2)
        # The action is a vector of 3: [%2, %3, %4] of the running sum.
        self.action_space = MultiDiscrete([2, 3, 4])
        self.cur_obs = None
        # This variable maintains the episode length.
        self.episode_len = 0
        # These variables maintain the running sum's %2, %3, and %4.
        self.ac1 = 0
        self.ac2 = 0
        self.ac3 = 0

    def reset(self, *, seed=None, options=None):
        """Resets the episode and returns the initial observation."""
        # Reset the episode length.
        self.episode_len = 0
        # Sample a random sequence from our observation space.
        self.cur_obs = self.observation_space.sample()
        # Take the sum of the initial observation.
        sum_obs = sum(self.cur_obs)
        # Compute %2, %3, and %4 of the initial sum.
        self.ac1 = sum_obs % 2
        self.ac2 = sum_obs % 3
        self.ac3 = sum_obs % 4
        # Return initial observation and an empty info dict.
        return self.cur_obs, {}

    def step(self, action):
        """Takes a single step in the episode given `action`.

        Returns:
            New observation, reward, terminated-flag, truncated-flag,
            info-dict (empty).
        """
        # Terminate the episode after 10 steps.
        self.episode_len += 1
        truncated = False
        terminated = self.episode_len >= 10
        # The reward is the negative distance from the correct mod values.
        reward = -(
            abs(self.ac1 - action[0])
            + abs(self.ac2 - action[1])
            + abs(self.ac3 - action[2])
        )
        # Set a new observation (random sample).
        self.cur_obs = self.observation_space.sample()
        # Update the running %2, %3, and %4 with the new observation's sum.
        sum_obs = sum(self.cur_obs)
        self.ac1 = (sum_obs + self.ac1) % 2
        self.ac2 = (sum_obs + self.ac2) % 3
        self.ac3 = (sum_obs + self.ac3) % 4
        return self.cur_obs, reward, terminated, truncated, {}
```
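A minimal random-rollout sanity check of this environment (a sketch, using only the class above) might look like:

```python
# Sanity check: roll out one episode with random actions.
env = ModuloComputationEnv(config={})
obs, info = env.reset()
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    print(f"reward={reward}")
```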
I would like to use the RLlib library for training some RL algorithm on it; is this possible? Some help in this regard would be great!
@anirjoshi Your example environment should work with RLlib as long as it implements the gymnasium.Env interface. The use_lstm key works for TF or Torch (the framework can be set via AlgorithmConfig.framework("torch"); the latter sets it to Torch).
You can find here all possible settings for modules in RLlib. Find here an overview of our auto-LSTM wrappers that are triggered by the use_lstm setting.
In regard to the example I linked above: it works for TF AND Torch. You can run it from the command line and pass in the argument --framework=torch. I advise you to read carefully through the code and documentation to grow your understanding of how to use RLlib for your experiments.
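Putting this together for your environment, a Torch setup could look roughly like this (PPO and the specific model settings are illustrative assumptions; the variable-length Sequence observation space may additionally require a custom model):

```python
from ray.rllib.algorithms.ppo import PPOConfig

# Rough sketch: wire the custom env into a Torch PPO config with an LSTM.
# PPO and max_seq_len=10 are illustrative choices, not requirements.
config = (
    PPOConfig()
    .environment(ModuloComputationEnv, env_config={})
    .framework("torch")
    .training(model={"use_lstm": True, "max_seq_len": 10})
)
algo = config.build()
for _ in range(3):
    result = algo.train()
```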
Description
I can see that RLlib supports the use of RNNs. It would be great to have an example that shows the use of RNNs for an environment in RLlib. I would like to implement an RNN for a custom-made environment, so it would be great to have an example that I can use to further customize the implementation. Thanks!