openai / gym

A toolkit for developing and comparing reinforcement learning algorithms.
https://www.gymlibrary.dev

How to use a trained agent in a production setting using a custom environment? #1684

Closed maxmatical closed 3 years ago

maxmatical commented 5 years ago

I'm applying RL to a scheduling problem using a custom environment, and I'm interested in deploying the model on live data to see how it works. All of the observations are in the form of a dataframe.

In a production setting, this should work something like OpenAI Five when it was tested against pros: the environment needs to be updated with new observations as they come in. Currently I'm having trouble figuring out how to update the environment with previously unseen observations. My current workflow for deploying this agent in a live setting is:

  1. Load the custom environment with the current observation (1 row of the dataframe)
  2. obs = env.reset()
  3. Get an action from model.predict(obs)
  4. Manually perform the action in the real world, observe the outcome, and save the latest state (there is no env.step() here since there are no further observations)
  5. Create a new environment with the latest observation, and repeat steps 2-5 continuously.

Is there a better way to actually use the agent to perform a task without continuously redefining a new environment? Or, ideally, are there any tutorials/examples showing the use of an RL agent in a real setting where it needs to take in newly observed states that were not previously defined and react to them?

christopherhesse commented 5 years ago

It sounds like you want to reset your environment state. You can define a new method like set_state() on your environment, or reset(state=<whatever>), and then use that to make sure your environment matches the other environment you're trying to clone. It's not clear what use you are getting out of the environment at this point.

In things like OpenAI Five, the environment is just the game, and it isn't reset all the time. You call .step() and this takes the action you chose in the game, and returns the new game state 1/60th of a second later or whatever the delay is.
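
A minimal sketch of that suggestion, using the older gym API (reset() returning just an observation and step() returning a 4-tuple) that was current when this thread was written; the set_state() method and the state argument to reset() are the idea above, not part of the standard gym interface:

import gym
import numpy as np


class LiveEnv(gym.Env):
    # Toy env whose state can be overwritten from the outside world.
    action_space = gym.spaces.Discrete(2)
    observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(1,), dtype=np.float32)

    def __init__(self):
        self.state = np.zeros(1, dtype=np.float32)

    def set_state(self, state):
        # overwrite the internal state with an externally observed one
        self.state = np.asarray(state, dtype=np.float32)

    def reset(self, state=None):
        # optionally start from a supplied state instead of a fixed one
        if state is not None:
            self.set_state(state)
        return self.state

    def step(self, action):
        # in a live setting the next state would come from the real system
        return self.state, 0.0, False, {}


env = LiveEnv()
obs = env.reset(state=[3.0])            # or call env.set_state([3.0]) before reset()
for _ in range(10):
    action = env.action_space.sample()  # a trained model's predict(obs) would go here
    obs, reward, done, info = env.step(action)

As the comment above describes for OpenAI Five, the loop never recreates the environment; step() just returns whatever state the game (or real system) produced after the action.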

maxmatical commented 5 years ago

If I want to incorporate new data (e.g. new rows of a dataframe) without resetting the environment, is it possible to create a method that grabs the latest observation from my data and appends it to the observations when I call env.step()? I'm guessing it would follow something like:

  1. When the agent is acting, it only has the information available in the df, where len(df) = n
  2. After the action step, a new row of observations is appended to the df so that len(df) = n+1
  3. When I run env.step(), it calls a method that grabs the last row of the df and appends it to the observations in the env so that len(df) = n+1, and uses that to return the next observation

Would that be the right idea for incorporating new information that wasn't previously in the environment? Are there any examples of this available so I can get a better sense?
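
For what it's worth, something along those lines looks feasible because env.step() is entirely under your control; here is a rough sketch under those assumptions, where DataFrameEnv and append_observation() are made-up names (not an existing gym API) and the dataframe has a single numeric column:

import gym
import numpy as np
import pandas as pd


class DataFrameEnv(gym.Env):
    # Sketch: the env always serves the newest row of a growing dataframe.
    action_space = gym.spaces.Discrete(2)
    observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(1,), dtype=np.float32)

    def __init__(self, df):
        self.df = df

    def append_observation(self, row):
        # called whenever a new observation arrives, so len(df) becomes n+1
        self.df.loc[len(self.df)] = row

    def _latest_obs(self):
        return self.df.iloc[-1].to_numpy(dtype=np.float32)

    def reset(self):
        return self._latest_obs()

    def step(self, action):
        # return the most recent row; reward/done logic is problem-specific
        return self._latest_obs(), 0.0, False, {}


df = pd.DataFrame({"x": [1.0]})       # len(df) == n
env = DataFrameEnv(df)
obs = env.reset()
env.append_observation([2.0])         # the real-world step happened, len(df) == n+1
obs, reward, done, info = env.step(env.action_space.sample())

Whether step() itself fetches the new row or an outside caller pushes it in with a method like append_observation() is an implementation detail; either way the environment is never recreated.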

christopherhesse commented 4 years ago

The gym interface should let you return whatever you want from env.step(), since you control the entire environment implementation. Example:

import time

class Env:
    def reset(self):
        # the initial observation can come from anywhere outside the env
        return time.time()

    def step(self, ac):
        # obs, reward, done, info -- the obs is again pulled from the outside world
        return time.time(), 0.0, False, {}

This is a trivial environment that returns information from outside the environment without resetting. The env.step() method could wait for 5 minutes and then return new data if you wanted it to. Am I missing something from your question?

shuruiz commented 4 years ago

Besides adding a set_state() yourself, as @christopherhesse said, you can also customize your env by redefining/modifying/querying the environment's world through env.P, which is a lower-level customization.
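
For context, env.P is only exposed by the simple tabular (toy_text) environments such as FrozenLake, where it is a nested dict of transition lists; a small sketch of inspecting and tweaking it (the exact environment version name depends on your gym release, and unwrapped is used to bypass the TimeLimit wrapper):

import gym

env = gym.make("FrozenLake-v1")
P = env.unwrapped.P        # P[state][action] -> list of (prob, next_state, reward, done)
print(P[0][0])             # transitions for taking action 0 in state 0
# example tweak: make that transition deterministic with no reward
P[0][0] = [(1.0, 0, 0.0, False)]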

jkterry1 commented 3 years ago

I'm closing this due to inactivity

adiyaDalat commented 2 years ago

@maxmatical Hello, greetings after 3 years..

Did you find a solution to this problem? I've run into a similar situation and hope you could give a hint :)

ernestoalarcongallo commented 2 years ago

Hi everyone, I'm at the same point right now. Any steps forward, @adiyaDalat? :D

pseudo-rnd-thoughts commented 2 years ago

Hi, could you be more precise about what your issue is, what your setting is, etc.?

ernestoalarcongallo commented 2 years ago

Hi, thank you @pseudo-rnd-thoughts, I'll try to explain as best I can. I'm using OpenAI Gym to train an RL model. The way I trained was by passing a pandas dataframe through the environment's constructor parameter: env = CustomEnv(train_data, ...)

Now, if I had to do the same in production, what is the correct way? I mean, the production data arrives every 10 minutes; should I tweak the environment so that it manages asynchronous data, like an observable? How do people solve this problem?

I guess this is the common scenario: you train on data coming from a csv or whatever, but you have to run it in the real world with data that arrives asynchronously, unless of course you're developing for a video game or something.

Thank you very much in advance! :D

pseudo-rnd-thoughts commented 2 years ago

Hi @ernestoalarcongallo, sorry I forgot to reply. I have never used Gym in a production environment, so I can't say for certain that it will work, but your plan for implementing it sounds right; you might have to experiment to find what works best. The idea of creating a custom environment is correct.
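
If it helps, one way to handle data that only arrives every ten minutes or so is to let the custom environment's step() block (or poll) until the next observation is available; a rough sketch, where fetch_latest_row() is a placeholder for however the production data is actually queried:

import time

import gym
import numpy as np


def fetch_latest_row():
    # placeholder: in production this would query a database, API, message queue, ...
    return np.array([time.time()], dtype=np.float32)


class PollingEnv(gym.Env):
    # Sketch: step() waits for the next batch of live data before returning it.
    action_space = gym.spaces.Discrete(2)
    observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(1,), dtype=np.float32)

    def __init__(self, poll_seconds=600):
        self.poll_seconds = poll_seconds

    def reset(self):
        return fetch_latest_row()

    def step(self, action):
        # the action has already been applied to the real system; wait for its effect
        time.sleep(self.poll_seconds)
        return fetch_latest_row(), 0.0, False, {}

As long as this live environment keeps the same observation and action spaces as the training environment built from the csv data, the trained model can be reused unchanged.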