Closed: maxmatical closed this issue 3 years ago.
It sounds like you want to reset your environment state. You can define a new method like `set_state()` on your environment, or a `reset(state=<whatever>)`, and then use that to make sure your environment matches the other environment you're trying to clone. It's not clear what use you are getting out of the environment at this point, though.
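As a minimal sketch of that idea (the `CounterEnv` class and its one-integer state are hypothetical, chosen only so the snippet is self-contained and doesn't require gym to be installed):

```python
class CounterEnv:
    """Toy env whose entire state is one integer counter."""
    def __init__(self):
        self.state = 0

    def reset(self, state=None):
        # Optionally start from a caller-supplied state instead of 0.
        self.state = 0 if state is None else state
        return self.state

    def set_state(self, state):
        # Overwrite internal state to match another env you're cloning.
        self.state = state

    def step(self, action):
        self.state += 1 if action else -1
        return self.state, 0.0, False, {}

# Clone env_a's state into env_b, then continue stepping env_b:
env_a, env_b = CounterEnv(), CounterEnv()
env_a.reset(state=5)
env_b.set_state(env_a.state)
assert env_b.step(1)[0] == 6
```

A real environment's state is usually much richer (RNG state, episode counters, internal buffers), so `set_state` would need to copy all of it for the clone to behave identically.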
In things like OpenAI Five, the environment is just the game, and it isn't reset all the time. You call `.step()`, which takes the action you chose in the game and returns the new game state 1/60th of a second later, or whatever the delay is.
If I want to incorporate new data (e.g. new rows of a dataframe) without resetting the environment, is it possible to create a method that grabs the latest observation of my data, such that it is appended to my observations when I call `env.step()`? I'm guessing it would follow something like: start with a `df` of `len(df) = n`; a new row arrives in `df` such that `len(df) = n+1`; when I call `env.step()`, it calls a method that grabs the last point of `df` and appends it to the observations in the env, so the env also sees `n+1` points, and uses that to return the next observation. Would that be the idea of incorporating new information that wasn't previously in the environment? Are there any examples of this available so I can get a better sense?
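A sketch of that workflow, with a plain Python list standing in for the dataframe so the snippet runs without pandas (the `GrowingDataEnv` name and its `cursor` field are hypothetical; with pandas you would read `df.iloc[-1]` instead of `data[-1]`):

```python
class GrowingDataEnv:
    """Env whose observations come from an external, growing data source."""
    def __init__(self, data):
        self.data = data      # external list standing in for the dataframe
        self.cursor = 0       # index of the row most recently served

    def reset(self):
        self.cursor = 0
        return self.data[self.cursor]

    def step(self, action):
        # Grab the latest point: jump to the newest row appended to `data`.
        self.cursor = len(self.data) - 1
        obs = self.data[self.cursor]
        return obs, 0.0, False, {}

data = [10.0]                 # len(df) = n
env = GrowingDataEnv(data)
obs = env.reset()
data.append(11.0)             # new row arrives: len(df) = n + 1
obs, reward, done, info = env.step(0)
assert obs == 11.0            # step() returned the previously unseen row
```

The key point is that the env only holds a reference to the data source; whoever appends rows to it never has to touch or reset the env.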
The gym interface should let you return whatever you want in `env.step()`, since you control the entire environment implementation. Example:

```python
import time

class Env:
    def reset(self):
        return time.time()

    def step(self, ac):
        return time.time(), 0.0, False, {}
```
This is a trivial environment that returns information from outside the environment without resetting. The `env.step()` method could wait for 5 minutes and then return new data if you wanted it to. Am I missing something from your question?
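For completeness, driving that trivial env looks like the usual gym loop (the class is repeated here so the snippet runs on its own):

```python
import time

class Env:
    def reset(self):
        return time.time()

    def step(self, ac):
        return time.time(), 0.0, False, {}

env = Env()
t0 = env.reset()
time.sleep(0.01)                       # "outside world" time passes
obs, reward, done, info = env.step(0)  # obs is fresh data, not replayed state
assert obs > t0
```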
Besides adding a `set_state` yourself, as @christopherhesse said, you can also customize your env by redefining/modifying/querying the environment world via `env.P`, which is lower-level customization.
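In gym's toy-text environments (e.g. FrozenLake), `env.P` maps state → action → a list of `(probability, next_state, reward, done)` tuples. A sketch of querying and rewriting that structure (the dict below is built by hand so the snippet doesn't require gym; its two states and rewards are made up):

```python
# Same structure as env.P in gym's toy-text envs:
# P[state][action] -> [(probability, next_state, reward, done), ...]
P = {
    0: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 0.0, False)]},
    1: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 1.0, True)]},
}

# Query: where does action 1 in state 0 lead?
prob, next_state, reward, done = P[0][1][0]
assert next_state == 1

# Modify: make action 1 in state 0 stochastic (80/20 between states).
P[0][1] = [(0.8, 1, 0.0, False), (0.2, 0, 0.0, False)]
assert abs(sum(p for p, *_ in P[0][1]) - 1.0) < 1e-9
```

Rewriting `env.P` changes the world's dynamics in place, which only makes sense for tabular envs that actually expose it.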
I'm closing this due to inactivity
@maxmatical Hello, greetings after 3 years... Did you find a solution to this problem? I encountered a similar situation and hope you could give a hint :)
Hi everyone, I'm at the same point right now. Any steps forward, @adiyaDalat? :D
Hi, could you be more precise about what your issue is, what your setup is, etc.?
Hi, thank you @pseudo-rnd-thoughts, I'll try to explain as best I can. I am using OpenAI Gym to train an RL model. The way I trained was by passing a pandas dataframe through the environment's constructor:
`env = CustomEnv(train_data, ...)`
Now, if I had to do the same in production, what is the correct way? I mean, the production data arrives every 10 minutes; should I tweak the environment so that it manages asynchronous data, like an observable? How do people solve this problem? I guess this is the common scenario: you train on data coming from a CSV or whatever, but you have to run it in the real world with data that arrives asynchronously, unless of course you are developing for a videogame or the like.
Thank you very much in advance! :D
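One way to handle data that arrives every 10 minutes is to have `step()` block on a queue that the ingestion side feeds; the wait for the next batch then happens inside the env. A minimal sketch (the `LiveEnv` class and `inbox` queue are hypothetical; the producer thread here just simulates the feed):

```python
import queue
import threading

class LiveEnv:
    """Env that blocks in step() until the next observation arrives.

    `inbox` is fed by whatever ingests the production feed; the
    10-minute cadence simply shows up as time spent waiting in get().
    """
    def __init__(self, inbox):
        self.inbox = inbox

    def reset(self):
        return self.inbox.get()   # wait for the first data point

    def step(self, action):
        obs = self.inbox.get()    # wait for the next data point
        return obs, 0.0, False, {}

inbox = queue.Queue()
env = LiveEnv(inbox)
# Simulate the producer side of the feed:
threading.Thread(target=lambda: [inbox.put(x) for x in (1.0, 2.0)]).start()
first = env.reset()
obs, reward, done, info = env.step(0)
assert (first, obs) == (1.0, 2.0)
```

In a real deployment you would likely add a timeout on `get()` and decide what the agent should do if a batch is late or missing.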
Hi @ernestoalarcongallo, sorry I forgot to reply. I have never used Gym in a production environment, so I have no idea whether it will definitely work. But your plan for implementing it sounds right, though you might have to experiment to find what works best. The idea of creating a custom environment is correct.
I'm applying RL to a scheduling problem using a custom environment, and I am interested in deploying the model on live data to see how it works. All the observations are in the form of a dataframe. In a production setting, if I were to deploy this agent, it would call `env.step(action)` on observations as they arrive. In theory, it should be something like OpenAI Five when tested against pros: the environment needs to be updated with new observations as they come. Currently I am having trouble figuring out how to update the environment with previously unseen observations. My current workflow for deploying this agent in a live setting is:

1. `obs = env.reset()`
2. `model.predict(obs)`
3. `env.step()` (and then redefine a new environment, since there are no further observations)

Is there a better way to actually use the agent to perform a task without continuously redefining a new environment? Or, ideally, are there any tutorials/examples showing the use of an RL agent in a real setting where it needs to take in newly observed states that were not previously defined, and react to those?
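That loop can be kept alive without redefining the environment by letting one long-lived env hold a mutable "latest observation" slot that new data overwrites. A sketch under that assumption (`StreamEnv`, `push()`, and `DummyModel` are all hypothetical names; `DummyModel` stands in for the trained agent's `model.predict`):

```python
class StreamEnv:
    """Long-lived env: new observations are pushed in; it is reset once."""
    def __init__(self):
        self.latest = 0.0

    def push(self, obs):
        self.latest = obs         # called whenever live data arrives

    def reset(self):
        return self.latest

    def step(self, action):
        return self.latest, 0.0, False, {}

class DummyModel:
    # Stand-in for the trained agent's model.predict(obs).
    def predict(self, obs):
        return 0

env = StreamEnv()
model = DummyModel()
obs = env.reset()                 # reset once, at deployment time
seen = []
for new_obs in (1.0, 2.0, 3.0):   # live feed delivering unseen states
    env.push(new_obs)
    action = model.predict(obs)
    obs, reward, done, info = env.step(action)
    seen.append(obs)
assert seen == [1.0, 2.0, 3.0]    # each step saw the newly arrived state
```

The env object survives across the whole deployment; only its data changes, so there is no need to construct a new environment per observation.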