tensorflow / agents

TF-Agents: A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning.
Apache License 2.0

Is there an efficient way to combine multiple timesteps from a PyEnvironment as one state? #446

Open peidaqi opened 4 years ago

peidaqi commented 4 years ago

Hi, I'm trying to build a simulation environment for a recommender system in which several consecutive timesteps from the environment are treated as a single state that the Agent uses to learn the next action. For example, the user's satisfaction level depends on the time they spent on the last 5 videos they watched.

Right now I can get it to work by building a custom "real environment" that outputs one step at a time, plus a second PyEnvironment that loads the last 5 steps from the "real environment" and feeds them to the Agent as one "timestep". But this feels a bit weird. Is there a better way of doing this?

kuanghuei commented 4 years ago

Can you use an environment wrapper to stack states using a deque? See the following example:

https://github.com/tensorflow/agents/blob/master/tf_agents/environments/atari_wrappers.py#L32
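
To make the deque idea concrete, here is a minimal, library-independent sketch. It is not the code from the linked file and does not use the TF-Agents API: `ToyVideoEnv`, `StackLastN`, and their `reset()`/`step()` signatures are hypothetical names chosen for illustration, and observations are plain floats (watch times) rather than TF-Agents `TimeStep` objects. In a real TF-Agents wrapper the observation spec would also need to reflect the stacked shape.

```python
from collections import deque


class ToyVideoEnv:
    """Hypothetical stand-in for the 'real environment': one watch time per step."""

    def __init__(self, watch_times):
        self._watch_times = list(watch_times)
        self._index = 0

    def reset(self):
        self._index = 0
        return self._watch_times[self._index]

    def step(self, action):
        # The action is ignored in this toy; a real simulator would use it.
        # The index is clamped so the last observation repeats at the end.
        self._index = min(self._index + 1, len(self._watch_times) - 1)
        return self._watch_times[self._index]


class StackLastN:
    """Wrapper (hypothetical) that exposes the last n observations as one state."""

    def __init__(self, env, n=5):
        self._env = env
        self._n = n
        # A bounded deque drops the oldest observation automatically.
        self._history = deque(maxlen=n)

    def reset(self):
        first = self._env.reset()
        self._history.clear()
        # Pad with copies of the first observation so the state always has length n.
        self._history.extend([first] * self._n)
        return tuple(self._history)

    def step(self, action):
        self._history.append(self._env.step(action))
        return tuple(self._history)


env = StackLastN(ToyVideoEnv([10.0, 20.0, 30.0]), n=3)
state0 = env.reset()
state1 = env.step(action=0)
state2 = env.step(action=0)
```

The wrapper keeps the single-step environment unchanged and only alters what the agent observes, so the stacking depth `n` can be tuned without touching the simulator. Padding on reset is one possible design choice; returning a shorter history for the first few steps would be another.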