Is there an efficient way to combine multiple timesteps from a PyEnvironment as one state?

tensorflow / agents

TF-Agents: A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning.

Apache License 2.0

Hi, I'm trying to build a simulation environment for a recommender system in which several consecutive timesteps from the environment are treated as a single state that the Agent uses to choose its next action. For example, the user's satisfaction level depends on the time they spent on the last 5 videos they watched.

Right now I can get it to work by building a custom "real environment" that outputs one step at a time, plus a second PyEnvironment that loads the last 5 steps from the "real environment" and feeds them to the Agent as one "timestep". But this feels a bit weird. Is there a better way of doing this?
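To make the setup concrete, here is a minimal sketch of the sliding-window idea in plain Python, with no TF-Agents dependency. `ToyVideoEnv`, `HistoryStack`, `history_length`, and `pad_value` are hypothetical names used only for illustration, and the zero padding at the start of an episode is an assumption rather than anything the library prescribes. TF-Agents also appears to ship a `HistoryWrapper` in `tf_agents.environments.wrappers` that may cover this case directly, so it is worth checking before writing a custom wrapper.

```python
from collections import deque


class ToyVideoEnv:
    """Hypothetical stand-in for the "real environment": one watch time per step."""

    def __init__(self, watch_times):
        self._watch_times = list(watch_times)
        self._index = 0

    def reset(self):
        self._index = 0
        return self._watch_times[0]

    def step(self, action):
        # The action is ignored in this toy; a real simulator would use it.
        self._index += 1
        observation = self._watch_times[self._index]
        done = self._index == len(self._watch_times) - 1
        return observation, done


class HistoryStack:
    """Wraps an env so each state is the last `history_length` raw observations."""

    def __init__(self, env, history_length=5, pad_value=0.0):
        self._env = env
        self._pad_value = pad_value
        self._history = deque(maxlen=history_length)

    def reset(self):
        first = self._env.reset()
        self._history.clear()
        # Left-pad so the stacked state has a fixed length from the first step.
        self._history.extend([self._pad_value] * (self._history.maxlen - 1))
        self._history.append(first)
        return tuple(self._history)

    def step(self, action):
        observation, done = self._env.step(action)
        self._history.append(observation)  # the deque drops the oldest entry
        return tuple(self._history), done


env = HistoryStack(ToyVideoEnv([10.0, 20.0, 30.0, 40.0]), history_length=3)
first_state = env.reset()
second_state, _ = env.step(action=0)
third_state, _ = env.step(action=0)
last_state, done = env.step(action=0)
```

Because the window lives in a wrapper, the underlying environment still emits one step at a time, and the window length can be changed without touching the simulator.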

Is there an efficient way to combine multiple timesteps from a PyEnvironment as one state? #446