Hi, I'm trying to build a simulation environment for a recommender system in which several consecutive timesteps from the environment are treated as a single state from which the agent learns its next action. For example, the user's satisfaction level depends on the time they spent on the last 5 videos they watched.
Right now I have it working by building a custom "real environment" that outputs one step at a time, plus a second PyEnvironment that loads the last 5 steps from the "real environment" and feeds them to the agent as one "timestep". But this feels a bit weird. Is there a better way of doing this?
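For context, here is a stripped-down sketch of the two-layer setup described above. It is plain Python rather than actual PyEnvironment code, and everything in it is a placeholder for illustration: the class names `RealEnv` and `StackedEnv`, the watch-time observation, and padding the history with the first observation after a reset are all assumptions.

```python
from collections import deque
import random


class RealEnv:
    """Hypothetical stand-in for the "real environment": one video per step."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)

    def reset(self):
        # Assumption: no watch time has been recorded before the first video.
        return 0.0

    def step(self, action):
        # Assumption: the observation is the watch time (in seconds) of the
        # video that was just recommended; here it is simply random.
        return self._rng.uniform(0.0, 60.0)


class StackedEnv:
    """Presents the last `history_length` observations as one state."""

    def __init__(self, env, history_length=5):
        self._env = env
        self._history = deque(maxlen=history_length)

    def reset(self):
        first = self._env.reset()
        self._history.clear()
        # Pad with copies of the first observation so the state always has
        # a fixed length, even before `history_length` steps have happened.
        self._history.extend([first] * self._history.maxlen)
        return list(self._history)

    def step(self, action):
        # The deque drops the oldest observation automatically.
        self._history.append(self._env.step(action))
        return list(self._history)


env = StackedEnv(RealEnv(), history_length=5)
state = env.reset()
for video_id in range(3):
    state = env.step(video_id)  # state always holds the 5 most recent observations
```

The agent only ever sees the output of `StackedEnv`, so from its point of view each "timestep" is a window over the last 5 real steps.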