VieVaWaldi opened this issue 4 years ago
Maybe related to #351
In my case the agent cannot interact with the environment, so it can only choose actions that have already been chosen at least once.
We do not offer support with projects, this is a place for issues and enhancements specifically for stable-baselines.
Quick comments: You might want to check out keywords "imitation learning", "behavioural cloning" and "batch reinforcement learning" (e.g. https://arxiv.org/abs/1910.01708). Stable-baselines does not focus on problems like this.
If I am not mistaken, one thing that might help is to start with behavior cloning, which is available in stable-baselines: https://stable-baselines.readthedocs.io/en/master/guide/pretrain.html#generate-expert-trajectories (though generating the expert trajectories may require some manual tweaking).
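At its core, behavior cloning is just supervised learning on logged (state, action) pairs: fit a policy to predict the logged action for each state. Below is a minimal self-contained sketch in plain NumPy rather than the stable-baselines API; the synthetic data and the linear softmax policy are illustrative stand-ins, not the real machine data or the library's model.

```python
# Behavior-cloning sketch: fit a policy to logged (state, action) pairs by
# supervised learning. The "historical" data here is synthetic: 4-dim states
# with 3 discrete setpoint actions, where the logged operator always picked
# the action matching the largest of the first three state components.
import numpy as np

rng = np.random.default_rng(0)
states = rng.normal(size=(500, 4))
actions = np.argmax(states[:, :3], axis=1)

# Linear softmax policy trained with cross-entropy loss (gradient descent).
W = np.zeros((4, 3))
for _ in range(300):
    logits = states @ W
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    # Gradient of the mean cross-entropy w.r.t. W: X^T (p - onehot(a)) / N
    grad = states.T @ (probs - np.eye(3)[actions]) / len(states)
    W -= 0.5 * grad

# Fraction of logged actions the cloned policy reproduces.
accuracy = (np.argmax(states @ W, axis=1) == actions).mean()
```

Since the cloned policy only ever learns to reproduce actions present in the log, it naturally stays within the set of setpoints the machine has actually seen.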
I will take a look at Behavior Cloning. Thanks a lot for the suggestion. As a side note, this project is my bachelor thesis.
Hi,
Can anyone give me advice on training an RL agent that can choose actions only from a given data set?
I am working on a control-system problem. I have collected half a year's worth of data about a machine that produces parts. The data contains setpoints, measurements, and information about the quality of the produced parts.
For safety reasons the agent cannot learn online, so it needs to learn offline from the historical data. However, I cannot wrap my head around how such an agent would produce valid setpoints as actions.
There are multiple papers that train an agent offline, e.g. https://arxiv.org/pdf/1709.05077.pdf, but I do not understand how the agent chooses an action in these implementations.
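The setup described above, learning only from logged transitions and picking actions that appear in the data, can be sketched as batch (offline) Q-learning: sweep a fixed log of transitions repeatedly, and at decision time take the greedy action among those observed for the given state. The states, action names, and rewards below are toy placeholders, not the real machine data, and this is a tabular sketch of the general idea, not the method of any particular paper.

```python
# Batch (offline) Q-learning on a fixed log of transitions. The agent never
# interacts with the machine; it only replays the historical data, and at
# evaluation time it only considers actions seen in the log for that state.
from collections import defaultdict

# Logged transitions: (state, action, reward, next_state). Toy data.
batch = [
    (0, "raise_temp", 0.0, 1),
    (1, "hold",       1.0, 1),
    (0, "hold",       0.2, 0),
    (1, "lower_temp", 0.0, 0),
]

gamma = 0.9
Q = defaultdict(float)
seen = defaultdict(set)            # which actions were ever logged per state
for s, a, _, _ in batch:
    seen[s].add(a)

# Repeatedly sweep the fixed batch (no new environment interaction).
for _ in range(200):
    for s, a, r, s2 in batch:
        best_next = max((Q[(s2, a2)] for a2 in seen[s2]), default=0.0)
        Q[(s, a)] += 0.1 * (r + gamma * best_next - Q[(s, a)])

def act(state):
    """Greedy action, restricted to actions seen in the data for this state."""
    return max(seen[state], key=lambda a: Q[(state, a)])
```

Restricting both the Bellman backup and the final policy to logged actions is what keeps the agent from recommending setpoints the data never supports; more elaborate offline methods (e.g. batch-constrained Q-learning) refine exactly this constraint.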
Cheers,
Walter Ehren