hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

Can an agent learn valid actions offline, being able to choose only actions that were already taken (e.g. from historical data)? [question] #645

Open VieVaWaldi opened 4 years ago

VieVaWaldi commented 4 years ago

Hi,

Can anyone give me advice on training an RL agent that can only choose actions from a given data set?

I am working on a control system problem. I have collected half a year's worth of data about a machine that produces parts. The data contains setpoints, measurements and information about the quality of the produced parts.

For safety reasons the agent cannot learn online, so it needs to learn offline from the historical data. However, I cannot wrap my head around how such an agent would produce valid setpoints as actions.

There are multiple papers that implement an offline agent, e.g. https://arxiv.org/pdf/1709.05077.pdf, but I do not understand how the agent chooses an action in these implementations.

Cheers,

Walter Ehren

araffin commented 4 years ago

Maybe related to #351

VieVaWaldi commented 4 years ago

In my case the agent cannot interact with the environment. Therefore the agent can only choose actions that have already been chosen at least once.
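One simple way to enforce "only actions that were already chosen" is to let any policy propose a setpoint and then snap it to the closest setpoint found in the historical log. The sketch below is only an illustration of that idea: the function name `nearest_logged_action`, the two-dimensional setpoints and their values are all made up, and Euclidean distance is an assumption that would only make sense after scaling the setpoint dimensions comparably.

```python
import math

def nearest_logged_action(proposed, logged_actions):
    """Return the logged setpoint vector closest (Euclidean) to `proposed`.

    Hypothetical helper: the result is always an element of
    `logged_actions`, so the agent never emits an unseen setpoint.
    """
    return min(logged_actions, key=lambda a: math.dist(a, proposed))

# Made-up historical setpoints, e.g. (temperature, pressure).
logged_actions = [(180.0, 2.5), (185.0, 2.0), (190.0, 3.0)]

# Whatever the policy proposes is projected back onto the logged set.
proposed = (186.2, 2.1)
chosen = nearest_logged_action(proposed, logged_actions)
print(chosen)
```

This guarantees validity of the output, but it says nothing about whether the chosen setpoint is a good one in the current machine state; that part still has to come from the learning method.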

Miffyli commented 4 years ago

We do not offer support with projects, this is a place for issues and enhancements specifically for stable-baselines.

Quick comments: You might want to check out keywords "imitation learning", "behavioural cloning" and "batch reinforcement learning" (e.g. https://arxiv.org/abs/1910.01708). Stable-baselines does not focus on problems like this.
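To make the "batch reinforcement learning" keyword a bit more concrete, here is a minimal tabular sketch, not taken from the linked paper or from stable-baselines. It runs Q-learning over a fixed log of transitions and only ever considers actions that were actually logged in a given state, both when bootstrapping and when acting. The state names, action names, rewards and hyperparameters are invented for illustration, and a real machine would need discretised or function-approximated states.

```python
from collections import defaultdict

def batch_q_learning(transitions, gamma=0.9, lr=0.5, sweeps=200):
    """Tabular Q-learning over a fixed log of (s, a, r, s_next, done) tuples.

    No environment interaction: the log is simply replayed `sweeps` times.
    """
    # Actions that appear in the log for each state.
    seen = defaultdict(set)
    for s, a, _, _, _ in transitions:
        seen[s].add(a)

    q = defaultdict(float)
    for _ in range(sweeps):
        for s, a, r, s_next, done in transitions:
            if done or not seen[s_next]:
                # Terminal step, or a next state with no logged actions.
                target = r
            else:
                # Bootstrap only from actions logged in s_next.
                target = r + gamma * max(q[(s_next, b)] for b in seen[s_next])
            q[(s, a)] += lr * (target - q[(s, a)])
    return q, seen

def greedy_action(q, seen, state):
    """Best logged action for `state`; raises ValueError for unseen states."""
    return max(seen[state], key=lambda a: q[(state, a)])

# Made-up log: reward stands in for part quality.
transitions = [
    ("cold", "heat_up", 0.0, "ok", False),
    ("cold", "hold", -1.0, "cold", True),
    ("ok", "hold", 1.0, "ok", True),
    ("ok", "heat_up", -1.0, "hot", True),
]
q, seen = batch_q_learning(transitions)
print(greedy_action(q, seen, "cold"), greedy_action(q, seen, "ok"))
```

Restricting the maximum to logged actions is the part that addresses the original question: the agent cannot pick, or assign value to, a state-action pair that never occurred in the data.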

matthew-hsr commented 4 years ago

If I am not mistaken, one thing that might help is to start with Behavior Cloning, which is available in stable-baselines: https://stable-baselines.readthedocs.io/en/master/guide/pretrain.html#generate-expert-trajectories (though generating the expert trajectories from your data may require some manual tweaks).
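As a rough idea of what those manual tweaks could look like: as far as I understand the linked page, the expert data is a set of parallel arrays with the keys `obs`, `actions`, `rewards`, `episode_starts` and `episode_returns`. The sketch below flattens logged episodes into that layout using plain Python lists. The helper name `to_expert_dict`, the episode structure and all values are assumptions for illustration; the lists would still have to be converted to numpy arrays before handing them to `ExpertDataset`, and the exact shapes should be checked against the documentation.

```python
def to_expert_dict(episodes):
    """Flatten logged episodes into five parallel fields.

    `episodes` is a list of episodes; each episode is a list of
    (observation, action, reward) tuples taken from the historical log.
    Hypothetical helper, not part of stable-baselines.
    """
    data = {"obs": [], "actions": [], "rewards": [],
            "episode_starts": [], "episode_returns": []}
    for episode in episodes:
        for step, (obs, action, reward) in enumerate(episode):
            data["obs"].append(obs)
            data["actions"].append(action)
            data["rewards"].append(reward)
            # True only on the first step of each episode.
            data["episode_starts"].append(step == 0)
        # One summed return per episode.
        data["episode_returns"].append(sum(r for _, _, r in episode))
    return data

# Made-up log: measurements as observations, setpoints as actions,
# a quality score as reward.
episodes = [
    [([0.1], [180.0], 1.0), ([0.2], [185.0], 0.5)],
    [([0.3], [190.0], -1.0)],
]
expert = to_expert_dict(episodes)
print(expert["episode_starts"], expert["episode_returns"])
```

How to cut half a year of machine data into "episodes" (per shift, per batch, per part) is a modelling decision the library does not make for you.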

VieVaWaldi commented 4 years ago

I will take a look at Behavior Cloning. Thanks a lot for the suggestion. As a side note, this project is my bachelor thesis.