hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

[question] How to use previously obtained state-action-reward-next state information to save time on training? #1146

Closed: kwak9601 closed this issue 2 years ago

kwak9601 commented 2 years ago

I am using TD3, and due to the nature of the environment, each timestep takes about 30 seconds to execute. However, I already have transition tuples (state, action, reward, next state) collected over many steps in past experiments. Is there a way to use this data so I get a head start in the learning process?

Thank you.

Miffyli commented 2 years ago

What you are looking for is "offline RL" or "imitation learning". Stable-Baselines has some utilities for this (the `pretrain` method), but to be honest it is not well supported: the library focuses on online RL algorithms, so there is no first-class support for learning from previously collected data. You might want to take a look at imitation or d3rlpy.
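For reference, a minimal sketch of the `pretrain` route, assuming your recorded tuples can be repacked into the `.npz` layout that `ExpertDataset` reads (the file name and environment below are placeholders):

```python
import gym
from stable_baselines import TD3
from stable_baselines.gail import ExpertDataset

# Assumption: the recorded data has been packed into the format ExpertDataset
# expects, i.e. the keys written by generate_expert_traj ('obs', 'actions',
# 'rewards', 'episode_returns', 'episode_starts').
dataset = ExpertDataset(expert_path='past_experiments.npz',
                        traj_limitation=-1, batch_size=64)

env = gym.make('Pendulum-v0')  # placeholder environment
model = TD3('MlpPolicy', env, verbose=1)

# Behavioral cloning on the recorded (state, action) pairs; note that
# pretrain() only imitates the actions and ignores the stored rewards.
model.pretrain(dataset, n_epochs=100)

# Continue with regular online training afterwards
model.learn(total_timesteps=5000)
```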

Also, we recommend using stable-baselines3, as it is more up to date.
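If you do stay on stable-baselines' TD3 for now, one unofficial workaround is to seed the model's replay buffer with the recorded transitions before calling `learn()`, so the first gradient updates already sample from your old data. A minimal sketch, assuming the transitions are stored as NumPy arrays and that the buffer is exposed as `model.replay_buffer` with the `add(obs_t, action, reward, obs_tp1, done)` signature of the common `ReplayBuffer` (this relies on internal attributes, so it is not an officially supported path):

```python
import numpy as np
import gym
from stable_baselines import TD3

env = gym.make('Pendulum-v0')  # placeholder environment
model = TD3('MlpPolicy', env, verbose=1)

# Hypothetical file holding the previously recorded transitions as arrays
# of shape (n_steps, ...) under these keys.
data = np.load('past_transitions.npz')

for obs, action, reward, next_obs, done in zip(
        data['obs'], data['actions'], data['rewards'],
        data['next_obs'], data['dones']):
    # Seed the off-policy replay buffer with one recorded transition.
    model.replay_buffer.add(obs, action, reward, next_obs, float(done))

model.learn(total_timesteps=5000)
```

Keep in mind that training mostly from a fixed buffer of old transitions is exactly the offline RL setting mentioned above, so without the corrections used by dedicated offline algorithms the results may vary.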

You may close this issue if this answers your questions.