Closed: kwak9601 closed this issue 2 years ago
I am using TD3, and due to the nature of my environment, it can only take one timestep every 30 seconds or so. However, I already have tuples of (state, action, next state, reward) data from many steps of past experiments. Is there a way to use this data to give the learning process a head start?
Thank you.
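One common workaround (not a built-in SB/SB3 feature) is to pre-fill the off-policy algorithm's replay buffer with the logged transitions before training. A minimal sketch with stable-baselines3 TD3, where `MyEnv` and `load_transitions` are hypothetical placeholders for your own environment and data loading:

```python
# Sketch: warm-start TD3 by loading past (s, a, s', r, done) tuples
# into its replay buffer before calling learn().
import numpy as np
from stable_baselines3 import TD3

env = MyEnv()  # placeholder: your slow environment
# learning_starts=0 so gradient updates use the old data immediately
model = TD3("MlpPolicy", env, learning_starts=0, verbose=1)

# load_transitions(): placeholder yielding (obs, action, next_obs, reward, done)
for obs, action, next_obs, reward, done in load_transitions():
    model.replay_buffer.add(
        np.array(obs).reshape(1, -1),       # SB3 buffers expect an env/batch dim
        np.array(next_obs).reshape(1, -1),  # (adjust reshaping for image obs)
        np.array(action).reshape(1, -1),
        np.array([reward]),
        np.array([done]),
        [{}],  # infos; required by the add() signature in newer SB3 releases
    )

# New (slow) environment steps are still collected, but training batches
# are sampled from a buffer that already contains the old experience.
model.learn(total_timesteps=10_000)
```

Note this only biases the sampling distribution early on; TD3 was designed for online data, so heavily off-distribution logged data may still train poorly.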
What you are looking for is "offline RL" or "imitation learning". SB has some utilities for this (pretrain), but to be honest they are not well supported; SB only covers online RL algorithms, so there is no easy way to train on pre-collected data. You might want to take a look at imitation or d3rlpy.
Also, we recommend using stable-baselines3, as it is more up to date.
You may close this issue if this answers your question.
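For training purely offline on the logged tuples, d3rlpy supports this directly. A minimal sketch using the v1.x-style API (v2.x restructures it, e.g. `TD3PlusBCConfig`); the `.npy` file names are placeholders for your own logged arrays:

```python
# Sketch: offline training on logged transitions with d3rlpy (v1.x-style API).
import numpy as np
from d3rlpy.dataset import MDPDataset
from d3rlpy.algos import TD3PlusBC  # TD3 variant designed for offline data

# Placeholder files: flatten your logged episodes into aligned arrays
observations = np.load("observations.npy")  # shape (N, obs_dim)
actions = np.load("actions.npy")            # shape (N, action_dim)
rewards = np.load("rewards.npy")            # shape (N,)
terminals = np.load("terminals.npy")        # shape (N,), 1.0 at episode ends

dataset = MDPDataset(observations, actions, rewards, terminals)

algo = TD3PlusBC()
algo.fit(dataset, n_epochs=10)  # trains without touching the environment
```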