takuseno / d3rlpy

An offline deep reinforcement learning library
https://takuseno.github.io/d3rlpy
MIT License

Offline models not giving better results than online #74

Closed. hn2 closed this issue 3 years ago.

hn2 commented 3 years ago

I tried generating replay buffers using TD3 and SAC. I tried all available offline algorithms. None gave me better results than the online algorithms. Perhaps I am doing something wrong. Can you help? What are the most important hyperparameters that can affect the results?

takuseno commented 3 years ago

@hn2 Thanks for the issue. Let me clarify: you generated the dataset with a trained, fixed policy and then used that dataset to train offline RL algorithms, and you observed that online algorithms such as SAC perform best in the offline RL setting (i.e. the offline-trained ones, not the ones trained online). Is this correct? And could you describe the task more specifically (e.g. Hopper-v2)?

I believe this is not a bug, since even online algorithms do well on some datasets.

hn2 commented 3 years ago

This is my own custom env. It uses historical trading-asset data to try to construct an optimal portfolio for trading. I am using Stable Baselines for the online algos (TD3, SAC). I then trained all the offline algos ('BCQ', 'BEAR', 'AWR', 'CQL', 'AWAC', 'CRR', 'PLAS', 'MOPO', 'COMBO') for 100 epochs. None of them was better than TD3 or SAC. Are there any hyperparameters to tune? Is it possible to change the topology of the NN, for example? It is not a bug since your code works, but it is not doing any better than the online models. I am trying to understand how to work with it and potentially how to tune it. Here are some excerpts from my code:

model = v_online_class.load(v_online_model_file_name)  # trained online agent (Stable Baselines)
dataset = to_mdp_dataset(model.replay_buffer)  # convert its replay buffer to an MDPDataset

v_offline_model = offline_class(use_gpu=torch.cuda.is_available())
v_offline_model.fit(dataset.episodes, n_epochs=n_epoches, experiment_name='experiment1', logdir=v_offline_model_dir)
v_offline_model.save_model(fname=v_offline_model_file_name)

takuseno commented 3 years ago

Are you comparing the result of offline training with the result of online training? If so, the offline RL algos will not beat the online RL algos; online RL generally performs best. Offline RL fits the case where online interaction is not feasible.

takuseno commented 3 years ago

Also, of course, there are a bunch of parameters you can tune. Please see the documentation: https://d3rlpy.readthedocs.io/en/v0.90/references/algos.html
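
For example, you can change the learning rates, the batch size and the network topology via encoder factories. A rough sketch with CQL, based on the v0.90 docs (the exact keyword names may differ in your installed version, and the layer sizes here are just placeholders):

import torch
from d3rlpy.algos import CQL
from d3rlpy.models.encoders import VectorEncoderFactory

# customize the hidden-layer sizes of the actor/critic networks
encoder = VectorEncoderFactory(hidden_units=[256, 256, 256])

cql = CQL(actor_learning_rate=1e-4,
          critic_learning_rate=3e-4,
          actor_encoder_factory=encoder,
          critic_encoder_factory=encoder,
          batch_size=256,
          use_gpu=torch.cuda.is_available())

# dataset is the MDPDataset you built from the replay buffer
cql.fit(dataset.episodes, n_epochs=100)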

hn2 commented 3 years ago

Isn't my problem a good fit for offline algos? I basically have offline historical asset-pricing data and want to train an agent to trade the best possible portfolio.

takuseno commented 3 years ago

It sounds like an offline RL problem. But you're comparing an online RL agent trained online with an offline RL agent trained offline, is that right? I don't think that comparison is the right direction.

hn2 commented 3 years ago

What is the right comparison, then?

takuseno commented 3 years ago

I guess what you can do is compare the performance of the offline RL algorithms against each other, on the same dataset and with the same evaluation scorer (see the sketch below). Sorry, this kind of consultation is not an issue of d3rlpy. If you don't have any technical problems with this software, I'll close this issue.
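
For reference, something like this puts the offline algorithms on an equal footing (rough sketch; env and dataset are your own objects, and evaluate_on_environment assumes you can still roll out the learned policy in your environment for evaluation):

from d3rlpy.algos import BCQ, CQL
from d3rlpy.metrics.scorer import evaluate_on_environment

# score every algorithm by rolling out its learned policy in the same env
scorers = {'environment': evaluate_on_environment(env)}

for algo in [BCQ(use_gpu=True), CQL(use_gpu=True)]:
    algo.fit(dataset.episodes,
             eval_episodes=dataset.episodes,
             n_epochs=100,
             scorers=scorers,
             experiment_name=algo.__class__.__name__)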