Closed q-55555 closed 5 years ago
It looks like the environment is still using only the last value in the dataframe. Is this intended, or do you plan on going back to using lookback_window_size? I created an environment based on previous versions that supports both LSTM and MLP policies, where the MLP policy takes the previous lookback_window_size observations into account.
After further research, the results were due to a bug causing look-ahead bias. Check out the discord for more information. Someone got better results with the MlpPolicy, but it's worth comparing both again.
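The thread doesn't show the exact bug, but as a generic illustration (not RLTrader's actual code, and all names here are made up for the example): look-ahead bias typically comes from slicing the observation window one step too far, so the agent sees the very price it is about to trade on, which inflates backtest results:

```python
import numpy as np

prices = np.array([100.0, 101.0, 99.0, 103.0, 102.0])

def observation(t, window=3, leaky=False):
    """Return the price window the agent sees before acting at step t.

    A correct window ends at t-1 (past data only). The 'leaky' variant
    ends at t, exposing the current step's price -- look-ahead bias.
    """
    end = t + 1 if leaky else t
    return prices[max(0, end - window):end]

# Acting at t=3: a correct window must not contain prices[3] (103.0)
assert 103.0 not in observation(3)
assert 103.0 in observation(3, leaky=True)
```

An agent trained on the leaky variant effectively conditions on the future, so its apparent profit disappears once the off-by-one is fixed.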
The lookback_window will likely not be coming back, as it can be replaced by stable-baselines' VecFrameStack wrapper, and it is unnecessary for an LSTM model.
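As a rough sketch of what that replacement does (this is an illustration in plain numpy, not stable-baselines' or RLTrader's implementation): a frame-stacking wrapper such as VecFrameStack concatenates the last N observations into a single observation, which is exactly the role lookback_window_size used to play:

```python
import numpy as np

def stack_observations(history, n_stack):
    """Concatenate the last n_stack observations into one flat
    frame-stacked observation, left-padding with zeros when the
    history is shorter than n_stack.

    Mimics the effect of a frame-stacking wrapper (e.g.
    stable-baselines' VecFrameStack), replacing a manual lookback.
    """
    obs_dim = history.shape[1]
    stacked = np.zeros((n_stack, obs_dim))
    recent = history[-n_stack:]
    stacked[-len(recent):] = recent
    return stacked.ravel()

# Example: 5 timesteps of 3-feature observations, lookback of 4
history = np.arange(15, dtype=float).reshape(5, 3)
frame = stack_observations(history, n_stack=4)  # shape (12,)
```

With this wrapper in place, a plain MlpPolicy sees the recent window directly, while an LSTM policy carries the same information in its hidden state and needs no stacking at all.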
As for the LSTM policy, you are welcome to pass in policy: MlpLstmPolicy to RLTrader() to experiment with it today. Otherwise, we are currently working on improving the success of the model, and as such the defaults of this library are subject to change when a better default is found.
The BasePolicy used in the last release is MlpPolicy, whereas in the great tutorial https://towardsdatascience.com/using-reinforcement-learning-to-trade-bitcoin-for-massive-profit-b69d0e8f583b, the benefits of MlpLstmPolicy appear to contribute significantly to the results.