tensortrade-org / tensortrade

An open source reinforcement learning framework for training, evaluating, and deploying robust trading agents.

Random starting point in the env to avoid overfitting? #368

Closed · trungtv closed this issue 2 years ago

trungtv commented 2 years ago

Hello, I am testing out PPO with the BSH action scheme and position-based returns. I am quite happy with the learned models, but they seem to be overfitting to the training data: in fact, the model performs worse on unseen test data. What is your recommendation to avoid overfitting? I am thinking about a random starting point in the environment, but I do not know the right way to implement it. Is anyone working on this? Thanks

robanos4 commented 2 years ago

I am new to tensortrade, but random starts are usually a very important training strategy in RL, especially in complex DRL models, since ANNs tend to overfit. So, if there are no functions for this yet, consider this a +1 on the issue :)

carlogrisetti commented 2 years ago

There is no such function implemented yet, but in the meantime you could play with the minibatch size.
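
For what it's worth, if Ray's RLlib PPO is the trainer being used, the minibatch size is set through the algorithm config. A minimal sketch with illustrative values (the env name is a placeholder for however the TensorTrade environment is registered):

```python
# Illustrative RLlib PPO config values; tune these rather than treating them as defaults.
ppo_config = {
    "env": "TradingEnv",        # placeholder: the registered TensorTrade environment
    "train_batch_size": 4000,   # timesteps collected per training iteration
    "sgd_minibatch_size": 128,  # minibatch size for each SGD pass over that batch
    "num_sgd_iter": 10,         # number of SGD epochs over the collected batch
}
```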

How would you implement a random start feature?

robanos4 commented 2 years ago

At the start of each new trial, draw a random number r in [0, training set size - 1] and start the trial from observation r in the series (in fact, you can also pick a random stop in the same manner, because the learner will most likely overfit to the end of the series otherwise). In this framework, it seems most natural to expose these as inputs to the training function (based on my limited knowledge of it, so I might be wrong).
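
A minimal sketch of that idea; the helper and argument names are hypothetical, not something the framework provides:

```python
import random

def sample_episode_bounds(series_length: int, min_episode_len: int = 100):
    """Draw a random (start, stop) window inside the training series."""
    start = random.randint(0, series_length - min_episode_len)
    stop = random.randint(start + min_episode_len, series_length)
    return start, stop

# Example: with 10,000 observations, each new trial trains on a different sub-window.
start, stop = sample_episode_bounds(10_000)
```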

carlogrisetti commented 2 years ago

I am wondering if you can achieve a similar effect with the rollout fragment length in Ray...

This seems like something that needs to be implemented in the training framework of your choice. If it were me, I would implement it in the creator function that Ray uses to initialize the environment.

I might give it a go in the future; please let us know if you manage to do it first. I am currently extending the examples/documentation with up-to-date content focused on using Ray as a backend.
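
A rough sketch of the creator-function idea, assuming RLlib is the backend. Only register_env is a real Ray call here; make_trading_env is a hypothetical stand-in for however the TensorTrade environment is actually assembled:

```python
import random
from ray.tune.registry import register_env

def make_trading_env(env_config, start_index):
    # Hypothetical factory: build the TensorTrade env here, with its data feed
    # sliced (or its cursor positioned) at `start_index`.
    raise NotImplementedError

def env_creator(env_config):
    series_length = env_config.get("series_length", 10_000)
    # Draw a random starting index somewhere in the first 10% of the series.
    start_index = random.randint(0, int(series_length * 0.1))
    return make_trading_env(env_config, start_index=start_index)

register_env("RandomStartTradingEnv", env_creator)
```

The catch, as noted below, is that Ray only calls the creator when it builds the environment, so the start is only re-drawn when the environment itself is rebuilt.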

robanos4 commented 2 years ago

Thank you - I will get to know the framework better and test the suggested solutions + leave an update here!

carlogrisetti commented 2 years ago

I tried to wrap my head around this a bit (in some spare time, to be honest, so more work could be put into it), but couldn't figure out where to inject that "random start" code when training with Ray.

The environment gets built at the start of training, and each trial just runs over and over on that same environment without rebuilding it. The brute-force solution would be to randomize the start during environment creation and force an environment recreation mid-training (e.g. saving a checkpoint, stopping the training, and resuming from the saved checkpoint, which forces the environment to be recreated), but I am sure there is a cleaner, more streamlined way to do it without fiddling too much.
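
A hedged sketch of that brute-force workaround, assuming Ray Tune drives the training (the config and stopping values are illustrative only):

```python
from ray import tune

ppo_config = {"env": "RandomStartTradingEnv", "framework": "torch"}

# Train for a while, checkpointing along the way...
first_run = tune.run("PPO", config=ppo_config,
                     stop={"training_iteration": 10}, checkpoint_freq=1)

# ...then resume from the last checkpoint: the environment (and any random start
# drawn in its creator) gets rebuilt when training restarts.
tune.run("PPO", config=ppo_config,
         stop={"training_iteration": 20},
         restore=str(first_run.get_last_checkpoint()))
```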

In case there's no way to do that, a PR on the Ray side is also possible, but I would keep that as a last resort.

I'll have to look a little bit more into this. @robanos4 do you have any update?

Thanks

avacaondata commented 2 years ago

What I would do is randomize the start in the environment's .reset(): when the environment is initialized it holds the full streams, but each time we reset it we can select a random starting point and go from there.
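
A minimal sketch of that approach (not TensorTrade's actual classes, just the shape of the idea): the environment keeps the full series, and each reset() draws a fresh starting index within the first random_start_pct of it.

```python
import random

class RandomStartTradingEnv:
    def __init__(self, prices, random_start_pct=0.10):
        self.prices = prices                      # full training series
        self.random_start_pct = random_start_pct
        self.current_step = 0

    def reset(self):
        # Begin each episode at a random index within the first X% of the data.
        max_start = int(len(self.prices) * self.random_start_pct)
        self.current_step = random.randint(0, max_start)
        return self._observe()

    def _observe(self):
        # Placeholder observation: just the price at the current cursor position.
        return self.prices[self.current_step]
```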

abstractguy commented 2 years ago

Done. Please close the issue.

carlogrisetti commented 2 years ago

@trungtv @robanos4 @alexvaca0 you can test the feature by installing tensortrade from master with pip install git+https://github.com/tensortrade-org/tensortrade.git

There is a new "random_start" parameter that does just that. It is not expressed as a boolean flag but as a percentage of the dataset in which to randomly start (i.e. randomly start somewhere in the first 10% of the data). The default is 0%, which effectively disables the feature.
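
A hedged usage sketch: `portfolio`, `feed`, `action_scheme` (e.g. BSH) and `reward_scheme` (e.g. PBR) are assumed to be built as in the standard examples, and the exact keyword name for the new parameter should be verified against #392 (it is described above as "random_start", exposed as a percentage of the dataset):

```python
import tensortrade.env.default as default

# Assumes portfolio, feed, action_scheme and reward_scheme are set up as in the docs.
env = default.create(
    portfolio=portfolio,
    action_scheme=action_scheme,
    reward_scheme=reward_scheme,
    feed=feed,
    window_size=25,
    random_start_pct=0.10,  # randomly start each episode in the first 10% of the data
)
```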

Closing as per #392. This will be released in v1.0.4.