Reptile Optimizer? - Githubissues

marvin-hansen commented 5 years ago

Is there a chance to investigate whether the One-Shot Reptile optimizer can deliver results comparable to Bayesian / Evolution?

I observed that the Bayesian / Evolution combo usually requires data samples in excess of a certain threshold but for some reasons, every once in a while, it performs well on a dataset with just 100 samples. Obviously, that begs the question if the Reptile K-shot approach would work well on more small datasets?

Reptile seems to be computationally fairly inexpensive, offers exponential loss decay within the first 10 fits, and when implemented in PyTorch, it can run on the GPU - a nice bonus for training on larger datasets.

Notebook: https://github.com/AdrienLE/ANIML/blob/master/ANIML.ipynb

Paper: https://arxiv.org/abs/1703.03400

Article: https://towardsdatascience.com/paper-repro-deep-metalearning-using-maml-and-reptile-fd1df1cc81b0

huseinzol05 commented 5 years ago

I looked into it, still trying to feed into this forecast domain

huseinzol05 commented 5 years ago

It required cost function, learn from gradient descent, metalearning. Evolution strategy is an evolution algorithm, not required to define a cost function. When talking about cost function to minimize, examples from q-learning are not really great in our case.

marvin-hansen commented 5 years ago

Valid point. Meanwhile, I spent more time with gradient-free RL agents, especially neuroevolution and it occurs to me that gradient descent becomes more of a liability in RL and it actually isn't even needed.

Speaking of optimization, have you tested different fitness functions to select the best agent?

The way I read your code is, that the fitness function to score simply states "maximize profit" and especially the neuroevolution agents do just that. However, while screening the trade log, I was wondering if a more nuanced fitness function would stabilize the otherwise volatile trade pattern generated by the best agent?

For example, instead of maximizing profit, one can ask what actually causes consistently profitable trades?

To my knowledge, there are really only very few factors such as:

1) Success-rate: The percentage of trades that make money 2) (Average) winner size: The dollar amount of a winning trade 3) The total trade rate: How often the agent trades with the two parameters above.

You can calculate the expectancy of your market approach by multiplying your success rate by your average winner and then subtracting your failure rate multiplied by your average loser.

(WR Avg(TP)) -(LRAvg(FN)) W*L = 100%

For example, if your method has a 50% success rate and your average winner is two times bigger than your average loser, you have an edge. Here’s why: Average return on a trade = 50% 2R – 50% R 1R - 0.5R = 0.5R

R is the percentage of your capital at risk. If you risk 2% of your capital per trade, then, in this case, your average return per trade or investment is 2%*0.5=1%.

If you take 30 trades in a month, your total return fro that month is expected to be 30%.

That is exactly the reason why HFT on tick resolution is so profitable even if the success is barely 55%.

To train an RL agent to trade consistently profitable, the reward function needs to be different.

1) Be right more often – to have a very high success rate, the agent will need to learn how to choose better setups.

2) The average winner is much bigger than your average loser. This is possible if the agent learn how to construct highly-asymmetric trades, where the upside is open-ended and the downside is limited. Buying calls or puts provides exactly that. Asymmetric trades let you achieve extraordinary returns out of ordinary moves in the underlying stocks while keeping your risk limited.

In practice, that means:

WR : Win Rate in % LR : Loss Rate in % Avg(TP) : Average Winning Trade return in $ Avg(FP) : Average Loss Trade return in $

Edge = (WR Avg(TP)) - (LRAvg(FN))

Reward Function R1 : WR > LR
R2 : Avg(TP) > Avg(FP)
R3 : Edge > 0.5

Total Reward = Max(R1+ R2+R3)

R1 = Be right more often R2 = Average winner is (much) bigger than your average loser. R3 = Don't lose money

Trade frequency is not part of the reward function because you can control frequency by selecting different time frames i.e. daily, hourly, minute, second, or tick level.

Technically, a RL agent optimized for that reward function should outperform conventional agents and algos by a fair margin because the reward function incents a higher frequency construction of high-quality trades with asymmetric risk-reward.

huseinzol05 commented 5 years ago

this is really cool! My crafted reward function is really simple, yep, ROI percentage. we should improve it. I want to try yours. thanks!

marvin-hansen commented 5 years ago

Okay,
please share some benchmark numbers once you tested your agent.

I saw your real-time agent. Cool stuff.

In case you want to automate data loading & adding more features such as technical indicators, I wrote some stock utils that do all that in a lean workflow.

https://github.com/marvin-hansen/StockUtils

Depending on the case, I saw model accuracy improvements of 10% (XGBoost classification) and ~20% for regression models using PyTorch & Fast.ai models. Both when using proc-flow nr. 4 & 5.

https://github.com/marvin-hansen/StockUtils/blob/master/src/procs/ProcFlow.py

In the meantime, I take a closer look at how to make Neuroevolution with Elite selection work on the feature set I get from ProcFlow.

augmen commented 5 years ago

@marvin-hansen are there any resources on how to. Implemet reptile for time series data

marvin-hansen commented 5 years ago

@augmen No.

Try FB Prophet instead.

https://facebook.github.io/prophet/

huseinzol05 / Stock-Prediction-Models

Reptile Optimizer? #18