jameschch / LeanOptimization

Genetic optimization using LEAN
Apache License 2.0

Improvements. #32

Open · IlshatGaripov opened this issue 5 years ago

IlshatGaripov commented 5 years ago

Hello @jameschch, thanks for the great platform you have created. As I wrote to you, I have taken your existing project as a base and developed it a little further. Here are the things I think could be improved (I am giving just a brief overview, so sorry if I do not make every point as clear as it could be):

1) One of the most important things is walk-forward optimization (WFO). It has to be intuitive and easy to set up. There is not much information on the subject on the internet, but I found two good articles that I would like to share with you: https://profsoftware.com/bt/wfo.htm and http://www.codefortraders.com/Walk-Forward_Analysis/BTWFMgr.htm (these are the documentation for two ready-made WFO implementations). I tried to implement mine similarly to what they have: https://github.com/IlshatGaripov/OptimizationLeanJames/blob/master/Optimization/OptimizerManager/WalkForwardOptimizationManager.cs

2) Cloud computing. To solve complex optimization problems I could see no better solution than enabling some sort of cloud computing. Depending on the WFO settings, a single experiment sometimes needs thousands of backtests. This led to the implementation of https://github.com/IlshatGaripov/OptimizationLeanJames/blob/master/Optimization/Common/AzureBatchManager.cs. The module expands computing power in the cloud: it creates compute nodes, uploads all the historical data to storage and from there to every node, runs the backtests, saves the results to storage, then transfers them to the local PC, and so on. It works together with https://github.com/IlshatGaripov/OptimizationLeanJames/blob/master/Optimization/GeneticSharpExtensions/TaskExecutorAzure.cs. The drawback is that it takes time to create the nodes (around 20 min) and to upload the data to every node. It would probably be more convenient to have a multiprocessor setup at home: 100 or more PCs ;)

3) The genetic optimization algorithm itself and the operations it performs were rewritten in many places, and the results of this work are collected in a separate folder: https://github.com/IlshatGaripov/OptimizationLeanJames/tree/master/Optimization/GeneticSharpExtensions. This makes me think the genetic optimization SDK could be rewritten entirely for the purpose of optimizing trading strategies, because most of the mutation, selection, etc. operations that GeneticSharp provides are not very applicable to the trading task and are better suited to classic academic optimization problems. For example, shift mutation or binary mutation are not applicable here at all. On the other hand, other kinds of mutation must be implemented and available. By the way, GA feels like a powerful solution for searching trading parameters, but it requires more thoughtful study. In the end I used only a few files from GeneticSharp and rewrote all the rest.

4) I have merged all the https://github.com/jameschch/LeanOptimization/tree/master/Optimization/Fitness classes into a single piece of functionality: https://github.com/IlshatGaripov/OptimizationLeanJames/blob/master/Optimization/Common/StatisticsAdapter.cs. The static method CalculateFitness does all the work depending on the fitnessFunction variable set in the config file. But this is a minor thing.
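Consolidating the fitness classes into one dispatcher keyed by a config value could look roughly like the Python sketch below. The fitness names (`sharpe`, `car`, `car_over_drawdown`) and the statistic keys are assumptions for illustration, loosely modelled on LEAN's summary statistics; this is not the actual `StatisticsAdapter.CalculateFitness` signature.

```python
from typing import Callable, Dict, Mapping

# Hypothetical fitness names mapped to functions of a statistics dictionary.
# The statistic keys are assumed, loosely modelled on LEAN's summary statistics.
FITNESS_FUNCTIONS: Dict[str, Callable[[Mapping[str, float]], float]] = {
    "sharpe": lambda s: s["Sharpe Ratio"],
    "car": lambda s: s["Compounding Annual Return"],
    # Return per unit of drawdown; zero drawdown is scored as 0 to avoid dividing by zero.
    "car_over_drawdown": lambda s: (
        s["Compounding Annual Return"] / s["Drawdown"] if s["Drawdown"] > 0 else 0.0
    ),
}


def calculate_fitness(statistics: Mapping[str, float], fitness_function: str) -> float:
    """Pick the fitness function named in the config and apply it to one backtest's statistics."""
    try:
        fn = FITNESS_FUNCTIONS[fitness_function]
    except KeyError:
        raise ValueError(f"Unknown fitness function: {fitness_function!r}") from None
    return fn(statistics)
```

With this shape, adding a new fitness measure means adding one dictionary entry rather than a new class.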

jameschch commented 5 years ago

These are all valuable suggestions. Sorry if I've not picked up exactly what you are trying to put across. Some of these issues are quite subtle so feel free to clarify where necessary.

  1. A full walk-forward optimizer is going to be my top priority. I have already put together a very simple measure that works on a similar time-step pattern:

NestedCrossSharpeMaximizer

This is based on my integration with the optimizers from SharpLearning, which extends the capabilities beyond Genetic to Bayesian, Particle Swarm, Random Search, Grid Search, Smac and a few others.

Regardless, a correct walk-forward will be my next focus.

  2. Yes, your work integrating with Azure is impressive. I had not considered the compute-as-a-service angle; where I needed a remote host, I just deployed and ran the optimizer process there. To do this properly, you would need a resource pool rather than waiting for spin-up. I have been looking into this for AWS and expect Azure would not be very different.

  3. I think you're saying the repo can be restructured into folders. I completely agree and finished a restructure a little while ago. In conceptual terms, I agree the genetic optimization has limited utility, which is why I have integrated with the SharpLearning optimizers. The basic GA pattern does remain quite convenient even for the non-genetic optimizers.

  4. There might be a little consolidation possible with the fitness classes.
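The rolling in-sample/out-of-sample pattern that a walk-forward optimizer steps through can be sketched as follows. This is a minimal Python sketch under assumed parameters (window lengths in calendar days, with the out-of-sample length also used as the step size); `walk_forward_windows` and `WalkForwardWindow` are hypothetical names, not taken from `WalkForwardOptimizationManager.cs` or `NestedCrossSharpeMaximizer`.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass(frozen=True)
class WalkForwardWindow:
    in_sample_start: date
    in_sample_end: date    # exclusive; also the out-of-sample start
    out_sample_start: date
    out_sample_end: date   # exclusive


def walk_forward_windows(start: date, end: date, in_sample_days: int,
                         out_sample_days: int, anchored: bool = False) -> List[WalkForwardWindow]:
    """Split [start, end) into consecutive in-sample/out-of-sample pairs.

    Each step moves forward by one out-of-sample period. A rolling walk slides the
    in-sample start as well; an anchored walk keeps it fixed at `start`.
    """
    windows = []
    is_start = start
    is_end = start + timedelta(days=in_sample_days)
    step = timedelta(days=out_sample_days)
    while is_end + step <= end:
        windows.append(WalkForwardWindow(is_start, is_end, is_end, is_end + step))
        if not anchored:
            is_start += step
        is_end += step
    return windows
```

Each in-sample window would be optimized and the winning parameters backtested on the paired out-of-sample window. For example, `walk_forward_windows(date(2020, 1, 1), date(2021, 1, 1), 180, 60)` yields three rolling windows; passing `anchored=True` grows the in-sample window instead of sliding it.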

IlshatGaripov commented 5 years ago

James,

  1. I have not looked in detail at your SharpLearning optimizers, as I have been playing around with genetic optimization only, but they look promising! They should be something I work with in my next experiments.

  2. Regarding Azure, you say: "To do this properly, you would need a resource pool rather than waiting for spin-up." The way things are designed, it actually does create a pool on the first start. Then, if the pool exists, it reuses it in subsequent experiments with no time overhead for creating the pool again. When the calculations finish, the program asks whether to delete the pool; we can answer no if we want to reuse it. After the last experiment we still have to remove it to keep costs down, because Microsoft charges for the nodes being active and deployed, not for actual usage, regardless of whether the nodes are currently doing any calculations. So it is certainly not the same as deploying resources right before each experiment, but there is still a wait for the pool to be created and the data to be uploaded to the nodes. This can take up to 40 min, once before each experiment session, and I am not sure how to optimize it other than by maintaining our own multiprocessor setup at home/office.

  3. What I meant is that, in my opinion, the GeneticSharp library source files had to be improved in many places to be better suited to these particular optimization problems, and sometimes that was not possible without changing the SDK source itself because of various library restrictions, such as library interfaces, protected access modifiers, etc. So I decided to maintain a GeneticSharp clone inside the project. Here is how it looks now: https://github.com/IlshatGaripov/OptimizationLeanJames/tree/master/Optimization.Genetic. Within the bounds of this inner project I am free to make any modifications that seem useful.
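As an example of a mutation operator that fits numeric trading parameters better than shift or binary mutation, here is a bounded Gaussian mutation over named parameters. It is a minimal Python sketch, not GeneticSharp's API and not one of the operators in the linked repository; `ParamSpec`, `gaussian_mutate` and the default `scale=0.1` (sigma as a fraction of each parameter's range) are illustrative assumptions.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ParamSpec:
    name: str
    low: float
    high: float
    is_int: bool = False  # integer parameters are assumed to have integer bounds


def gaussian_mutate(genes: Dict[str, float], specs: List[ParamSpec], rate: float,
                    rng: Optional[random.Random] = None, scale: float = 0.1) -> Dict[str, float]:
    """Bounded Gaussian mutation for numeric strategy parameters.

    Each gene is perturbed with probability `rate` by N(0, sigma), where sigma is
    `scale` times the parameter's range, then clipped to [low, high]; integer
    parameters are rounded after clipping.
    """
    rng = rng if rng is not None else random.Random()
    mutated = {}
    for spec in specs:
        value = genes[spec.name]
        if rng.random() < rate:
            sigma = scale * (spec.high - spec.low)
            value = min(spec.high, max(spec.low, value + rng.gauss(0.0, sigma)))
            if spec.is_int:
                value = int(round(value))
        mutated[spec.name] = value
    return mutated
```

Scaling sigma by each parameter's range keeps the step size meaningful for parameters as different as a lookback period and a threshold in [0, 1].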

jameschch commented 4 years ago

You expressed interest in walk-forward optimization before. Thanks to your encouragement, I have just merged my effort at this into the master branch of my .NET Core port of the Lean Optimizer, available here:

https://github.com/jameschch/LeanParameterOptimization

I'm going to be shifting my focus over to developing this other codebase. I already have a great deal more features there, including some Blazor interfaces. The next step will probably be to follow your suggestion of stripping the genetic patterns out into a separate channel and focusing more on the other optimizers such as Random, Grid, Bayes, etc.

The whole cloud scale out problem is something I've given a lot of thought but have not yet had the nerve to tackle.

IlshatGaripov commented 4 years ago

Hi @jameschch, that sounds interesting; I would definitely be interested in studying the new implementation. As for cloud computing, I managed to get everything working smoothly enough, and the problem I wrote about with long deployment times has also been resolved: it now takes 5 min to deploy all the computing power in Azure needed to start the experiments. This could probably be reused and integrated into the new version. It's been a busy time for me lately with a new project, sorry, but I would be glad to follow the progress. Thanks!
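The basic shape of the cloud scale-out discussed in this thread (many independent backtests run in parallel, results collected centrally and ranked) can be illustrated locally. The sketch below swaps Azure Batch for Python's `concurrent.futures.ThreadPoolExecutor` purely to show the workflow; it does not use the Azure SDK, and `run_backtest` is a hypothetical stand-in that returns a toy score instead of running LEAN.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Dict, List


def run_backtest(params: Dict[str, int]) -> Dict:
    """Hypothetical stand-in for one LEAN backtest on a compute node; returns a toy score."""
    return {"params": params, "fitness": -abs(params["slow"] - 3 * params["fast"])}


def fan_out(param_sets: List[Dict[str, int]], max_workers: int = 4) -> List[Dict]:
    """Run independent backtests concurrently and collect the results, best first."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_backtest, p) for p in param_sets]
        for future in as_completed(futures):
            results.append(future.result())
    return sorted(results, key=lambda r: r["fitness"], reverse=True)
```

Threads are only adequate here because the toy backtest is trivial; real backtests are CPU-bound, so a process pool or remote compute nodes would do the actual work, with the same submit-and-collect structure.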

jameschch commented 4 years ago

I've been reflecting on the current version of the walk-forward and think it is suitable for walk-forward validation, but it needs a multi-level cost function over the cost of each out-of-sample fold and the variance between the optimal parameters across all folds. I have been looking closely at this very promising approach to multi-level cost functions: https://arxiv.org/pdf/1808.06846. However, I suspect this goes too far and maybe all that's needed here is a simple weighted cost function. Maybe you could take a look and tell me what you think?
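A "simple weighted cost function" of this kind could be as small as the Python sketch below: J = w_cost * mean(out-of-sample fold costs) + w_var * mean over parameters of the variance of that parameter's optimal value across folds, with each parameter scaled to [0, 1] by its search range so the variances are comparable. The weights, the normalisation and the function name are assumptions; this is not the multi-level method from the linked paper.

```python
from statistics import fmean, pvariance
from typing import Dict, List, Tuple


def walk_forward_cost(oos_costs: List[float],
                      fold_params: List[Dict[str, float]],
                      param_ranges: Dict[str, Tuple[float, float]],
                      w_cost: float = 1.0,
                      w_var: float = 1.0) -> float:
    """Weighted walk-forward cost (lower is better).

    oos_costs: one cost per out-of-sample fold, e.g. the negated Sharpe ratio.
    fold_params: the optimal parameters found on each fold's in-sample data.
    param_ranges: name -> (low, high), used to scale each parameter to [0, 1]
    so that variances of differently sized parameters are comparable.
    """
    mean_cost = fmean(oos_costs)
    variances = []
    for name, (low, high) in param_ranges.items():
        scaled = [(p[name] - low) / (high - low) for p in fold_params]
        variances.append(pvariance(scaled))
    instability = fmean(variances)
    return w_cost * mean_cost + w_var * instability
```

If every fold picks the same parameters, the variance term is zero and the cost reduces to the mean out-of-sample cost; parameters that jump around between folds are penalised in proportion to `w_var`.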

The next thing to move on to is going to be seamless cloud integration, I think.

IlshatGaripov commented 4 years ago

Hi @jameschch, sorry, it's been a busy time for me recently. I have taken a look, and I think this is probably an interesting approach when the cost functions are based on different factors that can vary greatly over time relative to each other: it could help find an optimum that is potentially good (stable) with respect to all the factors taken into account. That's interesting. As for the search for a global maximum, maybe one of the heuristics from this book could help: https://cs.gmu.edu/~sean/book/metaheuristics/Essentials.pdf. I read it a long time ago, and only the few parts related to genetic algorithms. One of the approaches I believe the book suggests is to add a sufficient number of new random points at each step to search for new extrema, which helps prevent stagnation around a local min/max.
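Adding new random points at each step (an idea often called "random immigrants" in the GA literature) can be sketched as follows: each generation, the worst fraction of the population is replaced by freshly sampled individuals. This Python sketch assumes a fitness callable where higher is better and a hypothetical `sample_random` generator; the replacement fraction is a tuning choice, not a value taken from the book.

```python
import random
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def inject_random_immigrants(population: List[T],
                             fitness: Callable[[T], float],
                             sample_random: Callable[[random.Random], T],
                             fraction: float,
                             rng: Optional[random.Random] = None) -> List[T]:
    """Replace the worst `fraction` of the population (higher fitness is better)
    with freshly sampled random individuals to keep exploring new regions."""
    rng = rng if rng is not None else random.Random()
    n_new = int(len(population) * fraction)
    if n_new == 0:
        return list(population)
    ranked = sorted(population, key=fitness, reverse=True)  # best first
    survivors = ranked[: len(ranked) - n_new]
    return survivors + [sample_random(rng) for _ in range(n_new)]
```

It could be called once per generation; because only the weakest individuals are replaced, the best solutions found so far are never lost.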