uniVocity / univocity-trader

open-source trading framework for java, supports backtesting and live trading with exchanges
577 stars 140 forks source link

Migrate every single indicator in ta4J lib to this project #7

Open jbax opened 4 years ago

jbax commented 4 years ago

There are plenty of indicators in ta4j library, with unit tests and everything, but it's not a library that can be used with this framework. The code in there needs to be migrated so it can work here.

Note that every effort is made here to prevent maintaining a list of candles in memory. New indicators must extend from one of the abstract classes in package com.univocity.trader.indicators.base.

Nothing prevents one to create their own Indicator implementation, but in such case care must be taken to properly initialize the Aggregator that is responsible for aggregating candles (for example, converting five 1 minute candles into a single 5 minute candle)

nimo23 commented 3 years ago

but it's not a library that can be used with this framework

@jbax Why? Can you explain why you cannot use ta4j in this project as it is?

What are the drawbacks/limitations/missings of ta4j in compare to your rewrite?

Maybe we can improve ta4j codebase so you dont need to rewrite it again.

jbax commented 3 years ago

Mainly because I wanted to easily trade with indicators on multiple time frames (e. g. Combine signals on 5m, 17m, 44m and 3h, then buy/sell).

Also to run brute force parameter optimization processes: give it 50 trading pairs, 5000 parameters for an indicator in a strategy, then find the parameters that trade best overall on all 50 pairs.

Ta4j proved to be too slow for what I'm doing. I built this from the ground up to be as fast as possible. Using 1 minute candles from the first moment the symbol showed up on binance until today, I can run the process described above in 20 minutes on a threadripper 3990x (we're talking about billions of candles processes over and over) .

If I were to suggest improvements to ta4j I'll start by rewriting a lot of stuff, starting from getting rid of that Num wrapper and use double instead, at least for indicators.

Also we can't keep candles in memory and perform all calculations "running". There's no concept here of keeping BarSeries and indicators that you get stuff from to then walk back (e.g. BarSeries.getBar(index) and Indicator.getValue(index)) as this prevents efficient parallelism.

There's no refactoring in the world here that wouldn't end up in me rewriting a lot of stuff anyway, plus I'd be arguing with them a lot (wasting time I don't have) because I'm always going to opt for maximum performance over clean code no matter how complex it ends up getting (parameter optimization depends on speed). If you look at the internals of the code here you'll notice it's not always pretty. The user facing code is as clean as it can get though.

Hope this explains why this project was created. I'm super grateful to have ta4j around to get ideas and unit tests from, and I'm happy to report bugs when I find them. There are indicators here that don't exist there. Maybe someone can migrate these to ta4j as well so we can have one project getting code and ideas from each other.

On Sun, Sep 6, 2020, 4:16 AM nimo23 notifications@github.com wrote:

but it's not a library that can be used with this framework

@jbax https://github.com/jbax Why? Can you explain why you cannot use ta4j in this project as it is?

What are the drawbacks/limitations/missings of ta4j in compare to your rewrite?

Maybe we can improve ta4j codebase so you dont need to rewrite it again.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/uniVocity/univocity-trader/issues/7#issuecomment-687647467, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABWFQPVK4GDUPTPJ5V3J7ULSEKBRVANCNFSM4JQYW7CQ .

nimo23 commented 3 years ago

I will try to debate the gaps. I think @team172011 can also bring up some points:

Mainly because I wanted to easily trade with indicators on multiple time frames (e. g. Combine signals on 5m, 17m, 44m and 3h, then buy/sell).

In ta4j, you can also merge indicators coming from different timeframes by combining them to rules. Also, ta4j has a BarAggregator to aggregate bars from, e.g. 1 min to 7 min.

Also to run brute force parameter optimization processes: give it 50 trading pairs, 5000 parameters for an indicator in a strategy, then find the parameters that trade best overall on all 50 pairs...Ta4j proved to be too slow for what I'm doing.

This is something we can investigate. In ta4j, you can select either double or BigDecimal for indicator calculations. We also have some basic backtesting classes. When comparing the speed, did you use the Num type of double?

If I were to suggest improvements to ta4j I'll start by rewriting a lot of stuff, starting from getting rid of that Num wrapper and use double instead, at least for indicators.

The Num-wrapper should be not the bottleneck when you use Num type of double.

Also we can't keep candles in memory and perform all calculations "running". There's no concept here of keeping BarSeries and indicators that you get stuff from to then walk back (e.g. BarSeries.getBar(index) and Indicator.getValue(index)) as this prevents efficient parallelism.

BarSeries can be used for multiple indicators at the same time. Also, it does not need to be rebuild on every indicator calculation. Maybe I did not get your point..

Maybe someone can migrate these to ta4j as well so we can have one project getting code and ideas from each other.

In my opinion, we can try to optimize ta4j in the way so it deserves also your needs. We/you can open a few issues in ta4j for that and I think in the long term, you will also benefit by not rewriting every indicator again. Wdyt, @jbax, @team172011 ?

jbax commented 3 years ago

I guess the first approach to performance you could take is to measure it. The Num wrapper is just extra overhead. It's not a bottleneck per se but it affects overall performance, and it's the first easy target for improving speed a bit.

The BarSeries issue is that it is a list in memory. Indicators also have their values stored in memory. That's the issue I was referring to. This prevents easily processing many symbols at the same time, with lots of history, in parallel and with low memory usage.

Here all indicators have their current state only. There's no notion of a previous indicator's value. If any previous value is needed this is stored then discarded as needed (usually through a data structure I created named CircularList)

I appreciate that you guys want to improve ta4j. It's a great project but for our use case I think it's a bit late to consider ta4j now. Almost all indicators have been rewritten already.

I hope you guys find something here useful too to help improving your lib.

Cheers!

On Sun, Sep 6, 2020, 6:44 AM nimo23 notifications@github.com wrote:

I will try to debate the gaps. I think @team172011 https://github.com/team172011 can also bring up some points:

Mainly because I wanted to easily trade with indicators on multiple time frames (e. g. Combine signals on 5m, 17m, 44m and 3h, then buy/sell).

In ta4j, you can also merge indicators coming from different timeframes by combining them to rules. Also, ta4j has a BarAggregator to aggregate bars from, e.g. 1 min to 7 min.

Also to run brute force parameter optimization processes: give it 50 trading pairs, 5000 parameters for an indicator in a strategy, then find the parameters that trade best overall on all 50 pairs...Ta4j proved to be too slow for what I'm doing.

This is something we can investigate. In ta4j, you can select either double or BigDecimal for indicator calculations. We also have some basic backtesting classes. When comparing the speed, did you use the Num type of double?

If I were to suggest improvements to ta4j I'll start by rewriting a lot of stuff, starting from getting rid of that Num wrapper and use double instead, at least for indicators.

The Num-wrapper should be not the bottleneck when you use Num type of double.

Also we can't keep candles in memory and perform all calculations "running". There's no concept here of keeping BarSeries and indicators that you get stuff from to then walk back (e.g. BarSeries.getBar(index) and Indicator.getValue(index)) as this prevents efficient parallelism.

BarSeries can be used for multiple indicators at the same time. Also, it does not need to be rebuild on every indicator calculation. Maybe I did not get your point..

Maybe someone can migrate these to ta4j as well so we can have one project getting code and ideas from each other.

In my opinion, we can try to optimize ta4j in the way so it deserves also your needs. We/you can open a few issues in ta4j for that and I think in the long term, you will also benefit by not rewriting every indicator again. Wdyt, @jbax https://github.com/jbax, @team172011 https://github.com/team172011 ?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/uniVocity/univocity-trader/issues/7#issuecomment-687664093, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABWFQPXKVS2DWHD4AQU2JTDSEKS5DANCNFSM4JQYW7CQ .

team172011 commented 3 years ago

Since @nimo23 mentioned me I will add also some points.

If there is something we can do to make ta4j more convenient for users or other projects I am always open for discussion and improvement. But I also understand that there are unsuitable demands sometimes.

I guess the first approach to performance you could take is to measure it. The Num wrapper is just extra overhead. It's not a bottleneck per se but it affects overall performance, and it's the first easy target for improving speed a bit.

For sure high performance is the bottleneck in ta4j because the Num (former Decimal) class generates a lot of overhead compared to using only primitives. You always have to create at least one new wrapper class in each calculation..

Here all indicators have their current state only. There's no notion of a previous indicator's value. If any previous value is needed this is stored then discarded as needed (usually through a data structure I created named CircularList)

In ta4j the calculation of an indicator value get also only triggered if you request the value and will be stored in the cache after the first call. Currently I am thinking a lot about (and @nimo23 also made proposals) on how to improve ta4js caching layer. Adding possibilities of no-caching, conditional caching and similar would be nice for ta4j as well!

The BarSeries issue is that it is a list in memory. Indicators also have their values stored in memory. That's the issue I was referring to. This prevents easily processing many symbols at the same time, with lots of history, in parallel and with low memory usage.

The BaseBarSeries implements it that way. Since ta4j is working with the BarSeries interface, one could also imaging to write a BarSeries implementation that is file based or something similar like you need.