Improve Decimal performances

mdeverdelhan commented 9 years ago

Following #19 I would like to improve the performance of the Decimal class.

I think the internal BigDecimal could be replaced by a less memory-and-cpu-expensive thing. It can be achieved by reducing the abilities of these decimal numbers. Thus:

The maximum scale can be 16.
The maximum precision can be 32.
For now we don't need more math operations than the Decimal class is already providing.

davassi commented 8 years ago

What about JDECIMAL: https://github.com/tools4j/decimal4j ? https://github.com/tools4j/decimal4j/wiki/Performance

mdeverdelhan commented 8 years ago

Thank you! \o/ I didn't know that projet. I will look deeper into it in the next days but it seems promising. It would make me happy if we could drop our Decimal class to use this one instead.

davassi commented 8 years ago

You are very welcome!

Actually I was profiling the performance of your library, and crawling into the code, I noticed this expensive operation:

private Decimal(BigDecimal val) {
        this(val.toString());
}

that calls directly such constructor:

private Decimal(String val) { delegate = new BigDecimal(val, MATH_CONTEXT); } Creating a BigDecimal from a String it's very expensive. With a profiler I noticed that 75% of computational time is spent creating Decimals. Since the Decimal object is a wrapper around the BigDecimal delegate, wouldn't be better to rewrite such constructor as:


private Decimal(BigDecimal val) {
        delegate = val;
}

?

mdeverdelhan commented 8 years ago

You are right, except that with your rewriting the math context is not sure. Anyway looking into decimal4j is in my top priorities now. However the potential replacement may need some time.

davassi commented 8 years ago

This is great, thanks a lot for you efforts in such direction.

I can tell you that after some benchmarking, if you can ensure the math context (that seems to me to be always MathContext(32, RoundingMode.HALF_UP)), the execution of the strategy is more than 3 times faster and the memory consumption is half. Please consider such patch besides what we are going to use as delegate, due to the fact that ANY arithmetical operation on Decimal is going to invoke such constructor.

mdeverdelhan commented 8 years ago

I made your small fix. It will be deployed in the next hours. You were right: it hugely decreased the execution time of my benchmarking codes. Thank you very much.

davassi commented 8 years ago

I've found another library that worth a look: http://www.apfloat.org/apfloat_java/ I will create another Decimal implementation with this as delegate, in order to see how it behaves.

Thanks for your efforts, I will make a donation :)

cu39 commented 8 years ago

I haven't investigated performance so close as both of you but enjoyed reading @mikvor's posts to learn the basics: http://java-performance.info/bigdecimal-vs-double-in-financial-calculations/ http://java-performance.info/high-performance-money-class/ https://github.com/mikvor/money-conversion

mdeverdelhan commented 8 years ago

@davassi

I already did some benchmarking using Apfloat (see #19). I had not been convinced but it was a long time ago now. I'm interested in your results.

The problem is often the same: decimal libraries are designed to manage infinite numbers (or numbers with a very-very-very large number of digits). It's not relevant here since we are not doing astronomy computations but only (mostly) monetary ones. However we need enough digits. By rule of thumb I estimate that a comfortable number of digits would be 32.

Actually I was more thinking about straight replacing our Decimal class by the decimal4j's one (and not only the delegate). I say decimal4j because it comes with approximately the same methods I added in Decimal. We should do advanced tests for this library because it's backed by a long value. Even if it will work fine in most of the cases with fiat currencies, problems appear when you deal with cryptocurrencies. For instance: Bitcoin has an 8-decimal precision. Then we could imagine to manipulate values like 99,999,999.99999999 (order of magnitude); and we are not so far from 9,223,372,036,854,775,807 (max value of the long primitive type). It's sinking in; but what if you combine your values with other indicators? Aren't we too close of the risk of overflow?

PS: For the donation you would be the first in more than 2 years, thank you.

@cu39

Thank you. Questions around BigDecimal vs primitive types are always interesting. The problem with money libraries is that they offer to manipulate "currencied" value. I want ta4j to be currency-agnostic. Also most of the indicator we are providing don't return monetary values.

davassi commented 8 years ago

@mdeverdelhan

I did some tests and backtests with Apfloat and, regardless they ensure any arbitrary precision, they are even slower then BigDecimal. Good news instead with Decimal4j. I've used as delegate the Decimal8f implementation: it is quite faster and the precision (scale 8) is very good.

Replying to your question: as you've defined it, it's a rule of thumb. In my opinion there's no need to ensure 32 or arbitrary precision digit in the domain of price and averages calculations. Decimal8f represents an immutable decimal number with a fixed number of 8 digits to the right of the decimal point. I'd rather say that 8 decimal precision is more than enough and in a long it is quite far to reach binary overflow.

I will provide a pull request asap.

mdeverdelhan commented 8 years ago

I don't agree. Decimal8f is not enough. E.g.: A bitcoin can be divided down to 8 decimal places. Therefore, 0.00000001 BTC is the smallest amount that can be handled. Let's assume the following time series: BTC close price = 0.00000002, 0.00000001, 0.00000002

With the current BigDecimal delegate: new SMAIndicator(series, 3).getValue(2) will return 0.000000045
With a Decimal8f delegate: new SMAIndicator(series, 3).getValue(2) will return 0.00000004 (or 0.00000005, according to your rounding mode)

I think that, for averages, 2 more digits would be sufficient. But we have to provide a bigger margin for non-trivial indicators. Maybe Decimal12f would be enough but:

it would not be "comfortable"
the max value we could manage would be 9,223,372.036854775807 (which gives a too small number of digits to the left of the decimal point)

That's why I think an ideal solution would provide <16-digits>.<16-digits> and why I did not rush on Decimal4j (yet).

davassi commented 8 years ago

@mdeverdelhan I understand your point. But I have to say that pragmatically most of the trading pairs are made on top of real (2 digits) currencies. Notwithstanding, if you trade Bitcoin as primary currency on Kraken.com you'll find a maximum of 6 digits (i.e. "0.026750") as price (you can easily check those on https://www.kraken.com/charts) and on poloniex.com actually you are gonna find 8 digits.

Maybe a Decimal10f could be a good trade off?

mdeverdelhan commented 8 years ago

Even with Decimal10f your max value will be 922,337,203.6854775807. It will become short if you are manipulating low value currencies. E.g.:

CFA Franc (Western Africa): 1 USD ~= 600 XAF (max value ~= 1,537,228.67 USD)
Won (South Korea): 1 USD ~= 1100 KRW (max value ~= 838,488.37 USD)

If the value you handle is a market cap or a volume, you have a big risk of overflow.

And I don't even speak about currencies like:

Dong (Vietnam): 1 USD ~= 22000 VND
Rial (Iran): 1 USD ~= 30000 IRR (this one is trendy nowadays)

I'm still weighing up the pros and cons.

somid3 commented 8 years ago

Given that this library can be used for a number of applications it requires a bit more flexibility. For one, I wouldn't make Decimal a final class. Others might want to add new features in a subclass. As to the precision, it would be great if there can be a configuration file and have the default be 32, but allow developers to lower/increase that value.

mdeverdelhan commented 8 years ago

A non-final Decimal class might lead to unexpected behavior. BigDecimal is also final and I never encountered anyone that found it a problem. If you want a new function, make a PR.

About the precision: So many things should be put in a configuration file. Except if you find that it would strongly change performance in the underlying BigDecimal case I won't consider the idea. It saves us from another property file.

mdeverdelhan commented 7 years ago

Please continue this thread on the new repo: https://github.com/ta4j/ta4j

mdeverdelhan / ta4j-origins

Improve Decimal performances #28