Damir, it's a very reasonable question. The intent of ex_money is primarily about guarantees more than absolute performance. The fact that it's decimals under the hood demonstrates that.
On the other hand, the format in the database adds only 3 bytes per column.
There is no doubt that if you have performance-critical code and only one currency then storing the amount as an integer will probably be the fastest approach. And then of course you take responsibility for the guarantees. It might be the right trade-off.
That is clear, but Money does keep in memory both the currency atom and the formatting for each of its instances (in addition to the Decimal value), and I am not using a relational DB but am storing the whole data model in a JSON document instead. I had no intention of storing integers only, in either memory or DB (although I must say now that it may suffice, since both the precision and the format stay fixed over the entire sequence of amounts, so it's not a bad idea either, especially given the scale).
Some quick investigation:

- `Money.t` will take about 24 words or 192 bytes. An integer will typically take 8 bytes.
- `Decimal` alone will take about 12 words or 96 bytes.
- `Money.Ecto.Composite.Type` will typically take 30 bytes to store. A Postgres integer will take either 4 bytes or 8 bytes.

**Money.t**
A quick test shows:
```elixir
iex> x = Money.new(:USD, 100)
#Money<:USD, 100>
iex> :erts_debug.size x
24
```
Meaning that a `Money.t` typically takes 24 words, or 192 bytes, on a 64-bit machine.
**Decimal**
```elixir
iex> :erts_debug.size Decimal.new(100)
12
```
A `Decimal` takes 12 words, about half of the whole `Money.t` struct.
**Money.Ecto.Composite.Type in Postgres**

In Postgres, the documentation for the `NUMERIC` type says:

> The actual storage requirement is two bytes for each group of four decimal digits, plus three to eight bytes overhead.
Using an example database with one row only:
```
money_dev=# select
  pg_size_pretty(sum(pg_column_size(payroll))) as total_size,
  pg_size_pretty(avg(pg_column_size(payroll))) as average_size,
  sum(pg_column_size(payroll)) * 100.0 / pg_total_relation_size('organizations') as percentage
from organizations;

 total_size |       average_size        |       percentage
------------+---------------------------+------------------------
 30 bytes   | 30.0000000000000000 bytes | 0.09155273437500000000
```
We can see that a money composite type appears to take 30 bytes (variable, depending on the amount) to store. A Postgres integer will take either 4 bytes or 8 bytes depending on the type selected. It's 8 bytes for most integers on the BEAM, but arbitrary-precision integers which overflow the native data type can take a lot more, although this is unlikely for money amounts.
Great answer, thanks!
One more thing, not a requirement, just food for thought.
Imagine the use-case I mentioned previously, with huge swaths of amounts all in the same currency and all with the same precision and formatting, such as with accounting software or a financial planning tool. The relevant conclusions that can be drawn from your last comment are as follows:
All the algorithms would remain virtually the same. The only change to the Money module interface would be accepting integers in addition to Money.t instances (all integers or all Money.t, not a mix thereof) and raising an ArgumentError or similar if a context is not mapped to the process in which the integer-taking functions are invoked.
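A minimal sketch of that idea, assuming a process-local context (the `MoneyContext` module and its functions are hypothetical, not part of ex_money):

```elixir
# Hypothetical sketch: a per-process money context so that bare
# integers can be interpreted, and computed on, as money amounts.
defmodule MoneyContext do
  @key __MODULE__

  # Map a currency and precision to the calling process.
  def put(currency, precision) when is_atom(currency) and is_integer(precision) do
    Process.put(@key, {currency, precision})
  end

  # Fetch the context, raising if none was mapped to this process.
  def fetch! do
    Process.get(@key) ||
      raise ArgumentError, "no money context mapped to this process"
  end

  # An integer-taking operation that requires a mapped context.
  def add(a, b) when is_integer(a) and is_integer(b) do
    _context = fetch!()
    a + b
  end
end
```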
Good thought experiments! In writing the library I had the following goals in mind:
You've posed the question: can the implementation be more time- and space-efficient? Which I reframe to be: can the implementation be more time- and space-efficient and still meet the goals?
This option means the developer takes responsibility for the correctness and uses ex_money only for formatting. This option is available today by using `Money.from_integer/3`.
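For example, `Money.from_integer/3` interprets the integer in the currency's smallest denomination (output as I'd expect from current ex_money versions):

```elixir
# 20_000 cents is interpreted as USD 200.00.
iex> Money.from_integer(20_000, :USD)
#Money<:USD, 200.00>
```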
Given that a `Money.t` is 24 words (192 bytes), is there a more memory-efficient approach?

- `{:USD, Decimal.new(100)}` is 15 words
- `{:USD, 100}` is 3 words + 1 word for the integer itself
- `{:USD, 10000, []}` (ie with empty formatting) is 4 words + 1 for the integer
Do these structures provide the same guarantees as using `Money.t`? Not quite as rigorous, since there is no `__struct__` type to match against. On the other hand we can still validate the currency code and the integer in guards, so the guarantees might be enough.
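For illustration, a minimal sketch of such guard validation over a bare `{currency, amount}` tuple (the `TupleMoney` module and its currency list are hypothetical, not part of ex_money):

```elixir
# Hypothetical sketch: guard-based validation that approximates some
# of the guarantees of the Money.t struct.
defmodule TupleMoney do
  # Illustrative subset of known currency codes.
  @currencies [:USD, :EUR, :GBP]

  defguard is_money(currency, amount)
           when currency in @currencies and is_integer(amount)

  # Matching the same `c` in both tuples enforces a single currency.
  def add({c, a}, {c, b}) when is_money(c, a) and is_money(c, b) do
    {c, a + b}
  end
end
```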
One interesting option might be to encode both the currency code and the amount into a single integer. ISO 4217 defines a 3-digit numeric code for currencies, so we would need 10 bits to store it. That leaves 54 bits to store the currency amount. It would still need to be a signed integer in order to be a complete replacement for the `Money.t` struct.
An example would be `<<978::10, 1000::signed-integer-54>>`, where `978` is the ISO 4217 numeric code for `EUR` and `1000` is the amount. The bitstring is interpreted as `EUR 10.00`.
In this format, with 54 bits to work with as a signed integer, we can store +/- 9007199254740992 which, I suspect, will cater for most use cases.
Here we can still validate the currency code, and since the amount is an integer we can still interpret it correctly in the context of the currency.
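A small encode/decode sketch of this layout (the `PackedMoney` module and function names are mine, purely illustrative):

```elixir
# Hypothetical sketch: 10 bits of ISO 4217 numeric code, 54 bits of
# signed amount in the currency's minor unit, in one 64-bit bitstring.
defmodule PackedMoney do
  def encode(code, amount) when code in 0..999 and is_integer(amount) do
    <<code::10, amount::signed-integer-54>>
  end

  def decode(<<code::10, amount::signed-integer-54>>) do
    {code, amount}
  end
end
```

So `PackedMoney.encode(978, 1000) |> PackedMoney.decode()` returns `{978, 1000}`, i.e. `EUR 10.00`.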
This format would have some issues as a serialisation format since math operations in the database would not return correct results. But serialisation could be done as a composite type of two integers: currency numeric code (small int) and amount (large int).
It appears there are at least two different representations that can be much more space-efficient: `{:USD, 1000}` and the encoded integer.
The first, `{:USD, 1000}`, delivers almost the same level of guarantees as the `Money.t`, and at 4 words rather than 24 it uses 6 times less memory.
The second, an encoded integer, is 1 word instead of 24 and is therefore the most space efficient. But not all guarantees can be met.
These still don't answer some open questions:
Thoughts welcome. I'll definitely do some experimentation and see what might be possible, practical and sustainable.
Note that in any implementation using integers, precision has to be fixed so all integers can be interpreted correctly. We could also encode a precision in the integer, but that's likely too complex and the law of diminishing returns probably applies.
The practical implication is that for addition and subtraction there should be no issue. For multiplication I think it's still ok. Division is most definitely a problem, or at least would be incompatible with the current ex_money implementation. The current implementation defers rounding to the currency's digits to the very last possible moment so precision can be preserved. That's not going to be possible in an integer implementation. My understanding is that financial institutions expect to retain at least 7 decimal digits of precision, and I don't believe that can be maintained in any of these proposals.
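To make the division problem concrete, a quick comparison (the `Decimal` output reflects its default context precision of 28; exact inspect formatting may vary by version):

```elixir
# Splitting USD 100.00 three ways with integer cents truncates
# immediately; the fractional cent is lost at the first operation.
iex> div(10_000, 3)
3333

# Decimal defers rounding, retaining the intermediate precision.
iex> Decimal.div(Decimal.new(100), Decimal.new(3))
#Decimal<33.33333333333333333333333333>
```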
If the experiments prove positive, I'll probably implement them as a new but complementary library which, assuming the benchmarks prove out, I'll call "ex_fast_money". This will make clear that the guarantees are different.
Following on from the previous Option 3, we could encode the precision in 3 bits, allowing for 8 levels of precision (0 to 7 decimal digits). Also noting that small integers on the BEAM, for 64-bit systems, are actually 60 bits, since 4 bits are kept for type information.

`<<978::10, 3::3, 1000::signed-integer-47>>` would mean `EUR 1.000`, where the precision is set by the `3::3`. That way we have useful arbitrary precision and can still do fast math (at least addition, subtraction and multiplication; division I'm still looking at).
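A sketch of fast addition on this precision-carrying layout (names are illustrative):

```elixir
# Hypothetical sketch: binding `code` and `p` in both patterns makes
# the match fail (FunctionClauseError) if the currencies or
# precisions differ. A sum overflowing 47 bits would be silently
# truncated here; a real implementation would need to guard that.
defmodule PackedMoneyV2 do
  def add(<<code::10, p::3, a::signed-integer-47>>,
          <<code::10, p::3, b::signed-integer-47>>) do
    <<code::10, p::3, (a + b)::signed-integer-47>>
  end
end
```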
I implemented two experimental versions of `Decimal` on the weekend:
Neither of these implementations is ready for production use at all.
Then I ran some very basic benchmarking. TLDR; Packed decimals are over 30% more space efficient but also nearly 75% slower. The primary slowdown is the packing of the decimal into an integer.
The memory analysis below does not account for primitive (ie native integer) space. For a `Decimal` that's a further 3 words (or 12 bytes) and for `PackedDecimal` it's a further 1 word (4 bytes). So the memory difference is slightly more in favour of `PackedDecimal` than shown below.
However, this data does not, in my mind, create a compelling reason to consider alternative implementations for `Money`, given that the `:amount` is the primary contributor to space and time. Nevertheless, suggestions and comments welcome!
```
Name                        ips        average  deviation         median         99th %
Decimal                  6.33 M      157.85 ns ±30542.90%           0 ns        1000 ns
Bitstring Decimal        4.00 M      250.19 ns ±17352.21%           0 ns        1000 ns
Packed Decimal           3.62 M      276.03 ns ±25034.99%           0 ns        1000 ns

Comparison:
Decimal                  6.33 M
Bitstring Decimal        4.00 M - 1.58x slower +92.34 ns
Packed Decimal           3.62 M - 1.75x slower +118.18 ns

Memory usage statistics:
Name                 Memory usage
Decimal                      96 B
Bitstring Decimal           128 B - 1.33x memory usage +32 B
Packed Decimal               64 B - 0.67x memory usage -32 B
```
The results above bugged me; I didn't think the results should vary so much. It's challenging with fast loops of course, and the median of 0 ns above with such a big deviation illustrates that. I decided to try running the benchmark over a longer period of time to see if the results stabilise, and, being on OTP 24 with the JIT, to see if that produces a change over the longer run too.
It does look like all three implementations converge quite closely when run longer, at least when calling `new/1`, in which the packing of either the integer or the bitstring is the dominant performance cost. Memory utilisation remains lowest on the packed decimal implementation, as expected.
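For reference, a minimal Benchee setup that would produce the configuration shown below; the `BitstringDecimal` and `PackedDecimal` modules are the experimental implementations mentioned above, and the exact benchmark bodies are my assumption:

```elixir
# Assumed harness: comparing new/1 across the three implementations
# with the warmup/time/memory_time reported in the output below.
Benchee.run(
  %{
    "Decimal" => fn -> Decimal.new(100) end,
    "Bitstring Decimal" => fn -> BitstringDecimal.new(100) end,
    "Packed Decimal" => fn -> PackedDecimal.new(100) end
  },
  warmup: 2,
  time: 60,
  memory_time: 4
)
```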
```
Benchmark suite executing with the following configuration:
warmup: 2 s
time: 1 min
memory time: 4 s
parallel: 1
inputs: none specified
Estimated total run time: 3.30 min

Benchmarking Bitstring Decimal...
Benchmarking Decimal...
Benchmarking Packed Decimal...

Name                        ips        average  deviation         median         99th %
Decimal                  4.45 M      224.84 ns ±45237.16%           0 ns        1000 ns
Bitstring Decimal        4.13 M      242.28 ns ±40507.71%           0 ns        1000 ns
Packed Decimal           4.02 M      248.55 ns ±21506.50%           0 ns        1000 ns

Comparison:
Decimal                  4.45 M
Bitstring Decimal        4.13 M - 1.08x slower +17.44 ns
Packed Decimal           4.02 M - 1.11x slower +23.70 ns

Memory usage statistics:
Name                 Memory usage
Decimal                      96 B
Bitstring Decimal           128 B - 1.33x memory usage +32 B
Packed Decimal               64 B - 0.67x memory usage -32 B
```
Very exhaustive, I must say. Much more than I expected.
One thing puzzles me, though. Why are you disregarding the possibility of segregating the formatting and precision data from the amounts altogether, e.g., as I mentioned previously, in a separate singleton-per-data-model context that is required to be mapped to the process prior to executing the computations in question?
For as long as the specification requires it, all of your original goals are still achieved:
It's just that in this case the integral atomic piece of information is no longer the amount, but the model containing many amounts and the contextual (model-wide) formatting and precision.
Btw, there is no general industry-wide rule on the desired level of precision in financial calculations. There may be some "best" or most common practices, but one is discouraged from relying on those in favour of what's actually stipulated in each particular case. Each Credit Agreement (e.g. a Syndicated Loan Agreement), for instance, explicitly stipulates an exact decimal precision (no more, no less) to be used for each particular type of calculation, in order for all the parties to come up with the same results.
Very helpful, thank you. I googled as much as I could to identify any standards or practices related to precision when I was writing the lib and couldn't find any, so your knowledgeable feedback is very helpful.
> Why are you disregarding the possibility of segregating the formatting and precision data from the amounts altogether
Not disregarding, just wanted to see what might be possible in a more compact representation of the current implementation in order to see where the boundaries are for improvement. Just experimentation.
In part because, at its essence, a decimal is mostly what you suggest: an amount disconnected from the currency and formatting. The formatting field in `Money.t` "costs" only one word, for the field name itself, if no formatting data is provided, since an empty list `[]` occupies no space.
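This is easy to check, since the empty list is an immediate term:

```elixir
iex> :erts_debug.size []
0
```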
I suppose I have some hesitancy in having amounts interpreted as money but in order to interpret that amount correctly some additional external state is required. Worth experimentation for sure but it does feel uncomfortable.
In the scenario you describe, how would you think the following should be handled:

- Storing an amount in `GenServer` state, or sending it in a message to another process.

Yes, but as previously pointed out, it was you who brought the `Decimal.t` "heavyweight" nature to my attention, so I figured, while at optimizing, why not go all the way.
As for the answers to your questions:
Hi Kip,
Has it been thought through how to benefit from the Money library in a use case where it's absolutely redundant to use Money instances even in memory, given that every single instance, out of potentially tens of thousands or more instances of money present in the data model, is of the same currency and identical formatting?
As I haven't yet had the time to delve deeper into the Money source code: do you find it feasible performance-wise to store the Decimals (memory- and DB-wise alike), while creating Money instances on demand for computation purposes only, or do you find the Money library as such too heavyweight in general for one such use case?
Thanks,
Damir