Closed: mmmries closed this issue 1 year ago
This indeed does look quite concerning. There's nothing that immediately jumps to mind that could cause this. It would be very useful if I had a chance to reproduce it.
One thing to check could be the size of the intermediate iodata that we encode into before we convert it into the final string. You could get it with something like `Jason.encode_to_iodata!(data) |> :erts_debug.flat_size()`
Thanks @michalmuskala, I had not thought of checking the iodata size, but I'll give that a try. I'm not sure whether Ecto uses the iodata representation or the binary representation under the hood, so I'll try to check that as well.
I'll post my findings and hopefully a more reproducible example soon.
I'm using the following script to investigate this memory usage:
```elixir
# set process memory limit
max_heap_words = div(floor(2.0 * 1024 * 1024 * 1024), :erlang.system_info(:wordsize))
Process.flag(:max_heap_size, max_heap_words)

# load trace; trace.bin was created by doing
# `File.write("/tmp/trace.bin", :erlang.term_to_binary(trace))` on a production server.
# It is not included here because it contains some sensitive data.
div(Process.info(self(), :memory) |> elem(1), 1024 * 1024) |> IO.inspect(label: "mem initial (mb)")
trace = File.read!("/tmp/trace.bin") |> :erlang.binary_to_term()
div(Process.info(self(), :memory) |> elem(1), 1024 * 1024) |> IO.inspect(label: "mem after loading trace (mb)")

# check size of data
div(trace |> :erts_debug.flat_size(), 1024 * 1024) |> IO.inspect(label: "trace flat_size (mb)")
div(trace |> Jason.encode_to_iodata!() |> :erts_debug.flat_size(), 1024 * 1024) |> IO.inspect(label: "encode_to_io_data flat_size (mb)")
div(Process.info(self(), :memory) |> elem(1), 1024 * 1024) |> IO.inspect(label: "mem after encode_to_iodata! (mb)")

# try to encode to binary
div(trace |> :erts_debug.flat_size(), 1024 * 1024) |> IO.inspect(label: "trace flat_size (mb)")
div(trace |> Jason.encode!() |> byte_size(), 1024 * 1024) |> IO.inspect(label: "encode! byte_size (mb)")
div(Process.info(self(), :memory) |> elem(1), 1024 * 1024) |> IO.inspect(label: "mem after encode! (mb)")
```
If I set a memory limit of 2.5GB this script completes successfully with this output:

```
mem initial (mb): 0
mem after loading trace (mb): 244
trace flat_size (mb): 25
encode_to_io_data flat_size (mb): 61
mem after encode_to_iodata! (mb): 785
trace flat_size (mb): 25
encode! byte_size (mb): 46
mem after encode! (mb): 825
```
If I run this script with a max heap size of 2.0GB I get this error:

```
mem initial (mb): 0
mem after loading trace (mb): 244
trace flat_size (mb): 25
encode_to_io_data flat_size (mb): 61
mem after encode_to_iodata! (mb): 785
trace flat_size (mb): 25
** (EXIT from #PID<0.93.0>) killed
```
So `encode_to_iodata!` produces data with a `flat_size` of 61MB, but memory usage grows to over 700MB while producing that data. And doing the `encode!` to binary uses more than that and ends up going over the 2GB limit for total heap size.
I shared the file with you through another channel, but please don't feel obligated to look into this. I appreciate the work you do for the community, and I'm posting here mostly to share my experience. This library is so useful, and I'll continue working on a reproducible example that does not involve any sensitive data so I can post it here as well.
I came up with a reproducible example that I think shows the same sort of memory spike as what I'm seeing in my production use case.
https://github.com/mmmries/jason_memory_repro
The main idea is that you have some big structs with a lot of data in them, but their `Jason.Encoder` protocol implementation only uses a very small amount of that data. Then you create a parent structure that nests many copies of the struct. You can play around with the size of the trace data and the maximum heap space, but I left it in a state where the final Jason binary would be ~200MB, yet we end up going over 650MB of heap space while encoding it to a binary. That's not as much memory bloat as I'm seeing in production, but it's at least in the ballpark.
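The shape of that reproduction can be sketched roughly like this (module and field names below are made up for illustration, not taken from the linked repo):

```elixir
defmodule BigStruct do
  # The struct carries a large payload, but the derived encoder only
  # emits :id and :name, so most of the held data never reaches the JSON.
  @derive {Jason.Encoder, only: [:id, :name]}
  defstruct [:id, :name, :huge_payload]
end

# A parent structure nesting many copies of the struct:
traces =
  for i <- 1..10_000 do
    %BigStruct{id: i, name: "trace-#{i}", huge_payload: :binary.copy("x", 10_000)}
  end

Jason.encode!(%{traces: traces})
```

The point of the sketch is only that the encoded output is tiny relative to the data held in memory during encoding.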
I did a quick test to see how much memory the current Jason encoding benchmarks use. I ran the `bench/encode.exs` file and commented out all of the other JSON encoders to get an idea of how much memory the `encode_to_iodata!` and `encode!` function calls take, and I compared that to the byte size of the JSON fixtures:
| dataset | json size (kb) | encode_to_iodata (kb) | multiple |
|---|---|---|---|
| blockchain | 18.0 | 79.0 | 4.4 |
| canada | 2150.4 | 84602.9 | 39.3 |
| giphy | 121.0 | 861.6 | 7.1 |
| github | 54.0 | 221.2 | 4.1 |
| govtrack | 3788.8 | 19415.0 | 5.1 |
| issue-90 | 7884.8 | 833.1 | 0.1 |
| json-generator | 108.0 | 778.3 | 7.2 |
| pokedex | 55.0 | 797.3 | 14.5 |
| utf-8-unescaped | 14.0 | 6.1 | 0.4 |
I re-ran these tests using `Jason.encode!` and got the same memory usage across the board. So at least for these cases where we parse from JSON, it appears that `encode_to_iodata` has the same memory characteristics as `encode!`.
My main conclusion from this data is that the encoding overhead is neither fixed in size nor proportional to the JSON byte size. Given the wide variance between factors of 0.1 and 39.3, it seems there are some cases where encoding uses a lot of extra memory and others where it has very little overhead.
I'm going to keep diving deeper into the cases where there is a lot of overhead and compare those to my reproducible case to better understand where this overhead comes from.
For reference, I also ran the current encoder benchmarks with Poison and JSX, and this library uses the least memory across the board (I'm showing the canada dataset as a single comparison point). So this affects all of the pure-Elixir implementations.
I got a little further on this today and found that one of the big things that contributes to memory usage is scalar values that need to be "stringified". The "canada" data was an obvious outlier in terms of memory usage, and it turned out that this JSON is mostly a bunch of floating point values (a big list of floating point pairs, I think representing lat/long coordinates?).
When we run `encode_to_iodata!` for floating point values, we end up calling `:io_lib_format.fwrite_g(float)`, which I couldn't find in the Erlang documentation, so I'm guessing it's an undocumented part of the standard library that is optimized for formatting floats as a string?
Just to test this hypothesis, I changed a fork of the library to simply not encode floats (this is obviously not a workable solution, but I'm just checking myself), and now running an encode on the canada dataset takes 5.10MB of memory compared to the current 82.62MB.
Is there anything special about `:io_lib_format.fwrite`? Could we potentially export those as `["123", ".", "456"]` to avoid allocating bigger strings?
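For what it's worth, emitting a float as several small iodata fragments would flatten to exactly the same JSON text; this is just a sketch of the idea, not Jason's actual code:

```elixir
# An iodata tree flattens transparently, so the final output is unchanged
# whether a number is one binary or several nested fragments:
IO.iodata_to_binary(["[", ["123", ".", "456"], "]"])
# => "[123.456]"
```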
Here are a few other alternatives I tried. I think it's pretty interesting that `:erlang.float_to_binary(float, decimals: 20)` results in a 2.74x speed increase as well as an 8.7x memory reduction.
| function | memory usage (MB) | ips | what it breaks |
|---|---|---|---|
| `:io_lib_format.fwrite` | 82.62 | 5.26 | this is the current implementation |
| `IO.iodata_to_binary(:io_lib_format.fwrite_g(float))` | 86.74 | 6.68 | |
| `:erlang.float_to_binary(float)` | 10.19 | 14.02 | adds a scientific notation that is not supported by JSON |
| `:erlang.float_to_binary(float, decimals: 20)` | 9.46 | 14.43 | always outputs 20 decimals of precision, so we get 99.98999999999999488409 instead of 99.99 |
| `:erlang.float_to_list(float, decimals: 20)` | 45.17 | 8.87 | same as above |
I ran all of these benchmarks with the following settings:

```
Operating System: macOS
CPU Information: Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
Number of Available Cores: 12
Available memory: 16 GB
Elixir 1.12.0
Erlang 24.0

Benchmark suite executing with the following configuration:
warmup: 3 s
time: 10 s
memory time: 2 s
parallel: 1
inputs: Canada
```
Thank you for the analysis. This looks quite concerning.
A good long-term fix would probably come with https://github.com/erlang/otp/pull/4719; however, I'm not exactly sure how to help short-term. We could output scientific notation for floats just fine (JSON supports that), but I'm afraid of changing the output of the encoding so dramatically. I assume it might be hardcoded in a lot of places in tests, and other things like that.
\cc @DianaOlympos
Not only that, but technically `decimals: 20` is not precise enough.
I have a few ideas about what happens here, and my guess is that the culprit is that `fwrite_g/1` outputs a charlist. While the Ryu stuff helps a bit with memory, the problem is probably the lists blowing up the stack. erlang/otp#4719 would help mostly because you could call `float_to_binary` instead, which should reduce the pressure a bit. Oh, and it produces fewer intermediary results, so maybe that would help too.
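The charlist cost mentioned here can be illustrated with `:erts_debug.flat_size/1` (my own rough sketch; exact word counts vary by OTP version):

```elixir
# A charlist costs two heap words per cons cell, one cell per character;
# a small binary created at runtime packs the same bytes into a few words.
chars = :erlang.float_to_list(3.141592653589793)  # charlist from the float
bin = :erlang.iolist_to_binary(chars)             # same characters as one binary

IO.inspect(:erts_debug.flat_size(chars), label: "charlist words")
IO.inspect(:erts_debug.flat_size(bin), label: "binary words")
```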
That will not be out before OTP 24.1. We could make `fwrite_g/1` move to binaries then too, or before; it would need a lot of changes in the tests, but it's doable. Making `pretty` the default is a discussion that has not happened yet, but that would not be a big problem for Jason.
There is no good solution for now, I am afraid.
That said, I would like to see the memory benchmark run on OTP 24; we already have some improvements to `fwrite_g/1` in there.
Also, what is happening in pokedex?
I just had an idea: what if we called `IO.iodata_to_binary(:io_lib_format.fwrite_g(num))` to convert those parts early? Perhaps we won't accumulate as much extra memory in this case, if the float formatting is indeed the issue.
We can try; it's worth running a benchmark with it. Normally the GC should kick in.
Thanks to you both for looking at this. I'll run a benchmark with @michalmuskala's suggestion of `IO.iodata_to_binary(:io_lib_format.fwrite_g(num))` to see what that does. All of the benchmarks I've posted in this thread were performed on OTP 24 and Elixir 1.12.0. I'll watch for the release candidate of 24.1 to see what happens there.
I updated the table above to include @michalmuskala's suggested implementation that converts to a binary while encoding floats. It has similar performance to the current implementation.
@DianaOlympos, to your earlier question: I looked at the pokedex dataset, and it also contains a good number of floating point values describing the characteristics of the pokemon. I re-ran that encoding benchmark with the `:erlang.float_to_binary` implementation, and the memory used went from 797.3kb to 666.75kb. So it is definitely affected by this issue, but not as directly as the canada dataset.
I also tested my own production use-case against some of these implementations and I found that changing (or even removing) the float encoding didn't significantly affect my particular use-case. I suspect that something similar is happening in my use case with encoding lots of decimals, dates and datetimes to strings, but I haven't narrowed that down yet.
I'll continue investigating my particular system in parallel, but I think we can keep this issue focused on floats, since that seems to have the biggest impact on memory usage. Potentially using `:erlang.float_to_binary(float, :pretty)` could be a solid choice starting in OTP 24.1? We could use a fixed number of decimals for now, or scientific notation, if we wanted to. I'm not sure if the performance gains are worth the change in behavior for users of this library?
The problem with using a fixed number of decimals is that it is wrong: Erlang does not technically handle all doubles as fixed numbers of decimals. Scientific notation works, but I would have to check how many decimals we need; I do not remember off the top of my head. 20 may be enough.
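The "not precise enough" point can be sketched with a quick round-trip check (my own illustration, not from the thread): with too few decimals, the printed string no longer parses back to the same float.

```elixir
# A fixed decimal count either prints noise (20 decimals) or silently
# loses information (fewer decimals):
f = 0.1 + 0.2                                 # 0.30000000000000004
s = :erlang.float_to_binary(f, decimals: 15)  # truncated representation
String.to_float(s) == f                       # round trip fails: false
```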
Do you have a repo with your benchmark/data collection, or is it just using Benchee and looking at memory? I may be able to test the new `pretty` this weekend.
@DianaOlympos I was thinking of using `:erlang.float_to_binary/1`, which looks like it includes a fixed number of decimals, but outputs in scientific notation:

```elixir
iex(2)> :erlang.float_to_binary(1.5)
"1.50000000000000000000e+00"
```
This is not my area of expertise, but I was hoping that the standard library had implemented this in a way that captures as much precision as the float is actually capable of carrying. Obviously, if we are losing precision in this serialization format, then we would not want to use it.
The benchmarks I've been using are in this repository. If you clone it, you can run them with:

```shell
cd bench
mix deps.get
mix run encode.exs
```

I modified that file to comment out all of the other JSON libraries we were benchmarking against, just to keep the runtimes smaller for my own testing.
Yeah, so this `float_to_binary/1` is the same as `float_to_binary/2` with `{scientific, 20}` iirc, which is the one I need to check. The problem is that there are multiple ways in the stdlib to do this, and this one uses printf under the hood, and I do not remember if 20 decimals are enough for it. Plus of course we will lose space, a looooot of space, but well, that is the name of the game.
I will see if my PR fixes it, in which case I think we may not need to fix it here.
So, good news: using the new BIF brings the memory used for canada from 82.63 MB to 33.59 MB, which is a 2.45x reduction. That is good progress and does point to the garbage generated. That said, it is still a lot.
Even better news: calling `:erlang.float_to_binary(float, [:short])` from the PR doubles the speed and reduces the memory to 9.22 MB. Still a lot, but faaaar better.
So I think the lesson here is: lists are expensive as hell, and we use a lot of them. I think there are some possible optimisations that would help here, mostly reducing the amount of intermediary data structures and lists.
As a note, I also get a 2x speed up on pokedex and a memory use that goes from 797.3 KB to 659.06 KB, even though it has far fewer floats. So the float cost is pretty high.
Awesome news @DianaOlympos!
Do we think that the `:erlang.float_to_binary(float, [:short])` option is coming as part of your upstream work in OTP? I'm guessing that we could add a compile-time check to this code so that it prefers that option whenever it is compiled for a sufficiently high OTP version?
It is coming, https://github.com/erlang/otp/pull/4719, hopefully in OTP 24.1.
For the compile-time check, that is more a question for @michalmuskala.
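A compile-time switch along those lines might look like the sketch below. Note the caveats: `FloatEncoder` is a hypothetical module name, and `:erlang.system_info(:otp_release)` exposes only the major version, so distinguishing 24.0 from 24.1 would need something finer-grained (for example, reading the `OTP_VERSION` file from the release).

```elixir
defmodule FloatEncoder do
  # Hypothetical: pick the float encoder when this module is compiled.
  @otp_release :erlang.system_info(:otp_release) |> List.to_integer()

  if @otp_release >= 25 do
    # OTP new enough to have the :short option from erlang/otp#4719
    def encode(f), do: :erlang.float_to_binary(f, [:short])
  else
    # fallback: the formatter currently used, converted to a binary
    def encode(f), do: IO.iodata_to_binary(:io_lib_format.fwrite_g(f))
  end
end
```

The `if` runs at compile time, so only one `encode/1` clause ever exists in the compiled module.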
I have a production system which builds a dynamic map of trace data, basically a deeply nested structure of lists, maps, integers, strings and decimals, and saves it to a `jsonb` field. Under the hood, Ecto is using `Jason` to encode this structure into a JSON string, and I'm seeing a case where encoding it uses a surprising amount of memory.

If I encode it with a lot of memory available, I get a final `byte_size` of ~98MB, but if I set a maximum heap space of 2GB, the process crashes while encoding. So at some point during the encoding process we are using over 2GB of heap space to encode a 98MB string of JSON.

In the examples below, I stashed the nested structure in an agent so I can get back to the data easily and test with fresh iex processes. The trace data contains some sensitive information, but I can generate an anonymized version of it to help with debugging if we want to dig deeper.
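The Agent stash mentioned above might look like this (with `build_trace/0` standing in for whatever produces the real data; both names are illustrative):

```elixir
# Keep the expensive-to-build trace in a named Agent so fresh iex
# sessions on the same node can fetch it without rebuilding it:
{:ok, _pid} = Agent.start_link(fn -> build_trace() end, name: :trace_stash)

# Later, from a new shell:
trace = Agent.get(:trace_stash, & &1)
trace |> Jason.encode!() |> byte_size()
```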
Example Showing 2GB of Heap Usage
Example Showing the Final JSON size with a higher heap limit