Closed: mmmries closed this issue 1 year ago
This indeed does look quite concerning. There's nothing that immediately jumps to mind that could cause this. It would be very useful if I had a chance to reproduce it.
One thing to check could be the size of the intermediate iodata that we encode into before we convert it into the final string. You could get it with something like `Jason.encode_to_iodata!(data) |> :erts_debug.flat_size()`
Thanks @michalmuskala, I had not thought of checking the iodata size, but I'll give that a try. I'm not sure whether Ecto uses the iodata representation or the binary representation under the hood, so I'll try to check that as well.
I'll post my findings and hopefully a more reproducible example soon.
I'm using the following script to investigate this memory usage:
```elixir
# set process memory limit
max_heap_words = div(floor(2.0 * 1024 * 1024 * 1024), :erlang.system_info(:wordsize))
Process.flag(:max_heap_size, max_heap_words)

# load trace; trace.bin was created by doing
# `File.write("/tmp/trace.bin", :erlang.term_to_binary(trace))` on a production server.
# It is not included here because it contains some sensitive data.
div(Process.info(self(), :memory) |> elem(1), 1024 * 1024) |> IO.inspect(label: "mem initial (mb)")
trace = File.read!("/tmp/trace.bin") |> :erlang.binary_to_term()
div(Process.info(self(), :memory) |> elem(1), 1024 * 1024) |> IO.inspect(label: "mem after loading trace (mb)")

# check size of data
div(trace |> :erts_debug.flat_size(), 1024 * 1024) |> IO.inspect(label: "trace flat_size (mb)")
div(trace |> Jason.encode_to_iodata!() |> :erts_debug.flat_size(), 1024 * 1024) |> IO.inspect(label: "encode_to_io_data flat_size (mb)")
div(Process.info(self(), :memory) |> elem(1), 1024 * 1024) |> IO.inspect(label: "mem after encode_to_iodata! (mb)")

# try to encode to binary
div(trace |> :erts_debug.flat_size(), 1024 * 1024) |> IO.inspect(label: "trace flat_size (mb)")
div(trace |> Jason.encode!() |> byte_size(), 1024 * 1024) |> IO.inspect(label: "encode! byte_size (mb)")
div(Process.info(self(), :memory) |> elem(1), 1024 * 1024) |> IO.inspect(label: "mem after encode! (mb)")
```
If I set a memory limit of 2.5GB this script completes successfully with this output:

```
mem initial (mb): 0
mem after loading trace (mb): 244
trace flat_size (mb): 25
encode_to_io_data flat_size (mb): 61
mem after encode_to_iodata! (mb): 785
trace flat_size (mb): 25
encode! byte_size (mb): 46
mem after encode! (mb): 825
```
If I run this script with a max heap size of 2.0GB I get this error:

```
mem initial (mb): 0
mem after loading trace (mb): 244
trace flat_size (mb): 25
encode_to_io_data flat_size (mb): 61
mem after encode_to_iodata! (mb): 785
trace flat_size (mb): 25
** (EXIT from #PID<0.93.0>) killed
```
So `encode_to_iodata!` produces data with a `flat_size` of 61MB, but memory usage grows to over 700MB while producing that data. And doing the `encode!` to binary uses more than that and ends up going over the 2GB limit for total heap size.
I shared the file with you through another channel, but please don't feel obligated to look into this. I appreciate the work you do for the community, and I'm posting here mostly to share my experience. This library is so useful, and I'll continue working on a reproducible example that does not involve any sensitive data so I can post it here as well.
I came up with a reproducible example that I think shows the same sort of memory spike as what I'm seeing in my production use case.
https://github.com/mmmries/jason_memory_repro
The main idea is that you have some big structs with a lot of data in them, but their `Jason.Encoder` protocol implementation only uses a very small amount of that data. Then you create a parent structure that nests many copies of the struct. You can play around with the size of the trace data and the maximum heap space, but I left it in a state where the final Jason binary would be ~200MB, yet we end up going over 650MB of heap space while encoding it to a binary. That's not as much memory bloat as I'm seeing in production, but it's at least in the ballpark.
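The shape of that reproduction can be sketched roughly like this (module and field names below are made up for illustration, not taken from the linked repo):

```elixir
defmodule BigStruct do
  # The struct carries a large payload, but the derived encoder only
  # emits :id and :name, so most of the held data never reaches the JSON.
  @derive {Jason.Encoder, only: [:id, :name]}
  defstruct [:id, :name, :huge_payload]
end

# A parent structure nesting many copies of the struct:
traces =
  for i <- 1..10_000 do
    %BigStruct{id: i, name: "trace-#{i}", huge_payload: :binary.copy("x", 10_000)}
  end

Jason.encode!(%{traces: traces})
```

The point of the sketch is only that the encoded output is tiny relative to the data held in memory during encoding.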
I did a quick test to see how much memory the current Jason encoding benchmarks use. I ran the `bench/encode.exs` file and commented out all of the other JSON encoders to get an idea of how much memory the `encode_to_iodata!` and `encode!` function calls take, and I compared that to the byte size of the JSON fixtures:
| dataset | json size (kb) | encode_to_iodata (kb) | multiple |
|---|---|---|---|
| blockchain | 18.0 | 79.0 | 4.4 |
| canada | 2150.4 | 84602.9 | 39.3 |
| giphy | 121.0 | 861.6 | 7.1 |
| github | 54.0 | 221.2 | 4.1 |
| govtrack | 3788.8 | 19415.0 | 5.1 |
| issue-90 | 7884.8 | 833.1 | 0.1 |
| json-generator | 108.0 | 778.3 | 7.2 |
| pokedex | 55.0 | 797.3 | 14.5 |
| utf-8-unescaped | 14.0 | 6.1 | 0.4 |
I re-ran these tests using `Jason.encode!` and got the same memory usage across the board. So at least for these cases where we parse from JSON, it appears that `encode_to_iodata` has the same memory characteristics as `encode!`.
My main conclusion from this data is that the encoding overhead is neither fixed in size nor proportional to the JSON byte size. Given the wide variance between factors of 0.1 and 39.3, it seems there are some cases where encoding uses a lot of extra memory and others where it has very little overhead.
I'm going to keep diving deeper into the cases where there is a lot of overhead and compare those to my reproducible case to better understand where this overhead comes from.
For reference, I also ran the current encoder benchmarks with Poison and JSX, and this library uses the least memory across the board (I'm showing the canada dataset as a single comparison point). So this affects all of the pure-Elixir implementations.
I got a little further on this today and found that one of the big things that contributes to memory usage is scalar values that need to be "stringified". The "canada" data was an obvious outlier in terms of memory usage, and it turned out that this JSON is mostly a bunch of floating point values (a big list of floating point pairs, I think representing lat/long coordinates?).
When we run `encode_to_iodata!` for floating point values, we end up calling `:io_lib_format.fwrite_g(float)`, which I couldn't find in the Erlang documentation, so I'm guessing it's an undocumented part of the standard library that is optimized for formatting floats as a string?
Just to test this hypothesis, I changed a fork of the library to simply not encode floats (this is obviously not a workable solution, but I'm just checking myself), and now running an encode on the canada dataset takes 5.10MB of memory compared to the current 82.62MB.
Is there anything special about `:io_lib_format.fwrite`? Could we potentially export those as `["123", ".", "456"]` to avoid allocating bigger strings?
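For what it's worth, emitting a float as several small iodata fragments would flatten to exactly the same JSON text; this is just a sketch of the idea, not Jason's actual code:

```elixir
# An iodata tree flattens transparently, so the final output is unchanged
# whether a number is one binary or several nested fragments:
IO.iodata_to_binary(["[", ["123", ".", "456"], "]"])
# => "[123.456]"
```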
Here are a few other alternatives I tried. I think it's pretty interesting that `:erlang.float_to_binary(float, decimals: 20)` results in a 2.74x speed increase as well as an 8.7x memory reduction.
| function | memory usage (MB) | ips | what it breaks |
|---|---|---|---|
| `:io_lib_format.fwrite` | 82.62 | 5.26 | this is the current implementation |
| `IO.iodata_to_binary(:io_lib_format.fwrite_g(float))` | 86.74 | 6.68 | |
| `:erlang.float_to_binary(float)` | 10.19 | 14.02 | adds a scientific notation that is not supported by JSON |
| `:erlang.float_to_binary(float, decimals: 20)` | 9.46 | 14.43 | always outputs 20 decimals of precision, so we get 99.98999999999999488409 instead of 99.99 |
| `:erlang.float_to_list(float, decimals: 20)` | 45.17 | 8.87 | same as above |
I ran all of these benchmarks with the following settings:

```
Operating System: macOS
CPU Information: Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
Number of Available Cores: 12
Available memory: 16 GB
Elixir 1.12.0
Erlang 24.0

Benchmark suite executing with the following configuration:
warmup: 3 s
time: 10 s
memory time: 2 s
parallel: 1
inputs: Canada
```
Thank you for the analysis. This looks quite concerning.
A good long-term fix would probably come with https://github.com/erlang/otp/pull/4719; however, I'm not exactly sure how to help short-term. We could output scientific notation for floats just fine (JSON supports that), but I'm afraid of changing the output of the encoding so dramatically. I assume it might be hardcoded in a lot of places in tests, and other things like that.
\cc @DianaOlympos
Not only that, but technically `decimals: 20` is not precise enough.
I have a few ideas about what happens here, and my guess is that the culprit is that `fwrite_g/1` outputs a charlist. While the Ryu stuff helps a bit with memory, the problem is probably the lists blowing up the stack. erlang/otp#4719 would help mostly because you could call `float_to_binary` instead, which should reduce the pressure a bit. Oh, and it produces fewer intermediary results, so maybe that would help too.
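The charlist cost mentioned here can be illustrated with `:erts_debug.flat_size/1` (my own rough sketch; exact word counts vary by OTP version):

```elixir
# A charlist costs two heap words per cons cell, one cell per character;
# a small binary created at runtime packs the same bytes into a few words.
chars = :erlang.float_to_list(3.141592653589793)  # charlist from the float
bin = :erlang.iolist_to_binary(chars)             # same characters as one binary

IO.inspect(:erts_debug.flat_size(chars), label: "charlist words")
IO.inspect(:erts_debug.flat_size(bin), label: "binary words")
```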
That will not be out before OTP 24.1. We could make `fwrite_g/1` move to binaries then too, or before; it would need a lot of changes in the tests, but it's doable. Making `pretty` the default is a discussion that has not happened yet, but that would not be a big problem for Jason.
There is no good solution for now, I am afraid.
That said, I would like to see the memory benchmark run on OTP 24; we already have some improvements to `fwrite_g/1` in there.
Also, what is happening in pokedex?
I just had an idea: what if we called `IO.iodata_to_binary(:io_lib_format.fwrite_g(num))` to convert those parts early? Perhaps we won't accumulate as much extra memory in this case, if the float formatting is indeed the issue.
We can try; it's worth running a benchmark with it. Normally the GC should kick in.
Thanks to you both for looking at this. I'll run a benchmark with @michalmuskala's suggestion of `IO.iodata_to_binary(:io_lib_format.fwrite_g(num))` to see what that does. All of the benchmarks I've posted in this thread were performed on OTP 24 and Elixir 1.12.0. I'll watch for the release candidate of 24.1 to see what happens there.
I updated the table above to include @michalmuskala's suggested implementation that converts to a binary while encoding floats. It has similar performance to the current implementation.
@DianaOlympos, to your earlier question: I looked at the pokedex dataset, and it also contains a good number of floating point values describing the characteristics of the pokemon. I re-ran that encoding benchmark with the `:erlang.float_to_binary` implementation, and the memory used went from 797.3kb to 666.75kb. So it is definitely affected by this issue, but not as directly as the canada dataset.
I also tested my own production use-case against some of these implementations and I found that changing (or even removing) the float encoding didn't significantly affect my particular use-case. I suspect that something similar is happening in my use case with encoding lots of decimals, dates and datetimes to strings, but I haven't narrowed that down yet.
I'll continue investigating my particular system in parallel, but I think we can keep this issue focused on floats, since that seems to have the biggest impact on memory usage. Potentially using `:erlang.float_to_binary(float, :pretty)` could be a solid choice starting in OTP 24.1? We could use a fixed number of decimals for now, or scientific notation, if we wanted to. I'm not sure if the performance gains are worth the change in behavior for users of this library?
The problem with using a fixed number of decimals is that it is wrong: Erlang does not technically handle all doubles as fixed numbers of decimals. Scientific notation works, but I would have to check how many decimals we need; I do not remember off the top of my head. 20 may be enough.
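The "not precise enough" point can be sketched with a quick round-trip check (my own illustration, not from the thread): with too few decimals, the printed string no longer parses back to the same float.

```elixir
# A fixed decimal count either prints noise (20 decimals) or silently
# loses information (fewer decimals):
f = 0.1 + 0.2                                 # 0.30000000000000004
s = :erlang.float_to_binary(f, decimals: 15)  # truncated representation
String.to_float(s) == f                       # round trip fails: false
```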
Do you have a repo with your benchmark/data collection, or is it just using Benchee and looking at memory? I may be able to test the new `pretty` this weekend.
@DianaOlympos I was thinking of using `:erlang.float_to_binary/1`, which looks like it includes a fixed number of decimals, but outputs in scientific notation:

```elixir
iex(2)> :erlang.float_to_binary(1.5)
"1.50000000000000000000e+00"
```
This is not my area of expertise, but I was hoping that the standard library had implemented this in a way that captures as much precision as the float is actually capable of carrying. Obviously, if we are losing precision in this serialization format, then we would not want to use it.
The benchmarks I've been using are in this repository. If you clone it, you can run them with:

```shell
cd bench
mix deps.get
mix run encode.exs
```

I modified that file to comment out all of the other JSON libraries we were benchmarking against, just to keep the runtimes smaller for my own testing.
Yeah, so this `float_to_binary/1` is the same as `float_to_binary/2` with `{scientific, 20}` iirc, which is the one I need to check. The problem is that there are multiple ways in the stdlib to do this, and this one uses printf under the hood, and I do not remember if 20 decimals are enough for it. Plus of course we will lose space, a looooot of space, but well, that is the name of the game.
I will see if my PR fixes it, in which case I think we may not need to fix it here.
So, good news: using the new BIF brings the memory used for canada from 82.63 MB to 33.59 MB, which is a 2.45x reduction. That is good progress and does point to the garbage generated. That said, it is still a lot.
Even better news: calling `:erlang.float_to_binary(float, [:short])` from the PR doubles the speed and reduces the memory to 9.22 MB. Still a lot, but faaaar better.
So I think the lesson here is: lists are expensive as hell, and we use a lot of them. I think there are some possible optimisations that would help here, mostly reducing the amount of intermediary data structures and lists.
As a note, I also get a 2x speed up on pokedex and a memory use that goes from 797.3 KB to 659.06 KB, even though it has far fewer floats. So the float cost is pretty high.
Awesome news @DianaOlympos!
Do we think that the `:erlang.float_to_binary(float, [:short])` option is coming as part of your upstream work in OTP? I'm guessing that we could add a compile-time check to this code so that it prefers that option whenever it is compiled for a sufficiently high OTP version?
It is coming, https://github.com/erlang/otp/pull/4719, hopefully in OTP 24.1.
For the compile-time check, that is more a question for @michalmuskala.
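A compile-time switch along those lines might look like the sketch below. Note the caveats: `FloatEncoder` is a hypothetical module name, and `:erlang.system_info(:otp_release)` exposes only the major version, so distinguishing 24.0 from 24.1 would need something finer-grained (for example, reading the `OTP_VERSION` file from the release).

```elixir
defmodule FloatEncoder do
  # Hypothetical: pick the float encoder when this module is compiled.
  @otp_release :erlang.system_info(:otp_release) |> List.to_integer()

  if @otp_release >= 25 do
    # OTP new enough to have the :short option from erlang/otp#4719
    def encode(f), do: :erlang.float_to_binary(f, [:short])
  else
    # fallback: the formatter currently used, converted to a binary
    def encode(f), do: IO.iodata_to_binary(:io_lib_format.fwrite_g(f))
  end
end
```

The `if` runs at compile time, so only one `encode/1` clause ever exists in the compiled module.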
I have a production system which builds a dynamic map of trace data, basically a deeply nested structure of lists, maps, integers, strings and decimals, and saves it to a `jsonb` field. Under the hood, Ecto is using `Jason` to encode this structure into a JSON string, and I'm seeing a case where encoding it uses a surprising amount of memory.

If I encode it with a lot of memory available, I get a final `byte_size` of ~98MB, but if I set a maximum heap space of 2GB, the process crashes while encoding. So at some point during the encoding process we are using over 2GB of heap space to encode a 98MB string of JSON.

In the examples below, I stashed the nested structure in an agent so I can get back to the data easily and test with fresh iex processes. The trace data contains some sensitive information, but I can generate an anonymized version of it to help with debugging if we want to dig deeper.
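The Agent stash mentioned above might look like this (with `build_trace/0` standing in for whatever produces the real data; both names are illustrative):

```elixir
# Keep the expensive-to-build trace in a named Agent so fresh iex
# sessions on the same node can fetch it without rebuilding it:
{:ok, _pid} = Agent.start_link(fn -> build_trace() end, name: :trace_stash)

# Later, from a new shell:
trace = Agent.get(:trace_stash, & &1)
trace |> Jason.encode!() |> byte_size()
```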
Example Showing 2GB of Heap Usage
Example Showing the Final JSON size with a higher heap limit