iv-menshenin opened 9 months ago
This patch helped mitigate memory spikes in our application; the memory usage chart improved noticeably. Since this seems to be a positive change, I wonder whether the maintainers (cc @valyala) could consider it? Thank you.
Hi @iv-menshenin, could you provide a way to validate your improvements?
It has been running in our production environment for as long as my PR has existed (≈500 RPS). Or do you need some unit tests?
No, I'm asking for a simple reproduction. How did you test your improvements? Is it possible to verify this with the benchmarks of this repo? If not, can you provide one?
I'll see what I can do.
@StarpTech
Here's what I found.
As I said before, my change targets GC load peaks when processing really big data.
I multiplied the contents of ./testdata/canada.json until the file reached 18 megabytes and ran the benchmarks.
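My exact script is not part of this thread, but roughly the enlarged fixture can be produced like this (a sketch; it assumes canada.json is a GeoJSON FeatureCollection with a top-level "features" array, and the canada_big.json name is just an example):

```go
package main

import (
	"encoding/json"
	"log"
	"os"
)

func main() {
	raw, err := os.ReadFile("testdata/canada.json")
	if err != nil {
		log.Fatal(err)
	}
	var doc struct {
		Type     string            `json:"type"`
		Features []json.RawMessage `json:"features"`
	}
	if err := json.Unmarshal(raw, &doc); err != nil {
		log.Fatal(err)
	}
	// Keep appending copies of the original features until the encoded
	// document reaches roughly 18 MB.
	orig := doc.Features
	out, err := json.Marshal(doc)
	if err != nil {
		log.Fatal(err)
	}
	for len(out) < 18<<20 {
		doc.Features = append(doc.Features, orig...)
		if out, err = json.Marshal(doc); err != nil {
			log.Fatal(err)
		}
	}
	if err := os.WriteFile("testdata/canada_big.json", out, 0o644); err != nil {
		log.Fatal(err)
	}
	log.Printf("wrote %d bytes", len(out))
}
```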
Here is what I saw on the current upstream code:
BenchmarkParse/canada/fastjson-8 123 9405854 ns/op 1914.52 MB/s 37 B/op 0 allocs/op
BenchmarkParse/canada/fastjson-get-8 49 24405223 ns/op 737.86 MB/s 99548908 B/op 130551 allocs/op
Here's what happens after the improvement:
BenchmarkParse/canada/fastjson-8 136 8963098 ns/op 2009.09 MB/s 12 B/op 0 allocs/op
BenchmarkParse/canada/fastjson-get-8 111 9018742 ns/op 1996.69 MB/s 15 B/op 0 allocs/op
If required, I can make a commit with this test file to demonstrate what I mean.
@iv-menshenin that looks promising. Is there a big performance drop for smaller files? A test file would be great. We maintain an actively developed fork and have already incorporated a lot of other changes that didn't get merged into this repository. Would you be interested in contributing to it?
Is there a big performance drop for smaller files?
No, there isn't. There is an up-front memory allocation of 32 kilobytes, but in practice, thanks to the reuse of the parser, this only has a positive impact (see the sketch after these replies).
A test file would be great.
I will add a new case
Would you be interested in contributing to it?
I would like to.
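For context, this is the reuse pattern those numbers assume: the parser, together with whatever it pre-allocates up front, lives in a ParserPool and is handed back after every document, so the one-time 32 KB is quickly amortized. A minimal sketch (countFeatures is just an illustrative name, not code from the PR):

```go
package main

import (
	"fmt"
	"log"

	"github.com/valyala/fastjson"
)

// parserPool is shared across requests; each Get/Put cycle reuses the same
// underlying Parser, so its internal buffers are allocated once and reused.
var parserPool fastjson.ParserPool

func countFeatures(body []byte) (int, error) {
	p := parserPool.Get()
	defer parserPool.Put(p)

	v, err := p.ParseBytes(body)
	if err != nil {
		return 0, err
	}
	return len(v.GetArray("features")), nil
}

func main() {
	doc := []byte(`{"type":"FeatureCollection","features":[{"a":1},{"a":2}]}`)
	n, err := countFeatures(doc)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("features:", n) // features: 2
}
```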
@StarpTech
Digging deeper into the problem again, I realized that the issue is not the number of allocations but the sheer amount of memory being consumed. After checking the benchmarks, I realized that things are bad in the tests too.
Look at this: var benchPool ParserPool is shared between all the benchmarks, so all the results here are non-deterministic.
When I run the benchmarks over and over again, I get a different allocation count each time: sometimes it's zero, sometimes it isn't. So there is no allocation benefit in my PR, sorry (those were bad tests; see the sketch below for how they could be made deterministic).
But! There is a memory consumption benefit in my PR that I can show you, just give me time =)
I will do all this in your fork
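As an illustration of what I mean by deterministic benchmarks, something along these lines could be used instead of the shared pool (benchmarkLocalParse and BenchmarkParseSmallLocal are names I made up; they are not in the repository):

```go
package fastjson_test

import (
	"fmt"
	"testing"

	"github.com/valyala/fastjson"
)

// Each goroutine gets its own Parser, so allocs/op no longer depends on
// which benchmark happened to warm up the shared benchPool first.
func benchmarkLocalParse(b *testing.B, s string) {
	b.ReportAllocs()
	b.SetBytes(int64(len(s)))
	b.RunParallel(func(pb *testing.PB) {
		var p fastjson.Parser // not shared between benchmarks
		for pb.Next() {
			if _, err := p.Parse(s); err != nil {
				panic(fmt.Errorf("unexpected error: %w", err))
			}
		}
	})
}

func BenchmarkParseSmallLocal(b *testing.B) {
	benchmarkLocalParse(b, `{"type":"FeatureCollection","features":[{"a":1}]}`)
}
```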
In the meantime, here is some validation directly in this PR.
You can get this memory consumption graph via pprof if you run the tests with the following parameters:
go test --run=None --bench="BenchmarkParse/.*/fastjson" --memprofile=mem.out
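To look at the resulting profile yourself, the standard pprof tooling works, for example:
go tool pprof -http=:8080 mem.out
or, in the terminal, go tool pprof -alloc_space mem.out followed by the top or web commands (web needs graphviz installed).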
No matter how many times I run the tests, the result reports 0 memory allocations, but profiling gives a clearer picture of what is going on: more than 1 gigabyte of memory is being consumed.
And this is what the graph looks like after my improvements
I think it's pretty clear
We use fastjson in our project and recently I found an unpleasant issue when processing huge json files (we have 25 megabytes and more than 300 thousand elements in the array): when cache.vs reaches its limit and performs memory reallocation, if the GC fails to clean up the memory in time, we experience a sharp spike in memory usage graphs.
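To make the mechanism concrete, here is a small, self-contained illustration of the grow-and-copy spike (it is not fastjson code, just the general slice-reallocation behaviour behind it):

```go
package main

import (
	"fmt"
	"runtime"
)

// value is a stand-in for the elements stored in the parser cache.
type value struct{ payload [64]byte }

func main() {
	// Pretend this slice is the cache that has grown to its current capacity.
	vs := make([]value, 0, 1<<20)
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)

	// Appending one element past the capacity forces a reallocation: a new,
	// larger backing array is allocated and everything is copied into it,
	// so for a moment both the old and the new array are live.
	for i := 0; i < cap(vs)+1; i++ {
		vs = append(vs, value{})
	}

	runtime.ReadMemStats(&after)
	fmt.Printf("heap grew by ~%d MB during the grow-and-copy\n",
		(after.HeapAlloc-before.HeapAlloc)>>20)
	runtime.KeepAlive(vs)
}
```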
The funniest part is that we almost didn't notice this, because the application simply kept crashing with OOM, so we had to temporarily disable the memory limits for testing.
After working with pprof for a while, I found where most of the memory allocations happen and worked on improvements. I tried several options, and this one turned out to be the most efficient, as shown by the tests in our environment.
If you don't accept this pull request, that's okay; we'll just continue using our fork.
Thank you.