zloirock / core-js

Standard Library
MIT License

`esnext.json.parse` is generating (memory-wise) heavier objects #1354

Open Kosta-Github opened 1 month ago

Kosta-Github commented 1 month ago

When using `esnext.json.parse` with a reviver function, the deserialized objects are heavier (memory-wise) than the ones generated by the non-polyfilled JSON.parse() function.

How to reproduce:

```js
// require("core-js/actual/json/parse");

// generate a large JSON string containing an array of 1_000_000 random
// numbers (serialized as strings)
const data = [];
for (let i = 0; i < 1_000_000; i++) {
    data.push('' + Math.floor(Math.random() * 100));
}
const json = JSON.stringify({ results: [data] });

// trivial reviver function that does nothing
const reviver = (key, value) => value;

let lastMemUsage = process.memoryUsage();
const results = [];
while (true) {
    const currentMemUsage = process.memoryUsage();
    console.log(
        `[mem] current: ${Math.floor(currentMemUsage.heapTotal / (1024 * 1024))}, ` +
        `delta: ${Math.floor((currentMemUsage.heapTotal - lastMemUsage.heapTotal) / (1024 * 1024))}`
    );
    lastMemUsage = currentMemUsage;

    const parsed = JSON.parse(json, reviver);

    // keep a reference to the parsed objects to prevent the GC from collecting them
    results.push(parsed);
}
```

Let the above script run for a while and observe the memory usage and delta. With the unmodified JSON.parse() it looks something like:

```
[mem] current: 52, delta: 0
[mem] current: 88, delta: 35
[mem] current: 89, delta: 1
[mem] current: 120, delta: 30
[mem] current: 151, delta: 30
[mem] current: 182, delta: 30
[mem] current: 213, delta: 30
[mem] current: 127, delta: -86
[mem] current: 158, delta: 30
[mem] current: 189, delta: 30
[mem] current: 220, delta: 30
[mem] current: 251, delta: 30
[mem] current: 282, delta: 30
[mem] current: 313, delta: 31
[mem] current: 344, delta: 30
[mem] current: 375, delta: 30
[mem] current: 196, delta: -179
[mem] current: 227, delta: 31
[mem] current: 258, delta: 30
[mem] current: 289, delta: 30
[mem] current: 320, delta: 30
[mem] current: 350, delta: 30
[mem] current: 381, delta: 30
[mem] current: 412, delta: 30
[mem] current: 443, delta: 30
[mem] current: 474, delta: 30
[mem] current: 505, delta: 31
[mem] current: 536, delta: 30
[mem] current: 567, delta: 30
[mem] current: 598, delta: 30
...
```

When uncommenting the first line and using the polyfilled JSON.parse() function, the output looks like this:

```
[mem] current: 52, delta: 0
[mem] current: 202, delta: 149
[mem] current: 236, delta: 33
[mem] current: 256, delta: 20
[mem] current: 401, delta: 145
[mem] current: 557, delta: 155
[mem] current: 447, delta: -110
[mem] current: 505, delta: 57
[mem] current: 650, delta: 145
[mem] current: 806, delta: 156
[mem] current: 962, delta: 155
[mem] current: 712, delta: -250
[mem] current: 768, delta: 56
[mem] current: 826, delta: 57
[mem] current: 947, delta: 121
[mem] current: 1103, delta: 155
[mem] current: 1259, delta: 155
[mem] current: 1415, delta: 156
[mem] current: 1571, delta: 155
[mem] current: 1727, delta: 155
[mem] current: 1137, delta: -591
[mem] current: 1194, delta: 57
[mem] current: 1251, delta: 56
[mem] current: 1308, delta: 56
[mem] current: 1395, delta: 86
[mem] current: 1552, delta: 157
[mem] current: 1708, delta: 155
[mem] current: 1863, delta: 155
[mem] current: 2019, delta: 155
[mem] current: 2176, delta: 156
[mem] current: 2332, delta: 156
...
```

You can see that the memory usage is up to 4-5 times larger and grows much more quickly.

```
$ node --version
v20.15.0
```
zloirock commented 1 month ago

It's the JSON.parse source text access proposal polyfill.
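For context, that proposal passes a third "context" argument to the reviver, exposing the raw source text of each primitive. A minimal sketch of what the polyfilled behavior enables (on engines without the feature and without the polyfill, the third argument is simply undefined, so this falls back to the precision-losing number):

```javascript
// JSON source text access: the reviver's third argument carries the raw
// source slice for primitive values, allowing lossless big-integer parsing.
// Without the feature, `context` is undefined and the guard falls through.
const text = '{"big": 12345678901234567890}';
const result = JSON.parse(text, (key, value, context) =>
  key === 'big' && context && typeof context.source === 'string'
    ? BigInt(context.source)
    : value
);
console.log(typeof result.big); // "bigint" with the feature, "number" without
```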

Sure, we can't implement it in JS as efficiently as it can be done natively in JS engines.

If you have some proposals for how to optimize this polyfill, feel free to open a PR.

If performance is critical for you, you could update your Node: this feature is available natively from Node 21, so the polyfill is not installed there. Alternatively, just exclude this module from your app if you don't use JSON.parse source text access.
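One hedged way to do that exclusion, assuming core-js is injected through `@babel/preset-env` with `useBuiltIns` (the `exclude` option accepts core-js module names; the version string below is just an example for your installed core-js):

```javascript
// babel.config.js - sketch: skip the esnext.json.parse polyfill so the
// native JSON.parse is used even when a reviver is passed.
module.exports = {
  presets: [
    ['@babel/preset-env', {
      useBuiltIns: 'usage',
      corejs: '3.37', // example version; match your installed core-js
      // drops only the JSON.parse source text access polyfill
      exclude: ['esnext.json.parse'],
    }],
  ],
};
```

If you import core-js entry points manually instead, the equivalent is simply not importing `core-js/actual/json/parse`.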

Kosta-Github commented 1 month ago

It is not clear to me why the generated object hierarchies should consume more memory when using the trivial reviver function than without it.

zloirock commented 1 month ago

Because in your case, without a reviver, the native JSON.parse is used, not the polyfill.

Kosta-Github commented 1 month ago

Sure, that is obvious.

The question is, why would the object tree generated by the polyfilled JSON.parse() allocate more/additional memory when used with the reviver function?

I am not concerned about potential additional memory usage during the parse operation, but about the additional memory usage that is kept alive and associated with the returned object hierarchy after the parse operation.

Say you are parsing this JSON `{ "hello": "world" }` with and without the trivial reviver function. Why should the result consume more memory when the reviver function was used?
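A trivial check of the structural side of that claim (native JSON.parse only; this demonstrates that both results serialize identically, and says nothing about hidden in-memory representation, which is what the thread is actually about):

```javascript
// Parse the same JSON with and without a trivial reviver; the resulting
// trees are structurally identical either way.
const json = '{ "hello": "world" }';
const plain = JSON.parse(json);
const revived = JSON.parse(json, (key, value) => value);
console.log(JSON.stringify(plain) === JSON.stringify(revived)); // true
```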

zloirock commented 1 month ago

They have the same tree. Why do you think they don't? One more time: when you call JSON.parse with a reviver, the polyfilled method is used; without one, the native method.

Kosta-Github commented 1 month ago

> They have the same tree. Why do you think that's not?

Because the memory consumption is higher if that tree was generated with the polyfilled parse() function.

> when you call JSON.parse with reviver, used polyfilled method, without - native.

Again, I get that.

This does not explain why the generated tree consumes more memory. I am not talking about the memory consumption during parsing.

Something like:

```
mem_used_by_object(polyfilled.parse(json)) >= 4 * mem_used_by_object(native.parse(json))
```
zloirock commented 1 month ago

Because the native JSON.parse is more optimized (including memory-wise) than the polyfill? -) They have different in-memory representations of this tree, most likely different garbage collection behavior, etc.

If you want, you could dig into it and try to optimize it. For example, Context#source is the same string on all instances and theoretically should be optimized by modern engines to refer to one place in memory - but something could be wrong there. Or regex usage, which also is not free. Etc. However, some specific features, like descriptor edge cases, are almost impossible to optimize because of the nature of JS.

zloirock commented 1 month ago

The V8 JSON parser is a low-level C++ tool; it's strange to ask why a JS implementation of the same thing takes more memory.

zloirock commented 1 month ago

If you are talking about the result objects, not about the JSON tree, I see only 2 answers: how the GC works, and descriptor usage affecting the in-memory representation of the result objects - but that's required for a correct result. In both cases, I don't see how it can be optimized on the core-js side.

zloirock commented 1 month ago

I played with your example with the `--expose-gc` flag and manual GC handling. Even in this case, the polyfilled method's result object takes more memory than the native one. One possibility is that the result array is non-optimized.
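A minimal sketch of that kind of measurement (assumes running with `node --expose-gc`; it falls back to a no-op when `global.gc` is unavailable, which makes the numbers noisier but keeps the script runnable, and the helper name `retainedBytes` is made up for this illustration):

```javascript
// Sketch: estimate the heap retained by keeping N parse results alive.
// Run with `node --expose-gc` for meaningful numbers; without the flag
// the GC calls are no-ops and the estimate is much less reliable.
const gc = globalThis.gc ?? (() => {});

const json = JSON.stringify({
  results: Array.from({ length: 100_000 }, (_, i) => String(i % 100)),
});
const reviver = (key, value) => value;

function retainedBytes(parse, iterations = 20) {
  gc();
  const before = process.memoryUsage().heapUsed;
  const keep = [];
  for (let i = 0; i < iterations; i++) keep.push(parse());
  gc();
  const after = process.memoryUsage().heapUsed;
  keep.length; // keep the results reachable until after the measurement
  return after - before;
}

const withReviver = retainedBytes(() => JSON.parse(json, reviver));
const withoutReviver = retainedBytes(() => JSON.parse(json));
console.log({ withReviver, withoutReviver });
```

With `core-js/actual/json/parse` loaded, comparing the two numbers should reproduce the gap discussed in this thread.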

zloirock commented 1 month ago

In Node versions where this feature is available natively, there is also a difference in memory usage between the cases with and without a reviver - however, it is not as significant.

zloirock commented 1 month ago

As I wrote, it's not a bug - it's a matter of optimization for specific engines. If it's interesting to you, feel free to play with the internal representations of objects in V8 and open a PR optimizing this case.