Closed hrdwdmrbl closed 5 years ago
test/test-parse-big.js ................................ 0/4 14s
not ok Yieldable parse took 10x longer than regular parse 13341 vs 260
--- wanted
+++ found
-true
+false
compare: ===
at:
line: 64
column: 9
file: test/test-parse-big.js
function: yj.parseAsync
stack: |
yj.parseAsync (test/test-parse-big.js:64:9)
Immediate.setImmediate (yieldable-parser.js:335:20)
source: >
tap.equal(yieldParseElapsedTime / 10 < nonYieldParseElapsedTime, true,
`Yieldable parse took 10x longer than regular parse ${yieldParseElapsedTime}
vs ${nonYieldParseElapsedTime}`);
not ok Not enough yielding 1
--- wanted
+++ found
-true
+false
compare: ===
at:
line: 65
column: 9
file: test/test-parse-big.js
function: yj.parseAsync
stack: |
yj.parseAsync (test/test-parse-big.js:65:9)
Immediate.setImmediate (yieldable-parser.js:335:20)
source: |
tap.equal(yieldCount > 10, true, `Not enough yielding ${yieldCount}`);
not ok Loop starved too long 13110
--- wanted
+++ found
-true
+false
compare: ===
at:
line: 66
column: 9
file: test/test-parse-big.js
function: yj.parseAsync
stack: |
yj.parseAsync (test/test-parse-big.js:66:9)
Immediate.setImmediate (yieldable-parser.js:335:20)
source: >
tap.equal(longestLoopStarvation < 100, true, `Loop starved too long
${longestLoopStarvation}`);
not ok Tick took too long 299
--- wanted
+++ found
-true
+false
compare: ===
at:
line: 69
column: 13
file: test/test-parse-big.js
type: Timeout
function: setTimeout
method: _onTimeout
stack: |
Timeout.setTimeout (test/test-parse-big.js:69:13)
source: >
tap.equal(longestTickTime < 100, true, `Tick took too long
${longestTickTime}`);
total ................................................. 0/4
0 passing (14s)
4 failing
@hrdwdmrbl - thanks for the PR. I will have a look!
@hrdwdmrbl - I had a detailed look at the test case. While there is nothing wrong with the usage, I would say it exploits the boundary of the tool's capability in terms of how much data to parse in one loop cycle.
If you look at https://github.com/ibmruntimes/yieldable-json/blob/e19f4825e5bf5c17a2d73497d7d14f0774c42c52/yieldable-parser.js#L167-L170 , my logic is to run 512 * intensity
iterations in one shot before yielding to the event loop. In your case, intensity is the default (1), so it is 512 units of work.
Now if you look at your JSON layout, it is an object that has 2² (4) arrays, each containing 2⁷ (128) elements, amounting to 512 units of work. So when this is put into operation, the yieldable-parser
does not yield even once, as we have exactly 512 units of work!
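That chunk-counting behaviour can be sketched roughly like this (a simplified illustration of the logic described above, not the library's actual code; `processInChunks` is a hypothetical name):

```javascript
// Simplified model of the yield logic: do chunkSize = 512 * intensity
// units of work synchronously, then yield to the event loop via
// setImmediate only if work remains.
function processInChunks(items, intensity, onYield, onDone) {
  const chunkSize = 512 * intensity;
  let i = 0;
  (function run() {
    const end = Math.min(i + chunkSize, items.length);
    for (; i < end; i++) {
      // one "unit of work" per item (parsing a value, in the real tool)
    }
    if (i < items.length) {
      onYield();         // a yield happens only between chunks
      setImmediate(run); // give the event loop a turn
    } else {
      onDone();
    }
  })();
}
```

With exactly 512 items and the default intensity of 1, the first chunk consumes all the work and `onYield` never fires, which matches the no-yield behaviour seen in the failing test.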
If your test case is a true representative of a real workload that you have, I would recommend reducing the yield counter to something smaller, at your convenience.
On the contrary, if this test case is found to represent a more common use case, I could change the default, and users could increase the value through the multiplier (intensity) at their convenience.
My rationale for keeping 512 is that it caters to common workloads, and that way people can work with the default configuration.
Now coming to this PR, I would recommend these:
- adjust the data so that the yield actually comes into effect - probably a smaller array element count or fewer arrays.
- exercise the intensity variable as well. I am not sure whether existing tests already do that.

Hope this helps!
Yes, that does help. I see now that it was because of the way I built my big JSON - good to understand it at a deeper level. It was not a production-like blob; I just wanted something really big. I'll make changes and retry, thanks!
Okay yeah, so I created a more balanced (probably more realistic) big JSON, and it succeeded:
Object size 6186728112 bytes
Total time 3531ms
yieldCount 514 times
longestLoopStarvation 16ms
longestTickTime 23ms
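(For context, starvation numbers like these can be gathered with a small helper along these lines - a sketch with a hypothetical name, not the code the test suite actually uses:)

```javascript
// Measure event-loop starvation: schedule a frequent timer and record
// how much later than scheduled each tick fires. A large gap means some
// synchronous work blocked the loop for roughly that long.
function monitorStarvation(intervalMs = 10) {
  let last = Date.now();
  let longest = 0;
  const timer = setInterval(() => {
    const now = Date.now();
    const gap = now - last - intervalMs; // extra delay beyond schedule
    if (gap > longest) longest = gap;
    last = now;
  }, intervalMs);
  return { stop: () => { clearInterval(timer); return longest; } };
}
```

Start the monitor before kicking off the parse and stop it in the completion callback; the returned value corresponds to a longestLoopStarvation-style metric.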
Thanks for verifying and getting back, @hrdwdmrbl - do you think improving the documentation around this specific aspect of the tool (intensity, time to return to the loop) would be a good idea? I have it documented, but it looks like it is not readily consumable or something. Please let me know!
@gireeshpunathil The documentation explains the intensity option, but it doesn't explain how it works internally like you did.
It might make sense to instead expose an option that directly controls the number of iterations to perform. If the option is specified, the number provided would be used instead of 512 * intensity.
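That proposal could resolve along these lines (a sketch; the `iterations` option name is hypothetical and not part of the current API):

```javascript
// Hypothetical resolution of the proposed option: an explicit
// `iterations` count wins; otherwise fall back to 512 * intensity.
function resolveChunkSize(opts = {}) {
  if (Number.isInteger(opts.iterations) && opts.iterations > 0) {
    return opts.iterations;      // direct control, as proposed
  }
  const intensity = opts.intensity || 1;
  return 512 * intensity;        // current default behaviour
}
```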
Ah, @gireeshpunathil, with real-world JSON I'm getting blockages of up to 87 seconds. :( I'm working to get a copy of one of those real-world JSONs so that I can do further testing.
@hrdwdmrbl - thanks. I will look forward to review the real world JSON data.
So I made a couple of tests using very large JSON. I am in search of solutions for dealing with very large JSON, up to 2 GB in size. I thought this library might be perfect for our needs, but from my testing I'm having some trouble. Can you take a look at my test and see if I'm misusing the library?