ipfs / benchmarks

Benchmarking for IPFS
MIT License
20 stars 8 forks source link

go cat'ing from js is significantly slower the other combinations + heap out of memory #203

Open jacobheun opened 5 years ago

jacobheun commented 5 years ago

Last week we encountered an issue running the ipfs/interop tests against the latest 0.34 js-ipfs release. When running the exchange files test, Windows hit a JS heap out of memory issue transfering a 134MB file, you can see the ci error here.

While investigating the issue I noticed that when cat'ing large files, js -> go, is significantly the slowest combination. Considering the go implementation is faster, I'd assume that js -> js and go -> js would likely be slower, but they are not.

The heap issue may not be directly related, but this is an area we should investigate.

Of note, the interop tests were making calls out to the greater network because the nodes were connecting to the bootstrap nodes. This was disabled to avoid connecting to the network on test runs, which mitigated the issue for that test suite. If the heap allocation cannot be reproduced with direct tests, adding additional peers may be needed.

     ✓ go -> js: 67.1 MB (6257ms)
      ✓ js -> go: 67.1 MB (9612ms) <--- significantly slower
      ✓ js -> js2: 67.1 MB (7695ms)
      ✓ go -> go2: 67.1 MB (3199ms)

      ✓ go -> js: 134 MB (12211ms)
      ✓ js -> go: 134 MB (18415ms) <--- significantly slower
      ✓ js -> js2: 134 MB (13471ms)
      ✓ go -> go2: 134 MB (6277ms)

      ✓ go -> js: 537 MB (43929ms)
      ✓ js -> go: 537 MB (76308ms) <--- significantly slower
      ✓ js -> js2: 537 MB (57373ms)
      ✓ go -> go2: 537 MB (30036ms)

Here is the ci log for posterity:

      1) go -> js: 134 MB
      2) js -> go: 134 MB
      √ js -> js2: 134 MB: 17695ms
      √ go -> go2: 134 MB: 26171ms
FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 00007FF79E64EEE5 
 2: 00007FF79E628CD6 
 3: 00007FF79E6296E0 
 4: 00007FF79EA90D3E 
 5: 00007FF79EA90C6F 
 6: 00007FF79E9DC594 
 7: 00007FF79E9D2B67 
 8: 00007FF79E9D10DC 
 9: 00007FF79E9DA0B7 
10: 00007FF79E9DA136 
11: 00007FF79EAFF7B7 
12: 00007FF79EBD87FA 
13: 000003E8D465C6C1 
mcollina commented 5 years ago

From the call I didn't got it actually run out of memory. That's extremely bad and it will need to be fixed.

@litzenberger can you sync up with @alanshaw when you get online tomorrow? He would like to add that benchmark. Once that is up, we can see what clinic will tell us.

litzenberger commented 5 years ago

Yes, will write this test now.

litzenberger commented 5 years ago

Added js -> go and go -> js. Here are the results running locally:

js -> go

────────────────────┬────────────────────┬──────────────────────────────────────────────────┬────────────────────┬─────────────────────┐
│ Test               │ Warmup             │ Description                                      │ File Set           │ Duration            │
├────────────────────┼────────────────────┼──────────────────────────────────────────────────┼────────────────────┼─────────────────────┤
│ extractJs2Go       │ off                │ Extract files from JS to Go IPFS peers           │ OneKBFile          │ s:0 ms: 179.554333  │
├────────────────────┼────────────────────┼──────────────────────────────────────────────────┼────────────────────┼─────────────────────┤
│ extractJs2Go       │ off                │ Extract files from JS to Go IPFS peers           │ One64MBFile        │ s:5 ms: 969.007397  │
├────────────────────┼────────────────────┼──────────────────────────────────────────────────┼────────────────────┼─────────────────────┤
│ extractJs2Go       │ off                │ Extract files from JS to Go IPFS peers           │ One512MBFile       │ s:42 ms: 648.087633 │
└────────────────────┴────────────────────┴──────────────────────────────────────────────────┴────────────────────┴─────────────────────┘

go -> js

┌────────────────────┬────────────────────┬──────────────────────────────────────────────────┬────────────────────┬─────────────────────┐
│ Test               │ Warmup             │ Description                                      │ File Set           │ Duration            │
├────────────────────┼────────────────────┼──────────────────────────────────────────────────┼────────────────────┼─────────────────────┤
│ extractGo2Js       │ off                │ Extract files from GO to JS IPFS peers           │ OneKBFile          │ s:0 ms: 265.780094  │
├────────────────────┼────────────────────┼──────────────────────────────────────────────────┼────────────────────┼─────────────────────┤
│ extractGo2Js       │ off                │ Extract files from GO to JS IPFS peers           │ One64MBFile        │ s:3 ms: 751.19841   │
├────────────────────┼────────────────────┼──────────────────────────────────────────────────┼────────────────────┼─────────────────────┤
│ extractGo2Js       │ off                │ Extract files from GO to JS IPFS peers           │ One512MBFile       │ s:22 ms: 215.134569 │
└────────────────────┴────────────────────┴──────────────────────────────────────────────────┴────────────────────┴─────────────────────┘
litzenberger commented 5 years ago

Clinic reports for js -> go

https://upload.clinicjs.org/public/238842ad20f8542b91743e377f21ab9f455dcddfea3bc0702830c401da302bbf/34523.clinic-doctor.html

https://upload.clinicjs.org/public/d0491b68df5b070fa226600ac4ce0389d04f35ab4e7f2aa0c718d5ae615101e4/34583.clinic-flame.html

https://upload.clinicjs.org/public/ad48c42cc98c35249690f2a8bb8b7a05880de6e8b3f781a5c720c42e06d0489c/34730.clinic-bubbleprof.html

mcollina commented 5 years ago

There are several bottlenecks/slowness in the benchmarks, and there are a lot of things we can optimize. We did not see the increase of memory and releated crash on our test with 64MB of RAM. In fact, memory usage was flat. Maybe there is something else that is different in our test vs your integration? @jacobheun @alanshaw can you check?

image


Analysis

First and foremost, there is a Node Core issue in https://github.com/nodejs/node/issues/25741 . This is related to a change we made in 10.x to avoid saturating the thread pool when reading big files, and it is backfiring us on this benchmark (it was not intended). I'm working on that on the Node.js core side of things: I can work on a PR to expedite the resolution. Ref: https://upload.clinicjs.org/public/d0491b68df5b070fa226600ac4ce0389d04f35ab4e7f2aa0c718d5ae615101e4/34583.clinic-flame.html#selectedNode=8234&zoomedNode=8234&exclude=83c0&merged=true.

Secondary, there is an issue with the bn.js library in the tree. It uses new Buffer somewhere, and the dependency tree must be updated. Ref: https://upload.clinicjs.org/public/d0491b68df5b070fa226600ac4ce0389d04f35ab4e7f2aa0c718d5ae615101e4/34583.clinic-flame.html#selectedNode=7057&zoomedNode=7057&exclude=83c0&merged=true.

Thirdly, we are spending a lot of time in faux asynchronous activity due to https://github.com/libp2p/js-libp2p-mplex/issues/89. Ref: https://upload.clinicjs.org/public/ad48c42cc98c35249690f2a8bb8b7a05880de6e8b3f781a5c720c42e06d0489c/34730.clinic-bubbleprof.html#x71-c42.

jacobheun commented 5 years ago

Maybe there is something else that is different in our test vs your integration? @jacobheun @alanshaw can you check?

I think the likely culprit here is due to the tests we were running when we hit the heap issue were dialing out to the live ipfs network. My guess is that when we're connected to multiple nodes bitswap starts spamming the network with requests. Perhaps we should add a test that connects the go and js nodes to some other passive nodes to see if there's a significant change?

Thirdly, we are spending a lot of time in faux asynchronous activity due to libp2p/js-libp2p-mplex#89.

I am going to start looking into getting this resolved this week.

mcollina commented 5 years ago

I think the likely culprit here is due to the tests we were running when we hit the heap issue were dialing out to the live ipfs network. My guess is that when we're connected to multiple nodes bitswap starts spamming the network with requests. Perhaps we should add a test that connects the go and js nodes to some other passive nodes to see if there's a significant change?

Should we add such test ourselves? bitswap never got into our radar. How would you design such a benchmark?

jacobheun commented 5 years ago

Bitswap is ultimately called by the Block Service anytime a block is moved around the network.

My initial thought was to connect some number, maybe 2 so we can have at least 1 of each, of passive peers to both the go and js node. Then perform the extract benchmark. Adding and cat'ing the file should cause network calls to happen to each of the connected peers, unless I am mistaken. This might be too indirect of a benchmark though.

Another approach could be to call the block api directly after connecting multiple nodes, but I'm not sure how adequate that would be for a test.

@alanshaw do you have any thoughts on another approach for this?

alanshaw commented 5 years ago

It uses new Buffer somewhere, and the dependency tree must be updated

Yeah looks like asn1.js has a bunch of calls to new Buffer

screenshot 2019-01-29 at 15 32 05
alanshaw commented 5 years ago

Using Node.js v8.15.0 I'm seeing very similar results for the same tests:

      ✓ go -> js: 67.1 MB (5219ms)
      ✓ js -> go: 67.1 MB (8891ms) <-
      ✓ js -> js2: 67.1 MB (7646ms)
      ✓ go -> go2: 67.1 MB (2550ms)

      ✓ go -> js: 134 MB (11176ms)
      ✓ js -> go: 134 MB (17821ms) <-
      ✓ js -> js2: 134 MB (13542ms)
      ✓ go -> go2: 134 MB (6134ms)

      ✓ go -> js: 537 MB (43629ms)
      ✓ js -> go: 537 MB (76079ms) <-
      ✓ js -> js2: 537 MB (53180ms)
      ✓ go -> go2: 537 MB (23463ms)
mcollina commented 5 years ago

Oh, I understood we couldn't run ipfs in Node 8 anymore. Maybe I understood that wrong and I didn't try.

Then definitely it's not https://github.com/nodejs/node/issues/25741 that is causing this. We're still affected by that on Node v10, it's just not causing this specific issue. Can you check if there are any specific differences between Node 8 and 10 on that bench? Our runner/benchmark needs Node 10 as far as I understand.

alanshaw commented 5 years ago

These tests don't give us any other info by default other than the time it took...