Closed by vyzo 7 years ago.
Measurements showing the performance improvement over v1.0 from the optimizations in #68:
Merged  Range     mcnode-v1.0  mcnode/parallel-fetch  mcnode/parallel-fetch+merge-batch
1000    [0-1k]    0m10.233s    0m8.716s               0m5.598s
1000    [1k-2k]   0m10.287s    0m9.709s               0m6.325s
2000    [2k-4k]   0m17.673s    0m15.859s              0m8.548s
4000    [4k-8k]   0m33.720s    0m26.150s              0m18.896s
8000    [8k-16k]  1m4.431s     0m47.155s              0m34.640s
These measurements were taken on my laptop, which has 4 vCPUs (2 cores, 4 threads) and fairly poor bandwidth and latency to the peer node.
The test started from a clean database and performed successive merges from QmeiY2eHMwK92Zt6X4kUUC3MsjMmVb2VnGZ17DhnhRPCEQ using the query:
SELECT * FROM images.dpla LIMIT $x
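To make the comparison concrete, here is a small sketch that hard-codes the laptop timings (in seconds) and computes each build's speedup over mcnode-v1.0, both per step and over the whole sequence. The names run, speedup and totals are illustrative only.

```go
package main

import "fmt"

// run holds the wall-clock times, in seconds, for one merge step
// from the laptop measurements.
type run struct {
	rangeLabel    string
	v10           float64 // mcnode-v1.0
	parallelFetch float64 // mcnode/parallel-fetch
	mergeBatch    float64 // mcnode/parallel-fetch+merge-batch
}

var laptopRuns = []run{
	{"[0-1k]", 10.233, 8.716, 5.598},
	{"[1k-2k]", 10.287, 9.709, 6.325},
	{"[2k-4k]", 17.673, 15.859, 8.548},
	{"[4k-8k]", 33.720, 26.150, 18.896},
	{"[8k-16k]", 64.431, 47.155, 34.640},
}

// speedup returns how many times faster the optimized run is than the baseline.
func speedup(baseline, optimized float64) float64 {
	return baseline / optimized
}

// totals sums the per-step times of each build over the whole sequence.
func totals(runs []run) (v10, pf, mb float64) {
	for _, r := range runs {
		v10 += r.v10
		pf += r.parallelFetch
		mb += r.mergeBatch
	}
	return
}

func main() {
	for _, r := range laptopRuns {
		fmt.Printf("%-9s parallel-fetch %.2fx  +merge-batch %.2fx\n",
			r.rangeLabel, speedup(r.v10, r.parallelFetch), speedup(r.v10, r.mergeBatch))
	}
	v10, pf, mb := totals(laptopRuns)
	fmt.Printf("overall   parallel-fetch %.2fx  +merge-batch %.2fx\n",
		speedup(v10, pf), speedup(v10, mb))
}
```

For the [8k-16k] step this gives 64.431/34.640, about 1.86x, and over the whole sequence parallel-fetch+merge-batch comes out roughly 1.84x faster than v1.0 on this machine (parallel-fetch alone is about 1.27x).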
Measurements from an EC2 test node:
Merged  Range     Time
1000    [0-1k]    0m0.731s
1000    [1k-2k]   0m1.018s
2000    [2k-4k]   0m1.639s
4000    [4k-8k]   0m2.805s
8000    [8k-16k]  0m5.386s
A 100k-statement merge on EC2 takes about 35s:
$ time mcclient merge QmeiY2eHMwK92Zt6X4kUUC3MsjMmVb2VnGZ17DhnhRPCEQ "SELECT * FROM images.dpla LIMIT 100000"
merged 100000 statements and 100000 objects
real 0m35.354s
user 0m0.265s
sys 0m0.025s
A 1M-statement merge on EC2 takes about 6.5 minutes:
$ time mcclient merge QmeiY2eHMwK92Zt6X4kUUC3MsjMmVb2VnGZ17DhnhRPCEQ "SELECT * FROM images.dpla LIMIT 1000000"
merged 1000000 statements and 1000000 objects
real 6m31.248s
user 0m0.253s
sys 0m0.037s
:+1:
A final measurement with one more optimization (the occurs check is deferred until request time, so that it can run in parallel):
$ time mcclient merge QmeiY2eHMwK92Zt6X4kUUC3MsjMmVb2VnGZ17DhnhRPCEQ "SELECT * FROM images.dpla LIMIT 1000000"
merged 1000000 statements and 1000000 objects
real 6m23.390s
user 0m0.264s
sys 0m0.025s
So we are merging at a cool ~2.5K writes/s (1M statements in about 383s).
Metadata merges as initially implemented in #41 fetch data batches synchronously within the flow of the query stream.
This may be fine for small merges, but in larger merges throughput will suffer, and open query result sets in the source may be held up while the data is fetched.
The data merge can instead be implemented with a background goroutine that fetches the data batches requested by the primary merge goroutine over a buffered channel.
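A minimal sketch of this design, assuming a simplified model in which each merged statement references one object key. The primary merge goroutine walks the query result stream and sends batches of keys over a buffered channel; a background goroutine performs the occurs check at request time (skipping keys already in the local store) and fetches only the missing objects. The store type, fetchFromPeer, fetcher, merge and the batchSize/buffer parameters are hypothetical stand-ins, not mcnode's actual API, and the peer is simulated in memory.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a hypothetical stand-in for the local datastore.
type store struct {
	mu   sync.Mutex
	data map[string][]byte
}

func (s *store) has(key string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	_, ok := s.data[key]
	return ok
}

func (s *store) put(key string, val []byte) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[key] = val
}

// fetchFromPeer simulates a batched data fetch from the source node.
func fetchFromPeer(keys []string) map[string][]byte {
	out := make(map[string][]byte, len(keys))
	for _, k := range keys {
		out[k] = []byte("object " + k)
	}
	return out
}

// fetcher runs in the background: for each requested batch it performs the
// occurs check and fetches only the objects missing from the local store.
func fetcher(reqs <-chan []string, local *store, fetched *int, wg *sync.WaitGroup) {
	defer wg.Done()
	for batch := range reqs {
		var missing []string
		for _, k := range batch {
			if !local.has(k) { // occurs check, deferred to request time
				missing = append(missing, k)
			}
		}
		for k, v := range fetchFromPeer(missing) {
			local.put(k, v)
		}
		*fetched += len(missing)
	}
}

// merge consumes the query result stream, hands object keys to the fetcher
// in batches, and returns the statements merged and objects fetched.
func merge(stream []string, local *store, batchSize, buffer int) (statements, fetched int) {
	reqs := make(chan []string, buffer) // send blocks only when buffer batches are queued
	var wg sync.WaitGroup
	wg.Add(1)
	go fetcher(reqs, local, &fetched, &wg)

	batch := make([]string, 0, batchSize)
	for _, key := range stream {
		statements++ // merging the statement itself is omitted here
		batch = append(batch, key)
		if len(batch) == batchSize {
			reqs <- batch
			batch = make([]string, 0, batchSize) // fresh slice: the sent one belongs to the fetcher
		}
	}
	if len(batch) > 0 {
		reqs <- batch
	}
	close(reqs)
	wg.Wait() // fetched is read only after the fetcher has exited
	return statements, fetched
}

func main() {
	local := &store{data: map[string][]byte{"obj-1": []byte("already here")}}
	stream := []string{"obj-0", "obj-1", "obj-2", "obj-3", "obj-4"}
	n, f := merge(stream, local, 2, 4)
	fmt.Printf("merged %d statements, fetched %d objects\n", n, f)
}
```

Running it prints "merged 5 statements, fetched 4 objects", since obj-1 is already present locally. Because the channel is buffered, the merge loop keeps consuming the query stream while fetches are in flight and only stalls when the fetcher falls behind by a full buffer.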