immux / immux1

https://immux.com

Implement basic benchmarking #130

Closed blaesus closed 5 years ago

blaesus commented 5 years ago

The main goal of this PR is to set up basic benchmarking and profiling, covering item 1 in https://github.com/immux/immux/issues/129

A. Refactors

The previous structure of the codebase could not be benchmarked easily; a few refactors were necessary to enable convenient benchmarking.

1. Separate "lib"

A "lib.rs" is extracted from "main.rs"; the two are registered as lib and bin respectively.

The library is called libimmuxdb and is used by both the main executable and the benchmarks.

The entry point of the main executable moves to src/bin/server.rs, which now contains very little code.
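For illustration, the lib/bin split described above might look like the following in Cargo.toml; the exact target names and paths here are assumptions, not a copy of the actual manifest:

```toml
# Sketch of the relevant Cargo.toml entries (names inferred from this PR).
[lib]
name = "libimmuxdb"
path = "src/lib.rs"

[[bin]]
name = "immuxdb"          # hypothetical binary name
path = "src/bin/server.rs"
```

With this layout, `cargo bench` and the server binary both link against the same library crate.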

2. Separate database client in Rust

A database client or driver is what other programs use to communicate with ImmuxDB. Previously it was defined ad hoc in the tests; it is now extracted to src/connectors/rust.

The Rust client for ImmuxDB is called immuxdb_client.
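For illustration only, a minimal sketch of what a client along these lines might look like. The struct, method names, and address below are hypothetical, not the actual immuxdb_client API:

```rust
// Hypothetical sketch of a minimal client shape; the real immuxdb_client
// in src/connectors/rust may differ.
pub struct ImmuxDBClient {
    host: String,
}

impl ImmuxDBClient {
    pub fn new(host: &str) -> Self {
        ImmuxDBClient { host: host.to_string() }
    }

    /// Build the URL a "get by primary key" request would target
    /// (illustrative scheme only).
    pub fn url_for(&self, collection: &str, key: &str) -> String {
        format!("http://{}/{}/{}", self.host, collection, key)
    }
}

fn main() {
    // "localhost:8000" is a placeholder address, not the real default port.
    let client = ImmuxDBClient::new("localhost:8000");
    println!("{}", client.url_for("census90", "42"));
}
```

Keeping the client in its own module lets the benchmarks exercise the same code path that external programs would use.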

3. Setting database storage root

Previously, database data was always stored under /tmp. Now the location can be configured in ImmuxDBConfiguration.

4. ImmuxDBConfiguration api

compile_config is merged as a method of ImmuxDBConfiguration. A new helper method get_default() is added.
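A minimal sketch of the configuration shape points 3 and 4 describe. Only get_default() and the existence of a configurable storage root come from this PR; the field name and the with_data_root helper are assumptions:

```rust
use std::path::PathBuf;

// Sketch only; field names other than those mentioned in the PR are assumed.
pub struct ImmuxDBConfiguration {
    pub data_root: PathBuf,
}

impl ImmuxDBConfiguration {
    /// Helper added in this PR: a configuration with default settings.
    pub fn get_default() -> Self {
        ImmuxDBConfiguration { data_root: PathBuf::from("/tmp") }
    }

    /// Hypothetical helper: benchmarks can point each dataset at its
    /// own directory, e.g. /tmp/census90.
    pub fn with_data_root(root: &str) -> Self {
        ImmuxDBConfiguration { data_root: PathBuf::from(root) }
    }
}

fn main() {
    let config = ImmuxDBConfiguration::with_data_root("/tmp/census90");
    println!("{}", config.data_root.display());
}
```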

5. Removal of debug outputs

Database debug outputs, such as "inserting to some key", are dropped; otherwise they flood the output and make benchmark results hard to read.

B. Data

Two datasets, berka99 and census90, are utilized. SO2019 is too large; its download is still in progress.

An initial test shows that the database is far too slow to load even the full berka99 and census90 datasets.

To keep benchmarking time bounded, only the first 1000 rows of census90 and the first 300 rows of each table of berka99 are used.

The data for realistic benchmarking is stored at https://github.com/immux/benchmark-data-raw, which is referenced in the Immux codebase as a git submodule.
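The row-truncation step above can be sketched as follows; this is a hypothetical helper for illustration, not the actual code in benches/realistic:

```rust
// Keep only the first `limit` data rows of a CSV table, to bound
// benchmark running time (sketch only).
fn truncate_rows(csv: &str, limit: usize) -> Vec<&str> {
    csv.lines()
        .skip(1)     // skip the header row
        .take(limit) // keep at most `limit` data rows
        .collect()
}

fn main() {
    let table = "id,age\n1,34\n2,55\n3,21\n4,40\n";
    let rows = truncate_rows(table, 3);
    println!("{} rows kept", rows.len());
}
```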

C. Benchmarking & Profiling

The shared infrastructure is in benches/realistic/mod.rs; it builds JSON tables from CSV files and iterates through the rows while measuring execution time.

Specific code for berka99 and census90 is in the respective files under benches/realistic.
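The per-batch timing report seen in the logs below can be sketched like this; the actual implementation in benches/realistic/mod.rs may differ:

```rust
use std::time::Instant;

// Format one timing line per batch of operations (sketch only).
fn report(elapsed_ms: u128, batch: usize, action: &str) -> String {
    format!(
        "took {}ms to {} {} items, average {:.2}ms per item",
        elapsed_ms,
        action,
        batch,
        elapsed_ms as f64 / batch as f64
    )
}

fn main() {
    let start = Instant::now();
    // ... insert a batch of 100 rows here ...
    let elapsed = start.elapsed().as_millis();
    println!("{}", report(elapsed, 100, "insert"));
}
```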

Profiling is provided by https://github.com/ferrous-systems/flamegraph, which works on macOS as well.

D. Results

  1. Get by primary key has tolerable performance.
  2. Insert performance degrades quickly as more entries are inserted.
  3. Most of the time is spent in set_id_list and get_id_list, as illustrated in the flamegraph.
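One possible reading of results 2 and 3 (an assumption, not something this PR establishes): if every insert reads and rewrites the entire id list, the work per insert grows linearly with the number of existing entries, so cumulative work grows quadratically, matching the steadily rising per-batch times in the logs below. A back-of-envelope model:

```rust
// Back-of-envelope model (assumption, not the actual storage code):
// insert number n touches a list of n ids, so total ids touched over
// `inserts` operations is 1 + 2 + ... + inserts = inserts*(inserts+1)/2.
fn total_ids_touched(inserts: usize) -> usize {
    (1..=inserts).sum()
}

fn main() {
    let first_batch = total_ids_touched(100);
    let ten_batches = total_ids_touched(1000);
    // 10x the inserts -> ~100x the cumulative id-list work.
    println!(
        "100 inserts touch {} ids; 1000 inserts touch {} ids",
        first_batch, ten_batches
    );
}
```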


Executing bench census90, with tables truncated at row 1000
Waiting 5s for database to be ready...
Initializing database in /tmp/census90/
Existing test data removed
Start benching...
took 546ms to insert 100 items, average 5.46ms per item
took 896ms to insert 100 items, average 8.96ms per item
took 1574ms to insert 100 items, average 15.74ms per item
took 2651ms to insert 100 items, average 26.51ms per item
took 3847ms to insert 100 items, average 38.47ms per item
took 5713ms to insert 100 items, average 57.13ms per item
took 7747ms to insert 100 items, average 77.47ms per item
took 9998ms to insert 100 items, average 99.98ms per item
took 12697ms to insert 100 items, average 126.97ms per item
took 15042ms to insert 100 items, average 150.42ms per item
took 391ms to get 100 items by primary key, average 3.91ms per item
took 374ms to get 100 items by primary key, average 3.74ms per item
took 684ms to get 100 items by primary key, average 6.84ms per item
took 381ms to get 100 items by primary key, average 3.81ms per item
took 1303ms to get 100 items by primary key, average 13.03ms per item
took 503ms to get 100 items by primary key, average 5.03ms per item
took 409ms to get 100 items by primary key, average 4.09ms per item
took 1332ms to get 100 items by primary key, average 13.32ms per item
took 435ms to get 100 items by primary key, average 4.35ms per item
took 478ms to get 100 items by primary key, average 4.78ms per item

Executing bench berka99, with tables truncated at row 300
Waiting 5s for database to be ready...
Initializing database in /tmp/berka99/
Existing test data removed
Start benching...
Loading table 'account'
took 615ms to insert 100 items, average 6.15ms per item
took 957ms to insert 100 items, average 9.57ms per item
took 1814ms to insert 100 items, average 18.14ms per item
Loading table 'card'
took 2416ms to insert 100 items, average 24.16ms per item
took 3236ms to insert 100 items, average 32.36ms per item
took 4805ms to insert 100 items, average 48.05ms per item
Loading table 'client'
took 6737ms to insert 100 items, average 67.37ms per item
took 8562ms to insert 100 items, average 85.62ms per item
took 10165ms to insert 100 items, average 101.65ms per item
Loading table 'disp'
took 12430ms to insert 100 items, average 124.30ms per item
took 14938ms to insert 100 items, average 149.38ms per item
took 16775ms to insert 100 items, average 167.75ms per item
Loading table 'district'
Loading table 'loan'
took 18965ms to insert 100 items, average 189.65ms per item
took 22446ms to insert 100 items, average 224.46ms per item
took 26381ms to insert 100 items, average 263.81ms per item
Loading table 'order'
took 27051ms to insert 100 items, average 270.51ms per item
took 30649ms to insert 100 items, average 306.49ms per item
took 34806ms to insert 100 items, average 348.06ms per item
Loading table 'trans'
took 42436ms to insert 100 items, average 424.36ms per item
took 46727ms to insert 100 items, average 467.27ms per item
took 65863ms to insert 100 items, average 658.63ms per item