The main goal of this PR is to build basic benchmarking and profiling: item 1 in https://github.com/immux/immux/issues/129.
A. Refactors
The previous structure of the codebase cannot be easily benchmarked. A few refactors are necessary to enable convenient benchmarking.
1. Separate "lib"
A "lib.rs" is extracted from "main.rs" and registered as "lib" and "bin" respectively.
The library is called libimmuxdb and is used by both the main executable and the benchmarks.
The entry point of the main executable is moved to src/bin/server.rs; very little code remains there.
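The resulting Cargo manifest might look roughly like this (a sketch; the package name, version, and edition are assumptions — only libimmuxdb, src/lib.rs, and src/bin/server.rs come from the description):

```toml
# Hypothetical manifest sketch after the lib/bin split.
[package]
name = "immuxdb"        # assumption
version = "0.1.0"       # assumption
edition = "2018"        # assumption

[lib]
name = "libimmuxdb"
path = "src/lib.rs"

# src/bin/server.rs would also be auto-discovered by Cargo as a binary
# named "server"; an explicit [[bin]] entry is equivalent:
[[bin]]
name = "server"
path = "src/bin/server.rs"
```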
2. Separate database client in Rust
A database client, or driver, is what other programs use to communicate with ImmuxDB.
It was previously defined casually in tests. Now it is extracted to src/connectors/rust.
The Rust client for ImmuxDB is called immuxdb_client.
3. Setting database storage root
Previously, database data was always stored under /tmp.
Now, the storage root can be configured in ImmuxDBConfiguration.
4. ImmuxDBConfiguration API
compile_config is merged into ImmuxDBConfiguration as a method.
A new helper method get_default() is added.
5. Removal of debug outputs
Database debug outputs, such as "inserting to some key", are dropped because they flood the output and make benchmark results hard to read.
B. Data
Two datasets, berka99 and census90, are utilized. SO2019 is too large and its download is in progress.
An initial test shows that the database is far too slow to load even the whole of berka99 and census90.
To keep benchmarking time under control, only the first 1000 rows of census90 and the first 300 rows of each table of berka99 are used.
The data for realistic benchmarking is stored at https://github.com/immux/benchmark-data-raw, which is referenced in the Immux codebase as a submodule.
C. Benchmarking & Profiling
The infrastructure is in benches/realistic/mod.rs; it builds JSON tables from the CSV files and iterates through the rows while measuring execution time.
Dataset-specific code for berka99 and census90 is in the respective files under benches/realistic.
Profiling is provided by https://github.com/ferrous-systems/flamegraph, which works on OSX as well.
D. Results
Insert performance degrades quickly as more entries are inserted: the per-item average rises with every batch of 100.
Most of the time is spent in set_id_list and get_id_list, as illustrated in the flamegraph.
Executing bench census90, with tables truncated at row 1000
Waiting 5s for database to be ready...
Initializing database in /tmp/census90/
Existing test data removed
Start benching...
took 546ms to insert 100 items, average 5.46ms per item
took 896ms to insert 100 items, average 8.96ms per item
took 1574ms to insert 100 items, average 15.74ms per item
took 2651ms to insert 100 items, average 26.51ms per item
took 3847ms to insert 100 items, average 38.47ms per item
took 5713ms to insert 100 items, average 57.13ms per item
took 7747ms to insert 100 items, average 77.47ms per item
took 9998ms to insert 100 items, average 99.98ms per item
took 12697ms to insert 100 items, average 126.97ms per item
took 15042ms to insert 100 items, average 150.42ms per item
took 391ms to get 100 items by primary key, average 3.91ms per item
took 374ms to get 100 items by primary key, average 3.74ms per item
took 684ms to get 100 items by primary key, average 6.84ms per item
took 381ms to get 100 items by primary key, average 3.81ms per item
took 1303ms to get 100 items by primary key, average 13.03ms per item
took 503ms to get 100 items by primary key, average 5.03ms per item
took 409ms to get 100 items by primary key, average 4.09ms per item
took 1332ms to get 100 items by primary key, average 13.32ms per item
took 435ms to get 100 items by primary key, average 4.35ms per item
took 478ms to get 100 items by primary key, average 4.78ms per item
Executing bench berka99, with tables truncated at row 300
Waiting 5s for database to be ready...
Initializing database in /tmp/berka99/
Existing test data removed
Start benching...
Loading table 'account'
took 615ms to insert 100 items, average 6.15ms per item
took 957ms to insert 100 items, average 9.57ms per item
took 1814ms to insert 100 items, average 18.14ms per item
Loading table 'card'
took 2416ms to insert 100 items, average 24.16ms per item
took 3236ms to insert 100 items, average 32.36ms per item
took 4805ms to insert 100 items, average 48.05ms per item
Loading table 'client'
took 6737ms to insert 100 items, average 67.37ms per item
took 8562ms to insert 100 items, average 85.62ms per item
took 10165ms to insert 100 items, average 101.65ms per item
Loading table 'disp'
took 12430ms to insert 100 items, average 124.30ms per item
took 14938ms to insert 100 items, average 149.38ms per item
took 16775ms to insert 100 items, average 167.75ms per item
Loading table 'district'
Loading table 'loan'
took 18965ms to insert 100 items, average 189.65ms per item
took 22446ms to insert 100 items, average 224.46ms per item
took 26381ms to insert 100 items, average 263.81ms per item
Loading table 'order'
took 27051ms to insert 100 items, average 270.51ms per item
took 30649ms to insert 100 items, average 306.49ms per item
took 34806ms to insert 100 items, average 348.06ms per item
Loading table 'trans'
took 42436ms to insert 100 items, average 424.36ms per item
took 46727ms to insert 100 items, average 467.27ms per item
took 65863ms to insert 100 items, average 658.63ms per item