immux / immux1

https://immux.com

Benchmark Design 0.1 for ImmuxDB 1.0 #129

Closed blaesus closed 5 years ago

blaesus commented 5 years ago

This is a proposal for a prototype benchmark setup for ImmuxDB 1.0.

ImmuxDB 1.0 focuses on basic operations (insert, update, select, and revert), which will be measured by benchmark 0.1.
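A minimal sketch of how the four operations could be timed one after another. `ToyStore` is a hypothetical in-memory stand-in and not the ImmuxDB API; `time_operation` and `run_benchmark` are likewise illustrative names, and the real harness would call the database instead.

```python
import time


class ToyStore:
    """Hypothetical in-memory stand-in for the database; not the ImmuxDB API."""

    def __init__(self):
        self.history = {}  # key -> list of every value ever written

    def insert(self, key, value):
        self.history[key] = [value]

    def update(self, key, value):
        self.history[key].append(value)

    def select(self, key):
        return self.history[key][-1]

    def revert(self, key, version):
        # Reverting appends the old value as a new version, so history is kept.
        self.history[key].append(self.history[key][version])


def time_operation(label, operation, items, results):
    """Apply `operation` to every item; record elapsed seconds and item count."""
    start = time.perf_counter()
    for item in items:
        operation(item)
    elapsed = time.perf_counter() - start
    results[label] = {"seconds": elapsed, "ops": len(items)}


def run_benchmark(rows):
    """Run insert, update, select and revert over all rows, in that order."""
    store = ToyStore()
    results = {}
    keys = range(len(rows))
    time_operation("insert", lambda k: store.insert(k, rows[k]), keys, results)
    time_operation(
        "update",
        lambda k: store.update(k, {**rows[k], "age": rows[k]["age"] + 1}),
        keys,
        results,
    )
    time_operation("select", lambda k: store.select(k), keys, results)
    time_operation("revert", lambda k: store.revert(k, 0), keys, results)
    return store, results


if __name__ == "__main__":
    # Made-up rows for illustration; the benchmark itself uses real datasets.
    _, results = run_benchmark([{"age": i % 90} for i in range(10_000)])
    for label, r in results.items():
        print(label, r["ops"], "ops in", round(r["seconds"], 4), "s")
```

Timing each operation in its own pass keeps the per-operation numbers separate, at the cost of not measuring mixed workloads.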

The purpose of the benchmark is not marketing, but to learn how the database engine performs under realistic loads. Therefore, we will use real data rather than randomly generated or serial data. To avoid over-fitting, we use datasets from multiple sources and of different natures.

The benchmark itself should aim to finish within 10 minutes on our development laptops.

Model

Measurements

Operations

Test Environment

Andy's laptop: 12 cores, 32GB memory, SSD.

Datasets

1. Census90

It's a subset of US Census 1990 data, which is tiny (1 table, ~2m rows x 68 columns; 361MB CSV).

Source: https://archive.ics.uci.edu/ml/datasets/US+Census+Data+(1990)

This is a simple dataset. All cells are nominally of integer type (encoding ancestry, gender, etc.).

Test cases

2. Berka99

It's operational data from a Czech bank, so named because it was prepared by Petr Berka in 1999.

The dataset is even smaller than Census90, but it contains relations (8 tables, ~10 columns, max ~1m rows; 67MB CSV).

Most importantly, Berka99 contains about a million "transactions", which represent changes to balances of accounts. This is perfect for us to measure the update operation.
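As a rough sketch of how transactions could drive the update measurement: replay the rows in order and write each resulting balance under its account key. The column names (`account_id`, `balance`), the `;` delimiter, and the sample rows below are assumptions for illustration and may not match the actual files; the plain dict stands in for the database.

```python
import csv
import io
import time

# Made-up sample rows; the real file's columns and delimiter may differ.
SAMPLE = """account_id;date;amount;balance
1;930101;1000;1000
2;930102;500;500
1;930105;-200;800
"""


def replay_transactions(csv_file, store):
    """Write each transaction's resulting balance into `store`.

    The first write for an account counts as an insert, later ones as updates.
    Returns (inserts, updates, elapsed_seconds).
    """
    inserts = updates = 0
    start = time.perf_counter()
    for row in csv.DictReader(csv_file, delimiter=";"):
        account = row["account_id"]
        if account in store:
            updates += 1
        else:
            inserts += 1
        store[account] = int(row["balance"])
    elapsed = time.perf_counter() - start
    return inserts, updates, elapsed


balances = {}  # stand-in for the database
inserts, updates, elapsed = replay_transactions(io.StringIO(SAMPLE), balances)
print(inserts, "inserts,", updates, "updates in", elapsed, "s")
```

Because most accounts have many transactions, a replay like this would be dominated by updates rather than inserts.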

Source: https://data.world/lpetrocelli/czech-financial-dataset-real-anonymized-transactions

Test cases

3. SO2019

Data dump from StackOverflow, updated as of 2019. Size is medium (8 tables; 50GB CSV).

This is a real dataset of a popular social media site, with rich data types and relations, making it suitable for our benchmark.

The main goals of the SO2019 test set are to (1) measure performance for larger entries (closer to 1KB per record), and (2) measure performance when the whole dataset is larger than memory.
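Both goals can be sanity-checked before a run. The sketch below assumes records are serialized as JSON (an assumption, the actual storage format may differ) and uses made-up sample records; `mean_record_bytes` and `exceeds_memory` are hypothetical helper names.

```python
import json

GB = 10**9


def mean_record_bytes(records):
    """Average size of the records when encoded as UTF-8 JSON."""
    sizes = [len(json.dumps(r).encode("utf-8")) for r in records]
    return sum(sizes) / len(sizes)


def exceeds_memory(dataset_bytes, memory_bytes):
    """True if the dataset cannot fit in memory all at once."""
    return dataset_bytes > memory_bytes


# Made-up records standing in for a sample of posts.
sample = [{"id": i, "body": "x" * 900} for i in range(3)]
print("mean record size:", mean_record_bytes(sample), "bytes")

# 50GB of CSV against the 32GB of memory on the test laptop.
print("larger than memory:", exceeds_memory(50 * GB, 32 * GB))
```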

Source: https://archive.org/details/stackexchange

Test cases

blaesus commented 5 years ago

Implemented census90 and berka99. The database is too slow for so2019, so that bench is not implemented yet.