uuid6 / uuid6-ietf-draft

Next Generation UUID Formats
https://datatracker.ietf.org/doc/draft-peabody-dispatch-new-uuid-format/
187 stars 11 forks source link

Performance testing for UUIDv7 #128

Open sergeyprokhorenko opened 11 months ago

sergeyprokhorenko commented 11 months ago

I encourage you to publish here performance testing programmig code, test cases (initial data) and performance testing results both for generation and seach, attaching or indicating UUIDv7 generators, UUIDv7's exact structure, technical spec sheets, generation algorithms, programming langueges, hardware, libraries and DBMS or other platforms. Performance should be expressed, respectively, in UUIDv7's generated or found per millisecond. It would be great if performance tests were universal, and allowed to compare different implementations of UUIDv7 generators

It would be great if you use this pattern:

Author Repository or reference Standard, version Description Language Platform Format Write frequency, kHz Read frequency, kHz
                 
                 
                 
                 

Designations:

Performance testing programmig code for the "Read frequency" column:

select count (*) as n from t a
inner join t b on b.id = a.id

Attribute t.id contains 100,000,000 different values of UUIDv7 format generated at full rate

fabiolimace commented 11 months ago

Here are some tests based on Fabio Telles' article (in Portuguese): UUID insert tests.

Result of Test 1 (generate each UUID while inserting):

BTREE TABLE    DURATION (MS)      DIRATION (HUMAN)    UNITS PER MILLIS
seq_btree       28029.854 ms           28 s 029 ms    356.762471898712
uuid1_btree     22991.672 ms           22 s 991 ms    434.940094830858
uuid4_btree    211369.757 ms    03 min 31 s 369 ms     47.310457947869
uuid7_btree     97811.904 ms    01 min 37 s 811 ms    102.237044685276

HASH TABLE     DURATION (MS)      DIRATION (HUMAN)    UNITS PER MILLIS
seq_hash       140375.071 ms    02 min 20 s 375 ms     71.237719979496
uuid1_hash     126236.485 ms    02 min 06 s 236 ms     79.216400868576
uuid4_hash     138802.738 ms    02 min 18 s 802 ms     72.044688340369
uuid7_hash     169464.461 ms    02 min 49 s 464 ms     59.009422630506

Result of Test 2 (generate all UUIDs before inserting):

BTREE TABLE    DURATION (MS)      DIRATION (HUMAN)    UNITS PER MILLIS
seq_btree       21516.331 ms           21 s 516 ms    464.763253549129
uuid1_btree     38077.444 ms           38 s 077 ms    262.622669735920
uuid4_btree    281123.796 ms    04 min 41 s 123 ms     35.571517396556
uuid7_btree     33079.732 ms           33 s 079 ms    302.299909805798

HASH TABLE     DURATION (MS)      DIRATION (HUMAN)    UNITS PER MILLIS
seq_hash       170375.793 ms    02 min 50 s 375 ms     58.693784040083
uuid1_hash     226407.549 ms    03 min 46 s 407 ms     44.168138580926
uuid4_hash     235586.245 ms    03 min 55 s 586 ms     42.447299926190
uuid7_hash     255430.946 ms    04 min 15 s 430 ms     39.149524192734

NOTES:

sergeyprokhorenko commented 11 months ago

fabiolimace, could you please, for comparability with other implementations, express the generation rate (writing) and the rate of search of generated UUIDv7 (selective reading) in units per millisecond? Please highlight these two values among other values.

fabiolimace commented 11 months ago

Sure! Done.

bgrainger commented 11 months ago

(I wasn't sure how all the columns were defined, so I took a best guess.)

Author Repository or reference Standard, version Description Language Platform Format Write frequency, kHz Read frequency, kHz
@bgrainger NGuid 0.2.0 rfc4122bis-09 .NET library for generating GUIDs C# .NET 7.0 UUIDv7 8,510.6 n/a
Test code ```csharp var stopwatch = Stopwatch.StartNew(); const int guidCount = 100_000_000; for (var i = 0; i < guidCount; i++) GuidHelpers.CreateVersion7(); stopwatch.Stop(); $"{guidCount:n0} v7 GUIDs in {stopwatch.ElapsedMilliseconds:n0}ms; {((double) guidCount / stopwatch.ElapsedMilliseconds):f1}/ms".Dump(); ``` > 100,000,000 v7 GUIDs in 11,750ms; 8510.6/ms Intel Core i7-10875H @ 2.30 GHz
sergeyprokhorenko commented 11 months ago

Thank you @bgrainger!

You have achieved high performance!

Sorry for the lack of detailed instructions. I meant the following:

medo64 commented 11 months ago

I tried to follow @bgrainger example as close as possible. Since my code does both UUIDv7 and UUIDv4 generation, I expanded both table and code a bit.

Author Repository or reference Standard, version Description Language Platform Format Write frequency, kHz Read frequency, kHz
Josip Medved Medo.Uuid7 rfc4122bis-09 .NET library for generating v7 and v4 UUIDs C# .NET 7.0, .NET Standard 2.0 UUIDv7 (48-bit timestamp, 26-bit counter with rollover guard, 48-bit random) 13401.2 n/a
Josip Medved Medo.Uuid7 rfc4122bis-09 .NET library for generating v7 and v4 UUIDs C# .NET 7.0, .NET Standard 2.0 UUIDv4 (122-bit random) 23849.3 n/a
Test Code using Medo; using System.Diagnostics; { var stopwatch = Stopwatch.StartNew(); const int guidCount = 100_000_000; for (var i = 0; i < guidCount; i++) Uuid7.NewUuid4(); stopwatch.Stop(); Console.WriteLine($"{guidCount:n0} v4 UUIDs in {stopwatch.ElapsedMilliseconds:n0}ms; {((double)guidCount / stopwatch.ElapsedMilliseconds):f1}/ms"); } { var stopwatch = Stopwatch.StartNew(); const int guidCount = 100_000_000; for (var i = 0; i < guidCount; i++) Uuid7.NewUuid7(); stopwatch.Stop(); Console.WriteLine($"{guidCount:n0} v7 UUIDs in {stopwatch.ElapsedMilliseconds:n0}ms; {((double)guidCount / stopwatch.ElapsedMilliseconds):f1}/ms"); }
100,000,000 v4 UUIDs in 4,193ms; 23849.3/ms
100,000,000 v7 UUIDs in 7,462ms; 13401.2/ms

Intel Core i5-1135G7 @ 2.40 GHz

tgross35 commented 10 months ago

It seems like it is impossible to test UUID generation since that is more or less just testing the underlying RNG and how fast it is to get a timestamp. Wall time is still a useful metric, but the underlying cause is potentially a bit misleading.

Also, what is "parsing"? Just checking string length minus hyphens? Extracting version classifier?

Testing https://docs.rs/uuid/latest/uuid using the default RNG:

test 'uuidv1 new now' completed with period 26.806 ns and frequency 37.305 MHz
test 'uuidv1 new fixed timestamp' completed with period 1.020 ns and frequency 980.576 MHz
test 'uuidv4 new' completed with period 172.896 ns and frequency 5.784 MHz
test 'uuidv6 new now' completed with period 29.787 ns and frequency 33.572 MHz
test 'uuidv6 new fixed timestamp' completed with period 1.022 ns and frequency 978.888 MHz
test 'uuidv7 new now' completed with period 201.918 ns and frequency 4.953 MHz
test 'uuidv7 new fixed timestamp' completed with period 171.160 ns and frequency 5.842 MHz
test 'uuidv7 parse' completed with period 16.037 ns and frequency 62.354 MHz
test 'uuidv7 parse hyphenated' completed with period 16.692 ns and frequency 59.910 MHz

Enabling the crate's feature fast-rng:

test 'uuidv1 new now' completed with period 27.239 ns and frequency 36.712 MHz
test 'uuidv1 new fixed timestamp' completed with period 1.031 ns and frequency 969.697 MHz
test 'uuidv4 new' completed with period 24.752 ns and frequency 40.402 MHz
test 'uuidv6 new now' completed with period 30.296 ns and frequency 33.008 MHz
test 'uuidv6 new fixed timestamp' completed with period 1.026 ns and frequency 974.682 MHz
test 'uuidv7 new now' completed with period 41.864 ns and frequency 23.887 MHz
test 'uuidv7 new fixed timestamp' completed with period 24.839 ns and frequency 40.259 MHz
test 'uuidv7 parse' completed with period 16.071 ns and frequency 62.222 MHz
test 'uuidv7 parse hyphenated' completed with period 16.809 ns and frequency 59.491 MHz

The difference between the two is pretty telling about the impact the RNG has. v1 and v6 also show that they more or less evaluate the speed of getting a system timestamp.If I remove the black_box around the MAC argument, I get 4874.435 MHz 🙂

Run on an AMD Ryzen 9 5900X

Library:

Test source ```rust use std::{hint::black_box, time::SystemTime}; use uuid::{NoContext, Uuid}; const WARMUP: u32 = 10_000; const COUNT: u32 = 5_000_000; const PARSE: &str = "a1a2a3a4b1b2c1c2d1d2d3d4d5d6d7d8"; const PARSE_HYPHENATED: &str = "a1a2a3a4-b1b2-c1c2-d1d2-d3d4d5d6d7d8"; fn main() { let ts_stored = uuid::Timestamp::now(NoContext); run_bench("uuidv1 new now", || { Uuid::now_v1(black_box(&[0; 6])); }); run_bench("uuidv1 new fixed timestamp", || { Uuid::new_v1(black_box(ts_stored), black_box(&[0; 6])); }); run_bench("uuidv4 new", || { Uuid::new_v4(); }); run_bench("uuidv6 new now", || { Uuid::now_v6(black_box(&[0; 6])); }); run_bench("uuidv6 new fixed timestamp", || { Uuid::new_v6(black_box(ts_stored), black_box(&[0; 6])); }); run_bench("uuidv7 new now", || { Uuid::now_v7(); }); run_bench("uuidv7 new fixed timestamp", || { Uuid::new_v7(black_box(ts_stored)); }); run_bench("uuidv7 parse", || { Uuid::parse_str(PARSE).unwrap(); }); run_bench("uuidv7 parse hyphenated", || { Uuid::parse_str(PARSE_HYPHENATED).unwrap(); }); } fn run_bench(name: &str, func: impl Fn()) { for _ in 0..WARMUP { black_box(func()); } let start = SystemTime::now(); for _ in 0..COUNT { black_box(func()); } let duration = start.elapsed().unwrap().as_secs_f64(); let t = duration / f64::from(COUNT); let t_ns = t / 1e-9; let f = 1.0 / t; let f_mhz = f / 1e6; println!("test '{name}' completed with period {t_ns:.3} ns and frequency {f_mhz:.3} MHz"); } ```
sergeyprokhorenko commented 10 months ago

@tgross35

It seems like it is impossible to test UUID generation since that is more or less just testing the underlying RNG and how fast it is to get a timestamp.

It seems to me that you are testing UUID generation rate, but instead you need to test rate of insertion and indexing of table records containing UUIDs generated on the fly. Above this is called Write frequency (kHz). It depends not only on the generation rate, but also on the DBMS, on the structure of the UUID and possible deviations from monotonicity.

It is also necessary to test Read frequency (kHz) for these inserted records in accordance with the Performance testing programmig code for the "Read frequency" column, as described:

select count (*) as n from t a
inner join t b on b.id = a.id

Attribute t.id contains 100,000,000 different values of UUIDv7 format generated at full rate

tgross35 commented 10 months ago

The above C# examples seem to be generation tests and not db tests, no? Or was that not the intent?

sergeyprokhorenko commented 10 months ago

@tgross35

The above C# examples seem to be generation tests and not db tests, no? Or was that not the intent?

Everyone is interested in the final result (database performance with UUIDv7 for writing with indexing, as well as for searching), but not the intermediate result (generation rate). Database performance depends not only on the generation rate, but also on the successful choice of the UUIDv7 structure from many possible options and on the algorithms for calculating and buffering UUIDv7 segments. For different DBMSs and different settings of a certain DBMS, the optimal UUIDv7 structure and algorithms may be different. It is important to remember that UUIDv7 implementations compete not only with each other, but also with autoincrement. The goal of performance testing is ultimately to find a UUIDv7 implementation that can compete with autoincrement. No one is interested in the performance of a timestamp source or random number generator per se.

See https://github.com/x4m/pg_uuid_next/issues/1

Zer0x00 commented 9 months ago
Author Repository or reference Standard, version Description Language Platform Format Write frequency, kHz Read frequency, kHz
Python lib/uuid RFC 4122 Standard Python UUID library Python Python UUIDv4 640.48 n/a
@aminalaee uuid-utils RFC 4122 Python UUID implementation using Rust's UUID library Rust / Python Python UUIDv4 7880.61 n/a
@aminalaee uuid-utils rfc4122bis-09 Python UUID implementation using Rust's UUID library Rust / Python Python UUIDv7 6552.52 n/a
Test source import timeit code_to_test_native = """uuid.uuid4()""" code_to_test_lib = """uuid_utils.uuid4()""" code_to_test_lib_uuid7 = """uuid_utils.uuid7()""" # Setup code setup_code = """ import uuid import uuid_utils """ number = 100_000_000 elapsed_time_native = timeit.timeit(code_to_test_native, setup_code, number=number) rate_native = (number / elapsed_time_native) / 1000 elapsed_time_lib = timeit.timeit(code_to_test_lib, setup_code, number=number) rate_lib = (number / elapsed_time_lib) / 1000 elapsed_time_lib_uuid_v7 = timeit.timeit(code_to_test_lib_uuid7, setup_code, number=number) rate_lib_uuid7 = (number / elapsed_time_lib_uuid_v7) / 1000 print(f"The native UUID v4 generation rate is approximately {rate_native:.2f} kHz") print(f"The uuid-utils' UUID v4 generation rate is approximately {rate_lib:.2f} kHz") print(f"The uuid-utils' UUID v7 generation rate is approximately {rate_lib_uuid7:.2f} kHz") # Ryzen 9 5950X in WSL2 on Windows 11
aminalaee commented 9 months ago

@Zer0x00 Thanks for the effort. Is there any way to use Python timeit instead? I think you can get more independent results from that.

Zer0x00 commented 9 months ago

@Zer0x00 Thanks for the effort. Is there any way to use Python timeit instead? I think you can get more independent results from that.

I've updated the code and the values in the original post to use the timeit module.

sergeyprokhorenko commented 5 months ago

UUID Benchmark War