occlum / ngo

Next-Gen Occlum, a work-in-progress fork of Occlum that is optimized for the next-generation of Intel SGX (on Xeon SP processors)

Improve the performance of IoUringDisk #263

Open tatetian opened 2 years ago

tatetian commented 2 years ago

According to the benchmark in sgx-disk, the performance of IoUringDisk is inferior to that of SyncIoDisk in some cases. We need to find out why and improve it.


lucassong-mh commented 2 years ago

Symptom

Current benchmark results show that SyncIoDisk beats IoUringDisk by up to 54% when concurrency is low, while IoUringDisk gradually overtakes SyncIoDisk as concurrency increases.

Bench result: (4 vcpu, 2GB data, 4KB bufsize, seq-write)

[image: iouring]

I've tried two ways to reproduce or mitigate the problem.

Try 1: Tuning related params

Finer-grained timing shows that IoUringDisk spends most of its time in poll_completion: submit_time : complete_time = 500 ns : 10 us.
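To put the ratio above in perspective, here is a small arithmetic sketch. The function names are made up for illustration and are not part of IoUringDisk. The throughput bound assumes requests are issued strictly one at a time, so submission and completion never overlap, which is only an approximation of the single-threaded case.

```c
#include <assert.h>

/* Fraction of the per-request time spent waiting for completion.
 * Hypothetical helper, not an IoUringDisk API. */
double completion_share(double submit_ns, double complete_ns)
{
    assert(submit_ns + complete_ns > 0.0);
    return complete_ns / (submit_ns + complete_ns);
}

/* Upper bound on throughput in bytes per second, ASSUMING requests are
 * strictly serial (one in flight at a time). */
double serial_throughput_bps(double submit_ns, double complete_ns,
                             double bytes_per_req)
{
    assert(submit_ns + complete_ns > 0.0);
    return bytes_per_req / ((submit_ns + complete_ns) * 1e-9);
}
```

With the measured 500 ns and 10 us, `completion_share(500, 10000)` is about 0.95, and `serial_throughput_bps(500, 10000, 4096)` is about 390 MB/s for 4KB requests under the serial assumption.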

Tuning ways:

The tuning does not seem to change the symptom described above.

Try 2: Write a micro-benchmark in C to compare sync_io with io_uring

Toolchain: gcc 9.4.0, kernel 5.13.0, liburing (io_uring library, latest version)

Bench result: (seq-write, 1GB data, 4KB bufsize, 1 thread, io_uring_queue_init with IORING_SETUP_SQPOLL)

[image: syncio]

The result shows that sync_io beats io_uring by 15.6%.
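For readers who want to reproduce the sync_io side, below is a minimal sketch of a sequential-write timing loop. It is not the benchmark used above: the function name, the use of `pwrite`, and the final `fsync` inside the timed region are all assumptions. The io_uring side is omitted because it depends on liburing.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Sequentially write `total` bytes in `bufsize`-byte chunks with pwrite(2).
 * Returns elapsed wall-clock seconds, or -1.0 on any error.
 * Hypothetical helper; the original benchmark may differ. */
double bench_sync_seq_write(const char *path, size_t total, size_t bufsize)
{
    assert(bufsize > 0 && total % bufsize == 0);

    double elapsed = -1.0;
    struct timespec start, end;
    size_t off = 0;

    char *buf = malloc(bufsize);
    if (buf == NULL)
        return -1.0;
    memset(buf, 0xA5, bufsize);          /* arbitrary fill pattern */

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        free(buf);
        return -1.0;
    }

    clock_gettime(CLOCK_MONOTONIC, &start);
    while (off < total) {
        ssize_t n = pwrite(fd, buf, bufsize, (off_t)off);
        if (n != (ssize_t)bufsize)       /* treat short writes as failure */
            goto out;
        off += bufsize;
    }
    if (fsync(fd) != 0)                  /* assumption: flush is timed too */
        goto out;
    clock_gettime(CLOCK_MONOTONIC, &end);

    elapsed = (double)(end.tv_sec - start.tv_sec)
            + (double)(end.tv_nsec - start.tv_nsec) * 1e-9;
out:
    close(fd);
    free(buf);
    return elapsed;
}
```

Calling `bench_sync_seq_write("bench.dat", 1UL << 30, 4096)` would match the 1GB data and 4KB bufsize settings above; the file name is a placeholder.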


I tentatively conclude that the current symptom is reasonable. io_uring is designed to handle large numbers of parallel I/O requests. When many concurrent I/O requests arrive, the zero-copy ring buffers of io_uring deliver a performance benefit over the frequent system calls of synchronous I/O.
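The amortization argument can be illustrated with a toy cost model. This is a sketch under strong assumptions, not a measurement: it assumes the completion wait is shared evenly among all in-flight requests, and the per-request cost of synchronous I/O is a free parameter that has to be measured separately. All function names are hypothetical.

```c
#include <assert.h>

/* Toy model: per-request cost (ns) of io_uring with `depth` requests in
 * flight. ASSUMES the completion wait is amortized evenly over them. */
double uring_cost_ns(double submit_ns, double complete_ns, unsigned depth)
{
    assert(depth > 0);
    return submit_ns + complete_ns / (double)depth;
}

/* Smallest depth (up to max_depth) at which the modeled io_uring cost
 * drops strictly below the per-request sync cost; 0 if it never does. */
unsigned crossover_depth(double sync_ns, double submit_ns,
                         double complete_ns, unsigned max_depth)
{
    for (unsigned d = 1; d <= max_depth; d++) {
        if (uring_cost_ns(submit_ns, complete_ns, d) < sync_ns)
            return d;
    }
    return 0;
}
```

With the measured 500 ns and 10 us and a purely hypothetical sync cost of 3000 ns per request, `crossover_depth(3000, 500, 10000, 64)` returns 5: in this model io_uring loses at low depth and wins once enough requests are in flight, which matches the shape of the benchmark results.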