But won't the disk I/O occupy the runtime thread and block other tasks from being scheduled? Note that it's sync I/O here, not async I/O inside `spawn_blocking`.
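For context, the two patterns being compared look roughly like this (a minimal sketch; the function names and signatures are illustrative, not the actual RisingLight code):

```rust
use std::io::Read;

// Variant 1: sync read directly on the runtime worker thread. While the
// read is in flight, this worker cannot poll any other future.
async fn read_block_inline(path: std::path::PathBuf) -> std::io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    std::fs::File::open(path)?.read_to_end(&mut buf)?;
    Ok(buf)
}

// Variant 2: the same read moved onto tokio's blocking thread pool, which
// keeps the worker free but pays a thread hand-off on every call.
async fn read_block_spawned(path: std::path::PathBuf) -> std::io::Result<Vec<u8>> {
    tokio::task::spawn_blocking(move || {
        let mut buf = Vec::new();
        std::fs::File::open(path)?.read_to_end(&mut buf)?;
        Ok(buf)
    })
    .await
    .expect("blocking task panicked")
}
```

With the inline variant, the worker thread stalls in the kernel for the duration of the read; with `spawn_blocking`, every call pays a hand-off to the blocking pool, which shows up as context switches in the numbers below.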
Here is a simple `perf stat` result on the TPC-H 10 GB dataset:

```sql
select
    sum(l_extendedprice) as sum_base_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
from
    lineitem;
```
**Remove `spawn_blocking`:**

```
     14,113.10 msec task-clock                #    1.173 CPUs utilized
         7,085      context-switches          #  502.016 /sec
            64      cpu-migrations            #    4.535 /sec
       772,980      page-faults               #   54.770 K/sec
22,703,318,259      cycles                    #    1.609 GHz                      (82.99%)
 1,784,793,934      stalled-cycles-frontend   #    7.86% frontend cycles idle     (83.49%)
 6,680,958,786      stalled-cycles-backend    #   29.43% backend cycles idle      (83.24%)
36,582,277,643      instructions              #    1.61  insn per cycle
                                              #    0.18  stalled cycles per insn  (83.48%)
 6,194,729,232      branches                  #  438.935 M/sec                    (83.25%)
   194,448,934      branch-misses             #    3.14% of all branches          (83.63%)

  12.033058573 seconds time elapsed

   8.076494000 seconds user
   6.023971000 seconds sys
```
**The `main` branch:**

```
     23,975.43 msec task-clock                #    1.418 CPUs utilized
       356,000      context-switches          #   14.849 K/sec
           281      cpu-migrations            #   11.720 /sec
       657,275      page-faults               #   27.415 K/sec
29,943,693,477      cycles                    #    1.249 GHz                      (83.17%)
 2,389,946,991      stalled-cycles-frontend   #    7.98% frontend cycles idle     (83.23%)
 9,854,147,750      stalled-cycles-backend    #   32.91% backend cycles idle      (83.00%)
37,148,855,213      instructions              #    1.24  insn per cycle
                                              #    0.27  stalled cycles per insn  (83.37%)
 6,336,456,023      branches                  #  264.290 M/sec                    (83.31%)
   217,676,506      branch-misses             #    3.44% of all branches          (83.95%)

  16.905472146 seconds time elapsed

  11.701711000 seconds user
  11.605966000 seconds sys
```
**`block_in_place`:**

```
     17,933.10 msec task-clock                #    1.329 CPUs utilized
        50,364      context-switches          #    2.808 K/sec
           120      cpu-migrations            #    6.692 /sec
       958,696      page-faults               #   53.460 K/sec
28,158,279,564      cycles                    #    1.570 GHz                      (83.46%)
 2,583,071,171      stalled-cycles-frontend   #    9.17% frontend cycles idle     (83.29%)
 8,491,632,892      stalled-cycles-backend    #   30.16% backend cycles idle      (83.55%)
38,026,410,766      instructions              #    1.35  insn per cycle
                                              #    0.22  stalled cycles per insn  (83.13%)
 6,471,120,813      branches                  #  360.848 M/sec                    (83.45%)
   240,410,333      branch-misses             #    3.72% of all branches          (83.22%)

  13.490252587 seconds time elapsed

   9.807809000 seconds user
   8.062861000 seconds sys
```
By the way, the CPU utilization rate of risinglight is very low even on some complex queries.
All of the scan executor's I/O is handled by `get_block`. This function's performance is crucial for us; should we consider refactoring it?
> By the way, the CPU utilization rate of risinglight is very low even on some complex queries.
The aggregation is currently limited by a single CPU core. We don't yet partition the data to run it in parallel.
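For illustration, a hypothetical sketch of what partitioned aggregation could look like (one task per partition, partial results merged at the end; this is not the actual executor design):

```rust
// Hypothetical sketch of partitioned aggregation: each tokio task computes
// a partial sum over its own partition, and the partials are merged at the
// end. The data layout and function name are illustrative only.
async fn parallel_sum(partitions: Vec<Vec<i64>>) -> i64 {
    let handles: Vec<_> = partitions
        .into_iter()
        .map(|part| tokio::spawn(async move { part.iter().sum::<i64>() }))
        .collect();
    let mut total = 0;
    for handle in handles {
        total += handle.await.expect("task panicked");
    }
    total
}
```

In practice, CPU-bound work like this often goes to a dedicated compute pool (e.g. rayon) rather than tokio workers, but the shape is the same.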
Would you please try `block_in_place` and see if it improves perf?
> Would you please try `block_in_place` and see if it improves perf?
Updated
Thanks for your work! Based on the benchmark results, I would rather opt for `block_in_place`. The problem with having I/O inside a tokio runtime thread is that it prevents other futures from being scheduled. This would only be a temporary relief until we have full parallelism in the system.
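For reference, a minimal sketch of the `block_in_place` variant (the function name and signature are hypothetical, not the actual `get_block`):

```rust
use std::io::Read;

// Hypothetical sketch: block_in_place tells the scheduler that this worker
// thread is about to block, so its pending tasks are handed to other
// workers instead of being starved. Note that it requires the
// multi-threaded runtime and panics on the current-thread runtime.
async fn read_block_in_place(path: std::path::PathBuf) -> std::io::Result<Vec<u8>> {
    tokio::task::block_in_place(|| {
        let mut buf = Vec::new();
        std::fs::File::open(path)?.read_to_end(&mut buf)?;
        Ok(buf)
    })
}
```

Unlike `spawn_blocking`, this runs the closure on the current thread rather than shipping it to the blocking pool, which is consistent with the much lower context-switch count in the numbers above.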
This usage of `spawn_blocking` is unnecessary, and it wastes some performance, since tokio may create a new thread for it.