This change makes scx_rusty mempolicy aware. When a process uses set_mempolicy(2) it can change its NUMA memory preferences, which can cause performance issues if its tasks are scheduled on remote NUMA nodes. This change modifies task_pick_domain to use the new helper method that returns the preferred node id.
With the --mempolicy-affinity flag set:
$ stress-ng -M --mbind 1 --malloc 5 -t 10 --bigheap 10 --numa 5
stress-ng: info: [873775] setting to a 10 secs run per stressor
stress-ng: info: [873775] dispatching hogs: 5 malloc, 10 bigheap, 5 numa
stress-ng: info: [873796] numa: system has 2 of a maximum 8 memory NUMA nodes
stress-ng: metrc: [873775] stressor bogo ops real time usr time sys time bogo ops/s bogo ops/s CPU used per RSS Max
stress-ng: metrc: [873775] (secs) (secs) (secs) (real time) (usr+sys time) instance (%) (KB)
stress-ng: metrc: [873775] malloc 6079478 10.06 23.71 26.42 604560.22 121266.06 99.71 122880
stress-ng: metrc: [873775] bigheap 1466461 11.89 10.37 108.17 123327.88 12371.04 99.69 9393280
stress-ng: metrc: [873775] numa 35 10.06 1.02 0.50 3.48 22.97 3.03 5120
stress-ng: metrc: [873775] miscellaneous metrics:
stress-ng: metrc: [873775] bigheap 412301.05 realloc calls per sec (geometric mean of 10 instances)
stress-ng: info: [873775] skipped: 0
stress-ng: info: [873775] passed: 20: malloc (5) bigheap (10) numa (5)
stress-ng: info: [873775] failed: 0
stress-ng: info: [873775] metrics untrustworthy: 0
stress-ng: info: [873775] successful run completed in 11.91 secs
scx_rusty default:
$ stress-ng -M --mbind 1 --malloc 5 -t 10 --bigheap 10 --numa 5
stress-ng: info: [875135] setting to a 10 secs run per stressor
stress-ng: info: [875135] dispatching hogs: 5 malloc, 10 bigheap, 5 numa
stress-ng: info: [875155] numa: system has 2 of a maximum 8 memory NUMA nodes
stress-ng: metrc: [875135] stressor bogo ops real time usr time sys time bogo ops/s bogo ops/s CPU used per RSS Max
stress-ng: metrc: [875135] (secs) (secs) (secs) (real time) (usr+sys time) instance (%) (KB)
stress-ng: metrc: [875135] malloc 6272998 10.06 23.23 26.93 623537.39 125044.23 99.73 122240
stress-ng: metrc: [875135] bigheap 986926 11.51 7.07 107.74 85723.73 8595.96 99.73 6389760
stress-ng: metrc: [875135] numa 25 10.05 0.54 0.38 2.49 27.30 1.82 5120
stress-ng: metrc: [875135] miscellaneous metrics:
stress-ng: metrc: [875135] bigheap 398465.21 realloc calls per sec (geometric mean of 10 instances)
stress-ng: info: [875135] skipped: 0
stress-ng: info: [875135] passed: 20: malloc (5) bigheap (10) numa (5)
stress-ng: info: [875135] failed: 0
stress-ng: info: [875135] metrics untrustworthy: 0
stress-ng: info: [875135] successful run completed in 11.52 secs
cfs:
$ stress-ng -M --mbind 1 --malloc 5 -t 10 --bigheap 10 --numa 5
stress-ng: info: [882100] setting to a 10 secs run per stressor
stress-ng: info: [882100] dispatching hogs: 5 malloc, 10 bigheap, 5 numa
stress-ng: info: [882125] numa: system has 2 of a maximum 8 memory NUMA nodes
stress-ng: metrc: [882100] stressor bogo ops real time usr time sys time bogo ops/s bogo ops/s CPU used per RSS Max
stress-ng: metrc: [882100] (secs) (secs) (secs) (real time) (usr+sys time) instance (%) (KB)
stress-ng: metrc: [882100] malloc 6259502 10.07 23.07 27.01 621890.00 124990.82 99.51 122876
stress-ng: metrc: [882100] bigheap 874114 11.50 5.59 108.43 76008.26 7666.52 99.14 5612800
stress-ng: metrc: [882100] numa 395 10.04 3.45 4.72 39.35 48.38 16.27 5120
stress-ng: metrc: [882100] miscellaneous metrics:
stress-ng: metrc: [882100] bigheap 373862.39 realloc calls per sec (geometric mean of 10 instances)
stress-ng: info: [882100] skipped: 0
stress-ng: info: [882100] passed: 20: malloc (5) bigheap (10) numa (5)
stress-ng: info: [882100] failed: 0
stress-ng: info: [882100] metrics untrustworthy: 0
stress-ng: info: [882100] successful run completed in 11.50 secs
The bigheap benchmark sees a moderate improvement, while almost everything else is flat or slightly worse. So this flag may make sense for workloads that use mbind with many allocations.