sched-ext / scx

sched_ext schedulers and tools
https://bit.ly/scx_slack
GNU General Public License v2.0

scx_rusty: Add mempolicy checks to rusty #364

Open hodgesds opened 2 weeks ago

hodgesds commented 2 weeks ago

This change makes scx_rusty mempolicy aware. When a process calls set_mempolicy it can change its NUMA memory preferences, and scheduling its tasks on remote NUMA nodes can then cause performance issues. This change modifies task_pick_domain to use the new helper method that returns the task's preferred node id.
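The domain-selection idea can be sketched roughly as: if the task has a preferred mempolicy node, try a domain on that node first, otherwise fall back to the usual pick. This is an illustrative sketch only; the types and names (`Domain`, `pick_domain`, `preferred_node`) are hypothetical and not the actual scx_rusty internals.

```rust
// Hypothetical sketch of mempolicy-aware domain selection, loosely modeled
// on what task_pick_domain does with a preferred-node helper. The names and
// types here are illustrative, not the real scx_rusty code.

/// A scheduling domain and the NUMA node it lives on.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Domain {
    id: usize,
    node: usize,
}

/// Pick a domain for a task: prefer a domain on the task's mempolicy node
/// (if it set one), otherwise fall back to the default choice.
fn pick_domain(domains: &[Domain], preferred_node: Option<usize>, default: usize) -> usize {
    if let Some(node) = preferred_node {
        if let Some(d) = domains.iter().find(|d| d.node == node) {
            return d.id;
        }
    }
    default
}

fn main() {
    let domains = [
        Domain { id: 0, node: 0 },
        Domain { id: 1, node: 1 },
    ];
    // Task bound to node 1 via set_mempolicy: pick the node-1 domain.
    assert_eq!(pick_domain(&domains, Some(1), 0), 1);
    // No mempolicy set: keep the default domain.
    assert_eq!(pick_domain(&domains, None, 0), 0);
    println!("ok");
}
```

The key design point is that the mempolicy preference only narrows the candidate domains; when no domain matches the preferred node, scheduling proceeds exactly as before.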

With the --mempolicy-affinity flag set:

$ stress-ng  -M --mbind 1 --malloc 5 -t 10 --bigheap 10 --numa 5
stress-ng: info:  [873775] setting to a 10 secs run per stressor
stress-ng: info:  [873775] dispatching hogs: 5 malloc, 10 bigheap, 5 numa
stress-ng: info:  [873796] numa: system has 2 of a maximum 8 memory NUMA nodes
stress-ng: metrc: [873775] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s CPU used per       RSS Max
stress-ng: metrc: [873775]                           (secs)    (secs)    (secs)   (real time) (usr+sys time) instance (%)          (KB)
stress-ng: metrc: [873775] malloc          6079478     10.06     23.71     26.42    604560.22      121266.06        99.71        122880
stress-ng: metrc: [873775] bigheap         1466461     11.89     10.37    108.17    123327.88       12371.04        99.69       9393280
stress-ng: metrc: [873775] numa                 35     10.06      1.02      0.50         3.48          22.97         3.03          5120
stress-ng: metrc: [873775] miscellaneous metrics:
stress-ng: metrc: [873775] bigheap           412301.05 realloc calls per sec (geometric mean of 10 instances)
stress-ng: info:  [873775] skipped: 0
stress-ng: info:  [873775] passed: 20: malloc (5) bigheap (10) numa (5)
stress-ng: info:  [873775] failed: 0
stress-ng: info:  [873775] metrics untrustworthy: 0
stress-ng: info:  [873775] successful run completed in 11.91 secs

scx_rusty default:

$ stress-ng  -M --mbind 1 --malloc 5 -t 10 --bigheap 10 --numa 5
stress-ng: info:  [875135] setting to a 10 secs run per stressor
stress-ng: info:  [875135] dispatching hogs: 5 malloc, 10 bigheap, 5 numa
stress-ng: info:  [875155] numa: system has 2 of a maximum 8 memory NUMA nodes
stress-ng: metrc: [875135] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s CPU used per       RSS Max
stress-ng: metrc: [875135]                           (secs)    (secs)    (secs)   (real time) (usr+sys time) instance (%)          (KB)
stress-ng: metrc: [875135] malloc          6272998     10.06     23.23     26.93    623537.39      125044.23        99.73        122240
stress-ng: metrc: [875135] bigheap          986926     11.51      7.07    107.74     85723.73        8595.96        99.73       6389760
stress-ng: metrc: [875135] numa                 25     10.05      0.54      0.38         2.49          27.30         1.82          5120
stress-ng: metrc: [875135] miscellaneous metrics:
stress-ng: metrc: [875135] bigheap           398465.21 realloc calls per sec (geometric mean of 10 instances)
stress-ng: info:  [875135] skipped: 0
stress-ng: info:  [875135] passed: 20: malloc (5) bigheap (10) numa (5)
stress-ng: info:  [875135] failed: 0
stress-ng: info:  [875135] metrics untrustworthy: 0
stress-ng: info:  [875135] successful run completed in 11.52 secs

cfs:

$ stress-ng  -M --mbind 1 --malloc 5 -t 10 --bigheap 10 --numa 5
stress-ng: info:  [882100] setting to a 10 secs run per stressor
stress-ng: info:  [882100] dispatching hogs: 5 malloc, 10 bigheap, 5 numa
stress-ng: info:  [882125] numa: system has 2 of a maximum 8 memory NUMA nodes
stress-ng: metrc: [882100] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s CPU used per       RSS Max
stress-ng: metrc: [882100]                           (secs)    (secs)    (secs)   (real time) (usr+sys time) instance (%)          (KB)
stress-ng: metrc: [882100] malloc          6259502     10.07     23.07     27.01    621890.00      124990.82        99.51        122876
stress-ng: metrc: [882100] bigheap          874114     11.50      5.59    108.43     76008.26        7666.52        99.14       5612800
stress-ng: metrc: [882100] numa                395     10.04      3.45      4.72        39.35          48.38        16.27          5120
stress-ng: metrc: [882100] miscellaneous metrics:
stress-ng: metrc: [882100] bigheap           373862.39 realloc calls per sec (geometric mean of 10 instances)
stress-ng: info:  [882100] skipped: 0
stress-ng: info:  [882100] passed: 20: malloc (5) bigheap (10) numa (5)
stress-ng: info:  [882100] failed: 0
stress-ng: info:  [882100] metrics untrustworthy: 0
stress-ng: info:  [882100] successful run completed in 11.50 secs

The bigheap benchmark sees a moderate improvement, while most everything else is flat or slightly worse. So this flag may make sense for workloads that use mbind and do a large volume of allocations.