intel / intel-cmt-cat

User space software for Intel(R) Resource Director Technology
http://www.intel.com/content/www/us/en/architecture-and-technology/resource-director-technology.html

MBA doesn't work on Intel(R) Xeon(R) Gold 6226R CPU #269

Open IteratorandIterator opened 3 months ago

IteratorandIterator commented 3 months ago
  1. I followed the official tutorial to compile and install intel-cmt-cat.
  2. Then, I started a process on core 31 with a read bandwidth of 10000 MB using sudo /usr/local/bin/membw -c 31 -b 10000 --read.
  3. After that, I ran sudo pqos-os -a 'cos:7=31' && sudo pqos-os -e 'mba_max:7=500'.
  4. Finally, I checked the process bandwidth with pidof membw && sudo pqos-os -p mbl:<pid>, using the PID returned by pidof.
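For reference, the pqos-os part of these steps can be reproduced from a small script. This is a minimal sketch, assuming pqos-os and pidof are on PATH; the helper names pid_of, mba_experiment_commands and print_dry_run are hypothetical, and the script only builds and prints the commands (a dry run) rather than executing them.

```python
import shlex
import subprocess


def pid_of(name: str) -> int:
    """Return the first PID that pidof reports for a process name."""
    result = subprocess.run(["pidof", name], capture_output=True, text=True, check=True)
    # pidof prints space-separated PIDs; take the first one.
    return int(result.stdout.split()[0])


def mba_experiment_commands(core: int, cos: int, mba_max: int, pid: int) -> list[list[str]]:
    """Build the pqos-os invocations from steps 3 and 4 as argument lists."""
    return [
        ["sudo", "pqos-os", "-a", f"cos:{cos}={core}"],         # associate the core with the COS
        ["sudo", "pqos-os", "-e", f"mba_max:{cos}={mba_max}"],  # set the MBA limit for that COS
        ["sudo", "pqos-os", "-p", f"mbl:{pid}"],                # monitor local memory bandwidth of the PID
    ]


def print_dry_run(commands: list[list[str]]) -> None:
    """Print each command as a copy-pasteable shell line instead of running it."""
    for cmd in commands:
        print(shlex.join(cmd))
```

For example, print_dry_run(mba_experiment_commands(core=31, cos=7, mba_max=500, pid=pid_of("membw"))) prints the three commands with the real membw PID filled in; passing each list to subprocess.run would execute them instead.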

However, the process bandwidth was not limited at all; it was the same as without MBA. Why is this the case? Below is my basic configuration:

lscpu:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6226R CPU @ 2.90 GHz
Stepping: 7
CPU MHz: 3600.000
CPU max MHz: 3900.0000
CPU min MHz: 1200.0000
BogoMIPS: 5800.00
Virtualization: VT-x
L1d cache: 512 KiB
L1i cache: 512 KiB
L2 cache: 16 MiB
L3 cache: 22 MiB
NUMA node0 CPU(s): 0-31

uname -a:
Linux HM1 5.15.0-97-generic #107~20.04.1-Ubuntu SMP Fri Feb 9 14:20:11 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

IteratorandIterator commented 3 months ago

Any reply would be appreciated!

rstorozh commented 3 months ago

Could you please provide me with the output of the following command?
$ LD_LIBRARY_PATH=lib ./pqos/pqos --iface=msr -D

You should run it from within your source code root directory.

rstorozh commented 3 months ago

Also, could you please run the same experiment using the MSR interface?
$ LD_LIBRARY_PATH=lib ./pqos/pqos --iface=msr

rstorozh commented 3 months ago

Finally, after running the tests, please collect the following information:
$ LD_LIBRARY_PATH=lib ./pqos/pqos --iface=msr -s

IteratorandIterator commented 3 months ago

Thanks for your reply!

The output of "[zlx@HM1:~/utils/RDT/intel-cmt-cat] $ LD_LIBRARY_PATH=lib ./pqos/pqos --iface=msr -D" is:

NOTE: Mixed use of MSR and kernel interfaces to manage CAT or CMT & MBM may lead to unexpected behavior.
API lock initialization error!
Error initializing PQoS library!

The output of running the same command "[zlx@HM1:~/utils/RDT/intel-cmt-cat] $ LD_LIBRARY_PATH=lib ./pqos/pqos --iface=msr -D" again is:

NOTE: Mixed use of MSR and kernel interfaces to manage CAT or CMT & MBM may lead to unexpected behavior.
WARN: resctl filesystem mounted! Using MSR interface may corrupt resctrl filesystem and cause unexpected behaviour
Hardware capabilities
    Monitoring
        Cache Monitoring Technology (CMT) events:
            LLC Occupancy (LLC)
                I/O RDT: unsupported
                scale factor: 65536
                max rmid: 128
                counter length: 24b
        Memory Bandwidth Monitoring (MBM) events:
            Total Memory Bandwidth (TMEM)
                I/O RDT: unsupported
                scale factor: 65536
                max rmid: 128
                counter length: 24b
            Local Memory Bandwidth (LMEM)
                I/O RDT: unsupported
                scale factor: 65536
                max rmid: 128
                counter length: 24b
            Remote Memory Bandwidth (RMEM) (calculated)
                I/O RDT: unsupported
                scale factor: 65536
                max rmid: 128
                counter length: 24b
        PMU events:
            Instructions/Clock (IPC)
            LLC misses
            LLC references
            LLC misses - pcie read
            LLC misses - pcie write
            LLC references - pcie read
            LLC references - pcie write
    Allocation
        Cache Allocation Technology (CAT)
            L3 CAT
                CDP: enabled
                Non-Contiguous CBM: unsupported
                I/O RDT: unsupported
                Num COS: 8
                Way size: 2097152 bytes
                Ways contention bit-mask: 0x600
                Min CBM bits: 1
                Max CBM bits: 11
        Memory Bandwidth Allocation (MBA)
            Num COS: 8
            Granularity: 10
            Min B/W: 10
            Type: linear
            MBA 4.0 extensions: unsupported
Cache information
    L3 Cache
        Num ways: 11
        Way size: 2097152 bytes
        Num sets: 32768
        Line size: 64 bytes
        Total size: 23068672 bytes
    L2 Cache
        Num ways: 16
        Way size: 65536 bytes
        Num sets: 1024
        Line size: 64 bytes
        Total size: 1048576 bytes
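The MBA section of this output reports Granularity: 10 and Min B/W: 10 with a linear type. A minimal sketch for pre-checking a percentage request against those values, assuming both are percentages and that a request which is a multiple of the granularity between Min B/W and 100 needs no rounding (how the library rounds other values is not shown in this thread); the names MbaCaps, parse_mba_caps and is_exact_mba_request are hypothetical. It is only meant for percentage requests such as mba:7=10, not for the mba_max values used with pqos-os.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class MbaCaps:
    granularity: int  # "Granularity" line of the MBA section, assumed percent
    min_bw: int       # "Min B/W" line of the MBA section, assumed percent


def parse_mba_caps(text: str) -> MbaCaps:
    """Extract MBA granularity and minimum bandwidth from pqos -D output."""
    m = re.search(
        r"Memory Bandwidth Allocation \(MBA\).*?Granularity:\s*(\d+).*?Min B/W:\s*(\d+)",
        text,
        re.DOTALL,  # works on both the line-per-field and the flattened form
    )
    if m is None:
        raise ValueError("no MBA section found")
    return MbaCaps(granularity=int(m.group(1)), min_bw=int(m.group(2)))


def is_exact_mba_request(percent: int, caps: MbaCaps) -> bool:
    """True if the request is in range and a whole multiple of the granularity."""
    return caps.min_bw <= percent <= 100 and percent % caps.granularity == 0
```

With these capabilities, the 10% request used in the MSR experiment passes the check, which is consistent with the "10% requested, 10% applied" line reported for it.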

IteratorandIterator commented 3 months ago

Here are the results:

NOTE: Mixed use of MSR and kernel interfaces to manage CAT or CMT & MBM may lead to unexpected behavior.
WARN: resctl filesystem mounted! Using MSR interface may corrupt resctrl filesystem and cause unexpected behaviour
L3CA/MBA COS definitions for Socket 0:
    L3CA COS0 => DATA 0x7ff, CODE 0x7ff
    L3CA COS1 => DATA 0x7ff, CODE 0x7ff
    L3CA COS2 => DATA 0x7ff, CODE 0x7ff
    L3CA COS3 => DATA 0x7ff, CODE 0x7ff
    L3CA COS4 => DATA 0x7ff, CODE 0x7ff
    L3CA COS5 => DATA 0x7ff, CODE 0x7ff
    L3CA COS6 => DATA 0x7ff, CODE 0x7ff
    L3CA COS7 => DATA 0x7ff, CODE 0x7ff
    MBA COS0 => 20% available
    MBA COS1 => 100% available
    MBA COS2 => 100% available
    MBA COS3 => 100% available
    MBA COS4 => 100% available
    MBA COS5 => 100% available
    MBA COS6 => 100% available
    MBA COS7 => 10% available
Core information for socket 0:
    Core 0, L2ID 0, L3ID 0 => COS0, RMID84
    Core 1, L2ID 1, L3ID 0 => COS0, RMID85
    Core 2, L2ID 2, L3ID 0 => COS0, RMID86
    Core 3, L2ID 3, L3ID 0 => COS0, RMID87
    Core 4, L2ID 4, L3ID 0 => COS0, RMID89
    Core 5, L2ID 5, L3ID 0 => COS0, RMID90
    Core 6, L2ID 6, L3ID 0 => COS0, RMID60
    Core 7, L2ID 7, L3ID 0 => COS0, RMID61
    Core 8, L2ID 8, L3ID 0 => COS0, RMID62
    Core 9, L2ID 9, L3ID 0 => COS0, RMID63
    Core 10, L2ID 10, L3ID 0 => COS0, RMID64
    Core 11, L2ID 11, L3ID 0 => COS0, RMID65
    Core 12, L2ID 12, L3ID 0 => COS0, RMID66
    Core 13, L2ID 13, L3ID 0 => COS0, RMID67
    Core 14, L2ID 14, L3ID 0 => COS0, RMID68
    Core 15, L2ID 15, L3ID 0 => COS0, RMID69
    Core 16, L2ID 0, L3ID 0 => COS0, RMID70
    Core 17, L2ID 1, L3ID 0 => COS0, RMID71
    Core 18, L2ID 2, L3ID 0 => COS0, RMID72
    Core 19, L2ID 3, L3ID 0 => COS0, RMID73
    Core 20, L2ID 4, L3ID 0 => COS0, RMID74
    Core 21, L2ID 5, L3ID 0 => COS0, RMID75
    Core 22, L2ID 6, L3ID 0 => COS0, RMID88
    Core 23, L2ID 7, L3ID 0 => COS0, RMID91
    Core 24, L2ID 8, L3ID 0 => COS0, RMID108
    Core 25, L2ID 9, L3ID 0 => COS0, RMID109
    Core 26, L2ID 10, L3ID 0 => COS0, RMID110
    Core 27, L2ID 11, L3ID 0 => COS0, RMID112
    Core 28, L2ID 12, L3ID 0 => COS0, RMID113
    Core 29, L2ID 13, L3ID 0 => COS0, RMID114
    Core 30, L2ID 14, L3ID 0 => COS0, RMID115
    Core 31, L2ID 15, L3ID 0 => COS7, RMID122
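To confirm which MBA limit actually applies to the core running membw, the two tables in this output can be joined. A minimal sketch, assuming single-socket pqos -s output in the format shown here (on multi-socket systems the per-socket sections would need to be kept apart); the names parse_pqos_status and mba_limit_for_core are hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

MBA_RE = re.compile(r"MBA COS(\d+) => (\d+)% available")
CORE_RE = re.compile(r"Core (\d+), L2ID \d+, L3ID \d+ => COS(\d+)")


def parse_pqos_status(text: str) -> Tuple[Dict[int, int], Dict[int, int]]:
    """Return ({cos: MBA percent}, {core: cos}) parsed from pqos -s output."""
    mba = {int(cos): int(pct) for cos, pct in MBA_RE.findall(text)}
    cores = {int(core): int(cos) for core, cos in CORE_RE.findall(text)}
    return mba, cores


def mba_limit_for_core(text: str, core: int) -> Optional[int]:
    """MBA percentage that applies to the given core, or None if it is not listed."""
    mba, cores = parse_pqos_status(text)
    cos = cores.get(core)
    return None if cos is None else mba.get(cos)
```

Applied to the output above, core 31 resolves to COS7 with 10% available, while every other core is in COS0.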

IteratorandIterator commented 3 months ago

Absolutely, I'd be happy to run the experiment using the MSR interface as well!

First, I started a process with a read bandwidth of 10000 MB:
[zlx@HM1:~/utils/RDT/intel-cmt-cat] $ sudo membw -c 31 -b 10000 --read
The output of the command is:
- THREAD logical core id: 31, memory bandwidth [MB]: 10000, starting…

Then, I assigned core 31 to COS7:
[zlx@HM1:~/utils/RDT/intel-cmt-cat] $ sudo LD_LIBRARY_PATH=lib ./pqos/pqos --iface=msr -a 'cos:7=31'
The output of the command is:
NOTE: Mixed use of MSR and kernel interfaces to manage CAT or CMT & MBM may lead to unexpected behavior.
WARN: resctl filesystem mounted! Using MSR interface may corrupt resctrl filesystem and cause unexpected behaviour
Allocation configuration altered.

Next, I limited COS7 to 10% of the memory bandwidth:
sudo LD_LIBRARY_PATH=lib ./pqos/pqos --iface=msr -e 'mba:7=10'
The output of the command is:
NOTE: Mixed use of MSR and kernel interfaces to manage CAT or CMT & MBM may lead to unexpected behavior.
WARN: resctl filesystem mounted! Using MSR interface may corrupt resctrl filesystem and cause unexpected behaviour
SOCKET 0 MBA COS7 => 10% requested, 10% applied
Allocation configuration altered.

Finally, I tried to check the bandwidth on core 31:
sudo LD_LIBRARY_PATH=lib ./pqos/pqos --iface=msr -m mbl:31
but it failed with:
NOTE: Mixed use of MSR and kernel interfaces to manage CAT or CMT & MBM may lead to unexpected behavior.
WARN: resctl filesystem mounted! Using MSR interface may corrupt resctrl filesystem and cause unexpected behaviour
ERROR: Monitoring on core 31 is already started
Monitoring start error on core(s) 31, status 3

I checked with top, ps aux | grep pqos, and pidof pqos, but could not find any running pqos, pqos-os, or pqos-msr processes. I am also certain that I did not leave any pqos or pqos-os process running in the background.

rstorozh commented 3 months ago

ERROR: Monitoring on core 31 is already started
Monitoring start error on core(s) 31, status 3

I have used tools like “top, ps aux | grep pqos, and pidof pqos” but was unable to find any running pqos, pqos-os or pqos-msr processes. I am also certain that I have not allowed any pqos or pqos-os to execute in the background.
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Typically this happens when a process fails for some reason but the MSR/resctrl settings haven't been cleared. I would recommend restarting the machine, repeating the experiment using only the MSR interface, and reporting the results, that is, whether the bandwidth is throttled.
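As an alternative to a reboot, the maintainer later notes that if pqos was killed while monitoring, a monitoring reset with pqos -r should allow monitoring to start again. A minimal sketch of that retry decision, assuming -r can be combined with -m in the same invocation (this thread does not show the combined command line); the function name retry_with_monitoring_reset is hypothetical.

```python
from typing import List, Optional

ALREADY_STARTED = "is already started"  # fragment of the error text quoted above


def retry_with_monitoring_reset(cmd: List[str], output: str) -> Optional[List[str]]:
    """Return cmd with -r inserted before -m if pqos reported an already-started monitor.

    Returns None when no retry is needed, when the command has no -m option,
    or when -r is already present (so the caller does not retry forever).
    """
    if ALREADY_STARTED not in output or "-r" in cmd:
        return None
    try:
        idx = cmd.index("-m")
    except ValueError:
        return None
    return cmd[:idx] + ["-r"] + cmd[idx:]
```

For the failing command above, the sketch yields sudo LD_LIBRARY_PATH=lib ./pqos/pqos --iface=msr -r -m mbl:31 as the retry.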

rstorozh commented 3 months ago

Also, please provide the following info:
  1. Version of the code that you use for building: the 'master' branch or a particular tag
  2. OS version that you use
  3. Kernel version that you use ($ uname -a)

IteratorandIterator commented 3 months ago

Thank you! Since the server is shared by multiple people, I will report the test results as soon as we have agreed on a time to restart it.

IteratorandIterator commented 3 months ago

  1. The version of the code is the 'master' branch.
  2. The OS version is Ubuntu 20.04 Desktop.
  3. The kernel version is 5.15.0-97-generic.

rstorozh commented 3 months ago

OK, thanks for the information. Let me suggest how to address some of the issues you encountered:

  • I see the error below in the output:
    API lock initialization error! Error initializing PQoS library!
    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
    That can typically be solved by removing the lock file at “/var/lock/libpqos”.
  • The warning below appears in a few places too:
    WARN: resctl filesystem mounted! Using MSR interface may corrupt resctrl filesystem and cause unexpected behaviour
    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
    I would recommend unmounting resctrl before running the experiment with the MSR interface.

ERROR: Monitoring on core 31 is already started
Monitoring start error on core(s) 31, status 3
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
If the pqos utility was killed while monitoring, you should be able to force-start monitoring by doing a monitoring reset with pqos -r. No reboot should be required.

And please give me feedback when the results of the MSR experiments are ready.

IteratorandIterator commented 3 months ago

Thanks! With --iface=msr the bandwidth is limited as expected, but with --iface=os it is not. I don't know why this is the case.
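The maintainer's suggestions about the resctrl mount and the lock file can be turned into a quick check before an MSR run. A minimal sketch, assuming the mount table is supplied in /proc/mounts format, where resctrl shows up with filesystem type resctrl; the helper names resctrl_mountpoints and msr_preflight are hypothetical, and only the lock file path comes from this thread.

```python
from typing import List

LOCK_PATH = "/var/lock/libpqos"  # lock file path given in this thread


def resctrl_mountpoints(mounts_text: str) -> List[str]:
    """Mount points with filesystem type 'resctrl' in /proc/mounts-format text."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 3 and fields[2] == "resctrl":
            points.append(fields[1])
    return points


def msr_preflight(mounts_text: str, last_pqos_output: str = "") -> List[str]:
    """List the fixes suggested in this thread that apply before an MSR run."""
    issues = [
        f"resctrl is mounted at {mp}: unmount it before using --iface=msr"
        for mp in resctrl_mountpoints(mounts_text)
    ]
    if "API lock initialization error" in last_pqos_output:
        issues.append(f"lock error reported: try removing {LOCK_PATH}")
    return issues
```

On the affected machine, the mount table could be read with open("/proc/mounts").read() and passed in together with the text of the last failed pqos run; an empty list means neither issue was detected.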