intel / intel-cmt-cat

User space software for Intel(R) Resource Director Technology
http://www.intel.com/content/www/us/en/architecture-and-technology/resource-director-technology.html

ERROR: Failed to create resctrl group /sys/fs/resctrl/COS14! #204

Closed. shaorui1 closed this issue 2 years ago.

shaorui1 commented 2 years ago

Hi all, I'm trying to use the /sys/fs/resctrl interface with intel-cmt-cat v4.3.0 inside a Docker container.

Docker command:

$ docker run -it -v /sys/fs/resctrl:/sys/fs/resctrl appqos:latest bash

Here is the error message:

$ pqos -V

NOTE:  Mixed use of MSR and kernel interfaces to manage
       CAT or CMT & MBM may lead to unexpected behavior.
INFO: Requested interface: AUTO
INFO: resctrl detected
INFO: Selected interface: OS
INFO: CACHE: type 1, level 1, max id sharing this cache 2 (1 bits)
DEBUG: CACHE: not inclusive, direct mapped, 8 way(s), 64 set(s), line size 64, 1 partition(s)
INFO: CACHE: type 2, level 1, max id sharing this cache 2 (1 bits)
DEBUG: CACHE: not inclusive, direct mapped, 8 way(s), 64 set(s), line size 64, 1 partition(s)
INFO: CACHE: type 3, level 2, max id sharing this cache 2 (1 bits)
DEBUG: CACHE: not inclusive, direct mapped, 8 way(s), 512 set(s), line size 64, 1 partition(s)
INFO: CACHE: type 3, level 3, max id sharing this cache 64 (6 bits)
DEBUG: CACHE: inclusive, complex cache indexing, 20 way(s), 45056 set(s), line size 64, 1 partition(s)
DEBUG: Detected core 0, socket 0, L2 ID 0, L3 ID 0
DEBUG: Detected core 1, socket 0, L2 ID 1, L3 ID 0
DEBUG: Detected core 2, socket 0, L2 ID 2, L3 ID 0
DEBUG: Detected core 3, socket 0, L2 ID 3, L3 ID 0
DEBUG: Detected core 4, socket 0, L2 ID 4, L3 ID 0
DEBUG: Detected core 5, socket 0, L2 ID 5, L3 ID 0
DEBUG: Detected core 6, socket 0, L2 ID 8, L3 ID 0
DEBUG: Detected core 7, socket 0, L2 ID 9, L3 ID 0
DEBUG: Detected core 8, socket 0, L2 ID 10, L3 ID 0
DEBUG: Detected core 9, socket 0, L2 ID 11, L3 ID 0
DEBUG: Detected core 10, socket 0, L2 ID 12, L3 ID 0
DEBUG: Detected core 11, socket 0, L2 ID 16, L3 ID 0
DEBUG: Detected core 12, socket 0, L2 ID 17, L3 ID 0
DEBUG: Detected core 13, socket 0, L2 ID 18, L3 ID 0
DEBUG: Detected core 14, socket 0, L2 ID 19, L3 ID 0
DEBUG: Detected core 15, socket 0, L2 ID 20, L3 ID 0
DEBUG: Detected core 16, socket 0, L2 ID 21, L3 ID 0
DEBUG: Detected core 17, socket 0, L2 ID 24, L3 ID 0
DEBUG: Detected core 18, socket 0, L2 ID 25, L3 ID 0
DEBUG: Detected core 19, socket 0, L2 ID 26, L3 ID 0
DEBUG: Detected core 20, socket 0, L2 ID 27, L3 ID 0
DEBUG: Detected core 21, socket 0, L2 ID 28, L3 ID 0
DEBUG: Detected core 22, socket 1, L2 ID 32, L3 ID 1
DEBUG: Detected core 23, socket 1, L2 ID 33, L3 ID 1
DEBUG: Detected core 24, socket 1, L2 ID 34, L3 ID 1
DEBUG: Detected core 25, socket 1, L2 ID 35, L3 ID 1
DEBUG: Detected core 26, socket 1, L2 ID 36, L3 ID 1
DEBUG: Detected core 27, socket 1, L2 ID 37, L3 ID 1
DEBUG: Detected core 28, socket 1, L2 ID 40, L3 ID 1
DEBUG: Detected core 29, socket 1, L2 ID 41, L3 ID 1
DEBUG: Detected core 30, socket 1, L2 ID 42, L3 ID 1
DEBUG: Detected core 31, socket 1, L2 ID 43, L3 ID 1
DEBUG: Detected core 32, socket 1, L2 ID 44, L3 ID 1
DEBUG: Detected core 33, socket 1, L2 ID 48, L3 ID 1
DEBUG: Detected core 34, socket 1, L2 ID 49, L3 ID 1
DEBUG: Detected core 35, socket 1, L2 ID 50, L3 ID 1
DEBUG: Detected core 36, socket 1, L2 ID 51, L3 ID 1
DEBUG: Detected core 37, socket 1, L2 ID 52, L3 ID 1
DEBUG: Detected core 38, socket 1, L2 ID 53, L3 ID 1
DEBUG: Detected core 39, socket 1, L2 ID 56, L3 ID 1
DEBUG: Detected core 40, socket 1, L2 ID 57, L3 ID 1
DEBUG: Detected core 41, socket 1, L2 ID 58, L3 ID 1
DEBUG: Detected core 42, socket 1, L2 ID 59, L3 ID 1
DEBUG: Detected core 43, socket 1, L2 ID 60, L3 ID 1
INFO: resctrl detected
INFO: Monitoring capability detected
INFO: L3CA capability detected
INFO: L3 CAT details: CDP support=1, CDP on=0, #COS=16, #ways=20, ways contention bit-mask 0xc0000
INFO: L3 CAT details: cache size 57671680 bytes, way size 2883584 bytes
INFO: L2CA capability not detected
INFO: MBA capability not detected
DEBUG: resctrl group COS1 detected
DEBUG: resctrl group COS2 detected
DEBUG: resctrl group COS3 detected
DEBUG: resctrl group COS4 detected
DEBUG: resctrl group COS5 detected
DEBUG: resctrl group COS6 detected
DEBUG: resctrl group COS7 detected
DEBUG: resctrl group COS8 detected
DEBUG: resctrl group COS9 detected
DEBUG: resctrl group COS10 detected
DEBUG: resctrl group COS11 detected
DEBUG: resctrl group COS12 detected
DEBUG: resctrl group COS13 detected
ERROR: Failed to create resctrl group /sys/fs/resctrl/COS14!
ERROR: OS allocation init error!
Error initializing PQoS library!

Here is some information about the machine:

$ cat /proc/cpuinfo
......
processor       : 43
vendor_id       : GenuineIntel
cpu family      : 6
model           : 79
model name      : Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
stepping        : 1
microcode       : 0xb000020
cpu MHz         : 1801.000
cache size      : 56320 KB
physical id     : 1
siblings        : 22
core id         : 28
cpu cores       : 22
apicid          : 120
initial apicid  : 120
fpu             : yes
fpu_exception   : yes
cpuid level     : 20
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 invpcid_single intel_ppin intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts
bogomips        : 4395.75
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-1160.31.1.el7.x86_64 root=UUID=xxxx ro crashkernel=auto rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet processor.max_cstate=1 intel_idle.max_cstate=0 idle=poll irqaffinity=0 default_hugepagesz=1G hugepagesz=1G hugepages=16 nosoftlockup nmi_watchdog=0 audit=0 selinux=0 enforcing=0 kthread_cpus=0 rdt=mbmtotal,mbmlocal,l3cat,l3cdp skew_tick=1 isolcpus=2-43 intel_pstate=disable nosoftlockup nohz=on nohz_full=2-43 rcu_nocbs=2-43

I have read the wiki page, which says: 'The number of unique COS configurations available for this resource (L3/L2/CDP). The kernel uses the smallest number of COS for all enabled resources as the limit (e.g. 8).' Should I change this value to fix the problem? Also, why can't the COS14 group be created here?

aleksinx commented 2 years ago

Resctrl limits the number of control groups to the number of L3 CAT classes of service (COS). During initialization, PQoS creates a directory in /sys/fs/resctrl for each available control group.
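The limit can be checked directly. Below is a minimal POSIX shell sketch, assuming resctrl is mounted at /sys/fs/resctrl (or at the path passed as the first argument) and that every top-level directory other than info, mon_data and mon_groups is a control group. It takes the smallest info/*/num_closids value as the limit, following the wiki text quoted in the question, and reserves one class for the root (default) group. The function name resctrl_capacity is hypothetical and is not part of PQoS.

```shell
# Hypothetical helper (not part of PQoS): report how many more resctrl
# control groups can be created under the given mount point.
resctrl_capacity() {
    root=${1:-/sys/fs/resctrl}

    # Limit: the smallest num_closids among the allocation resources
    # (e.g. info/L3, info/L3CODE, info/MB); monitoring dirs have no such file.
    limit=
    for f in "$root"/info/*/num_closids; do
        [ -r "$f" ] || continue
        n=$(cat "$f")
        if [ -z "$limit" ] || [ "$n" -lt "$limit" ]; then
            limit=$n
        fi
    done
    if [ -z "$limit" ]; then
        echo "no num_closids found under $root/info" >&2
        return 1
    fi

    # Count top-level control groups; info, mon_data and mon_groups are not groups.
    groups=0
    for d in "$root"/*/; do
        [ -d "$d" ] || continue
        name=$(basename "$d")
        case $name in
            info|mon_data|mon_groups) continue ;;
        esac
        groups=$((groups + 1))
    done

    # The root directory is the default group and already holds one class (COS0).
    echo "limit=$limit groups=$groups free=$((limit - 1 - groups))"
}
```

Call it as resctrl_capacity /sys/fs/resctrl. For the directory listing posted later in this thread (COS1 to COS13 plus p0 and p1) with 16 classes of service, it should report free=0, which matches the failure to create COS14.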

This issue can occur when a control group has been created manually in /sys/fs/resctrl. Please paste the output of the following command:

ls -l /sys/fs/resctrl

shaorui0 commented 2 years ago

@aleksinx Here is the output of ls -l /sys/fs/resctrl:

total 0
drwxr-xr-x 4 root root 0 Jan 13 09:41 COS1
drwxr-xr-x 4 root root 0 Jan 13 09:41 COS10
drwxr-xr-x 4 root root 0 Jan 13 09:41 COS11
drwxr-xr-x 4 root root 0 Jan 13 09:41 COS12
drwxr-xr-x 4 root root 0 Jan 13 09:52 COS13
drwxr-xr-x 4 root root 0 Jan 13 09:41 COS2
drwxr-xr-x 4 root root 0 Jan 13 09:41 COS3
drwxr-xr-x 4 root root 0 Jan 13 09:41 COS4
drwxr-xr-x 4 root root 0 Jan 13 09:41 COS5
drwxr-xr-x 4 root root 0 Jan 13 09:41 COS6
drwxr-xr-x 4 root root 0 Jan 13 09:41 COS7
drwxr-xr-x 4 root root 0 Jan 13 09:41 COS8
drwxr-xr-x 4 root root 0 Jan 13 09:41 COS9
-rw-r--r-- 1 root root 0 Jan 13 09:40 cpus
-rw-r--r-- 1 root root 0 Jan 13 09:40 cpus_list
dr-xr-xr-x 4 root root 0 Jan 13 09:40 info
dr-xr-xr-x 4 root root 0 Jan 13 09:40 mon_data
dr-xr-xr-x 2 root root 0 Jan 13 09:40 mon_groups
drwxr-xr-x 4 root root 0 Jan 13 09:40 p0
drwxr-xr-x 4 root root 0 Jan 13 09:40 p1
-rw-r--r-- 1 root root 0 Jan 13 09:40 schemata
-rw-r--r-- 1 root root 0 Jan 13 09:40 tasks

aleksinx commented 2 years ago

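The listing contains two control groups, p0 and p1, that PQoS did not create. Together with the root (default) group and COS1 to COS13, they use up all 16 classes of service reported in the log (#COS=16), so no class is left for COS14. If you created p0 and p1 manually, simply unmount /sys/fs/resctrl; unmounting discards all resctrl groups.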
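If unmounting is not convenient, the stray groups can also be removed one by one. The POSIX shell sketch below assumes that any top-level group whose name is not COS followed by digits (such as p0 and p1 here) was created by another tool and is safe to delete. It only lists candidates unless the hypothetical --apply argument is given. Removing a resctrl group with rmdir frees its class of service and moves its tasks and CPUs back to the parent group. The function name remove_foreign_groups is hypothetical and is not part of PQoS.

```shell
# Hypothetical helper (not part of PQoS): list, or with --apply remove,
# top-level resctrl groups that do not follow the COS<n> naming.
remove_foreign_groups() {
    root=${1:-/sys/fs/resctrl}
    apply=${2:-}

    for d in "$root"/*/; do
        [ -d "$d" ] || continue
        name=$(basename "$d")
        case $name in
            info|mon_data|mon_groups) continue ;;  # resctrl bookkeeping, not groups
            COS*[!0-9]*) ;;                         # COS prefix, but not all digits: foreign
            COS[0-9]*) continue ;;                  # COS<n>: the naming PQoS uses
        esac
        if [ "$apply" = "--apply" ]; then
            # rmdir frees the group's CLOSID; its tasks and CPUs go to the parent group
            rmdir "$d" && echo "removed $name"
        else
            echo "foreign group: $name"
        fi
    done
}
```

Run remove_foreign_groups /sys/fs/resctrl first to review the list, then add --apply. Stop any other RDT management software beforehand, since it may recreate its groups.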

Are you using other software to manage the RDT configuration? For example, Resource Management Daemon is known to be incompatible with PQoS.

kmabbasi commented 2 years ago

There has been no response for the last 3 weeks, so I'm closing this issue for now.

Feel free to reopen it if the issue persists.

Thanks, Khawar