Hi rahulcse03,
It looks like your kernel is below version 4.10. You have two options here:
Best, Aaron
Thanks Aaron, the kernel update to 4.12 did the magic. Thanks. :) Removing -I does not help. In case it helps someone:
Check the current kernel version:
$ uname -sr
Download the 4.12 mainline kernel packages:
$ wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.12/linux-headers-4.12.0-041200_4.12.0-041200.201707022031_all.deb
$ wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.12/linux-headers-4.12.0-041200-generic_4.12.0-041200.201707022031_amd64.deb
$ wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.12/linux-image-4.12.0-041200-generic_4.12.0-041200.201707022031_amd64.deb
After downloading the kernel files above, install them as follows:
$ sudo dpkg -i *.deb
Reboot the machine:
$ reboot
Check the version again to verify that the new kernel is being used:
$ uname -sr
If it is still not working, load the msr module manually:
$ modprobe msr
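A quick way to confirm the msr driver is actually available before running pqos with the MSR interface is sketched below in Python. It assumes the standard Linux layout, where the msr driver (loaded as a module or built in) creates one device node per logical CPU at /dev/cpu/<N>/msr; the function name is hypothetical and not part of pqos.

```python
import os


def msr_device_present(cpu: int = 0, dev_root: str = "/dev/cpu") -> bool:
    """Return True if the msr driver's device node exists for the given CPU."""
    # When the msr driver is loaded (or built into the kernel), Linux exposes
    # one character device per logical CPU at /dev/cpu/<N>/msr.
    return os.path.exists(os.path.join(dev_root, str(cpu), "msr"))


if __name__ == "__main__":
    if msr_device_present():
        print("msr driver available: /dev/cpu/0/msr exists")
    else:
        print("msr driver not loaded; try: sudo modprobe msr")
```

Accessing the device still needs root, so a present node does not guarantee pqos will run as an unprivileged user.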
No problem rahulcse03,
If you're happy with the answer, feel free to close this issue, or someone from Intel will do it later :)
Best, Aaron
Thanks Aaron. Closing it.
You're welcome,
Just to be clear here: I can see you're attempting to monitor a PID by using the -p option.
PID monitoring can only be done using the -I and -p options together, therefore PID monitoring is only possible on kernel 4.10+ (as you have seen).
PQoS can still monitor on a per-core basis on kernels older than 4.10 (e.g. # sudo pqos), but the -p option will not work on those older kernels.
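The version rule above can be checked programmatically. This Python sketch parses the release string that `uname -r` prints and compares it against 4.10; the helper names are hypothetical. It only checks the version threshold stated in this thread, not whether resctrl is actually enabled in that kernel's build.

```python
import platform
import re


def kernel_version(release: str) -> tuple:
    """Extract (major, minor) from a release string like '4.12.0-041200-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def os_interface_supported(release: str) -> bool:
    """pqos -I (and therefore -I -p PID monitoring) needs kernel 4.10 or newer."""
    # Tuple comparison handles e.g. (5, 4) >= (4, 10) correctly.
    return kernel_version(release) >= (4, 10)


if __name__ == "__main__":
    release = platform.release()  # same string that `uname -r` prints
    print(release, "-> OS interface / PID monitoring possible:", os_interface_supported(release))
```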
I hope this clears a few things up,
Best, Aaron
I'm a little confused, so let me share what I am trying to achieve: create a Class of Service, COS1; assign 40 cores to COS1; attach a process to COS1; then monitor it using collectd. (Basically I'm simulating the noisy neighbor example, not with multiple VMs, but with various processes on one server.)
Am I correct so far? I'm also not sure what these bitmasks actually mean (0x000f, 0x0ff0, 0xfffff). I want to allocate cores independently for a process (isolated and not overlapping), so that my process uses a fixed amount of L3 cache.
Hi rahulcse03,
OK, what the commands above have done is overlap the 40 cores and PID 5377 into exactly the same cache space. If you want isolation with no overlap, you can change your command 3 to put the PID on COS2: "pqos -I -a "pid:2=5377"". This means the 40 cores and the PID will not share any cache space in the example you have given.
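Moving the PID to a separate COS only isolates it if the two COS capacity masks share no cache ways. The original commands are not quoted here, so the masks in this Python sketch are hypothetical examples; the helper names are illustrative too.

```python
def shared_ways(mask_a: int, mask_b: int) -> list:
    """Return the 1-based cache ways enabled in both capacity bitmasks."""
    common = mask_a & mask_b
    return [bit + 1 for bit in range(common.bit_length()) if (common >> bit) & 1]


def isolated(mask_a: int, mask_b: int) -> bool:
    """Two COS are cache-isolated only if their masks have no bit in common."""
    return (mask_a & mask_b) == 0


if __name__ == "__main__":
    # Hypothetical masks: COS1 for the 40 cores, COS2 for the PID.
    cos1, cos2 = 0x0000F, 0x00FF0
    print(isolated(cos1, cos2))           # True: ways 1-4 vs ways 5-12
    print(shared_ways(0x000FF, 0x00FF0))  # [5, 6, 7, 8]: these ways would be shared
```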
The second part of your question is understanding the bitmask. As you probably know already, the last level cache is divided into cache ways. The bitmask is a representation of these cache ways, so, for example, if the maximum bitmask is 0xfffff then the number of cache ways is 20. You can then choose which cache ways are usable (open) for a given COS. A few examples:
- 0x0000f = cache ways 1-4 are open
- 0x00ff0 = cache ways 5-12 are open
- 0xfffff = all 20 cache ways are open
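To make the mask-to-way mapping concrete, this Python sketch lists which ways a mask opens, numbering ways from 1 as in the examples above. It also checks that the set bits are contiguous: as far as I know, CAT on processors of this generation (Broadwell, E5-2690 v4) expects contiguous masks, but treat that as an assumption and check the Intel documentation for your CPU. The function names are illustrative, not part of pqos.

```python
def open_ways(mask: int) -> list:
    """Return the 1-based cache ways that a CAT capacity bitmask leaves open."""
    return [bit + 1 for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def is_contiguous(mask: int) -> bool:
    """True if the set bits of the mask form one unbroken run (e.g. 0x00ff0)."""
    if mask == 0:
        return False
    lowest = mask & -mask          # isolate the lowest set bit
    run = mask // lowest           # shift the run down so it starts at bit 0
    return (run & (run + 1)) == 0  # an unbroken run looks like 0b0...0111


if __name__ == "__main__":
    for mask in (0x0000F, 0x00FF0, 0xFFFFF):
        ways = open_ways(mask)
        print(f"{mask:#07x}: ways {ways[0]}-{ways[-1]} open ({len(ways)} ways)")
```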
By assigning different cache ways to COS you can isolate your noisy neighbor from your high priority application.
Let me know if my explanation didn't make sense!
Best, Aaron
Thanks Aaron for your time and detailed explanation. I am trying to understand it better, but I am not able to achieve what I want. I created COS3, then attached core 3 and 6 processes to it exclusively, using the commands below:
1. Assign cache ways 1-4 to COS3: pqos -I -e "llc:3=0x0000f"
2. Assign core 3 to COS3: pqos -I -a "llc:3=3"
3. Assign the process IDs to COS3: pqos -I -a "pid:3=7731-7736"
4. Monitor the processes or the core: pqos -I -p all:7731-7736 OR pqos -m all:3

OUTPUT:
TIME 2017-10-13 14:32:56
CORE IPC MISSES LLC[KB] MBL[MB/s] MBR[MB/s]
3 0.22 6k 280.0 0.1 0.0
After this I wanted to see slowness in the monitored processes (which are running an nginx RTMP server and streaming video), so I stressed the server with: stress-ng --matrix 0 --matrix-size 4096
But this does not starve the video stream of L3 cache. Any idea what miracle I can do here? This would help me get a practical understanding of L3 allocation.
I'm not sure: is it because I have a huge server, so the stress is not impacting my streaming?
## Detail of my server:
root@ossesx27:/var/mp4s# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 56
On-line CPU(s) list: 0-55
Thread(s) per core: 2
Core(s) per socket: 14
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
Stepping: 1
CPU MHz: 1200.000
CPU max MHz: 2600.0000
CPU min MHz: 1200.0000
BogoMIPS: 5197.44
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 35840K
NUMA node0 CPU(s): 0-13,28-41
NUMA node1 CPU(s): 14-27,42-55
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 intel_ppin intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts
So the server has 35MB of L3 cache.
I'm not very familiar with cache ways. Can you point me to a link I can read to understand them?
Cheers, Rahul
Hi Rahul,
Firstly, some info on cache ways: https://www.d.umn.edu/~gshute/arch/cache-addressing.xhtml This link is the best I could quickly find. Each "TAG" and "Data" pairing is a "Cache Way".
OK, so just to be clear. You want to run a video stream, then you want to run a noisy neighbor, then observe that the stream has slowed down, then isolate the noisy neighbor and observe that the video stream has returned to normal.
From the above it looks like you are on the right track but are not getting the results you expect. You have given your video streaming app and stress-ng 4 cache ways. Your cache has 35MB of space and 20 cache ways, so that is 1.75MB per cache way (35MB / 20), and 4 cache ways is 7MB (1.75 * 4). That is a lot of cache space to share, and stress-ng may not be big enough to affect the video.
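The way-size arithmetic can be checked in a few lines of Python. lscpu reports sizes in KiB, so "L3 cache: 35840K" is 35 MiB per socket; the constants come from the lscpu output earlier in this thread, and the helper names are illustrative. The resulting 1835008-byte way size matches the "way size" that pqos -V reports later in the thread for the same CPU model.

```python
KIB = 1024

L3_BYTES = 35840 * KIB  # "L3 cache: 35840K" from lscpu (per socket)
NUM_WAYS = 20           # the full mask 0xfffff has 20 bits set


def way_size_bytes(cache_bytes: int = L3_BYTES, num_ways: int = NUM_WAYS) -> int:
    """Capacity of a single LLC way."""
    return cache_bytes // num_ways


def allocation_bytes(mask: int) -> int:
    """Cache capacity a COS can fill, given its capacity bitmask."""
    return bin(mask).count("1") * way_size_bytes()


if __name__ == "__main__":
    mib = 1024 * 1024
    print(way_size_bytes() / mib)           # 1.75 MiB per way
    print(allocation_bytes(0x0000F) / mib)  # 7.0 MiB for 4 ways
```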
I can suggest two quick options here,
Let me know how you get on!
Best, Aaron
No outcome; nothing is working out. :-1: I have tried many possible things.
Can you share what you have tried and the results, please? What stats are you looking at to show the noisy neighbor?
Hi Rahul,
A few things to be aware of here:
pqos -I -p all:7731-7736
Suggestions:
Regards, Marcel
The results show that the cache is not being controlled anywhere. Correct me if I'm wrong, but shouldn't I see some measurable performance degradation if I assign just 1 cache way to COS0, given that all cores are in COS0 by default?
Something I tried:
Still no outcome. I don't even see the L3 cache getting restricted.
What's the main difference between pqos and pqos-os (apart from one using the OS interface, like the -I option)? If I run both commands, the output is quite different. I'll share as much detail about my server as possible; please check whether you can find some way of reproducing the noisy neighbor. (I have also tried using taskset to attach a process to a core.)
SERVER DETAIL:
TIME 2017-10-23 09:48:15
CORE IPC MISSES LLC[KB] MBL[MB/s] MBR[MB/s]
0 0.65 13k 5712.0 0.0 0.5
1 0.19 3k 616.0 0.0 0.0
2 0.19 3k 392.0 0.0 0.0
3 0.19 3k 168.0 0.0 0.0
4 0.19 2k 112.0 0.0 0.0
5 0.20 2k 56.0 0.0 0.0
6 0.19 1k 0.0 0.0 0.0
7 0.22 2k 0.0 0.0 0.0
8 0.20 3k 56.0 0.1 0.0
9 0.20 2k 0.0 0.0 0.0
10 0.20 3k 56.0 0.0 0.0
11 0.20 2k 0.0 0.0 0.0
12 0.20 4k 56.0 0.0 0.0
13 0.20 4k 0.0 0.0 0.0
14 0.24 11k 280.0 0.0 0.0
15 0.24 9k 168.0 0.0 0.0
16 0.25 11k 56.0 0.0 0.0
17 0.25 8k 392.0 0.1 0.0
18 0.24 9k 56.0 0.1 0.1
19 0.24 8k 1064.0 0.1 0.1
20 0.24 8k 112.0 0.0 5.5
21 0.24 9k 336.0 0.0 0.0
22 0.24 6k 56.0 0.1 0.0
23 0.23 8k 0.0 0.1 0.0
24 0.23 5k 560.0 0.1 0.0
25 0.24 7k 56.0 0.1 0.0
26 0.21 9k 0.0 0.0 0.0
27 0.24 7k 0.0 0.1 0.0
28 0.22 5k 0.0 0.0 0.0
29 0.20 3k 0.0 0.0 0.0
30 0.21 5k 56.0 0.1 0.0
31 0.21 5k 112.0 0.0 0.0
32 0.20 3k 168.0 0.0 0.0
33 0.32 5k 504.0 0.1 0.1
34 0.19 3k 168.0 0.0 0.0
35 0.20 3k 56.0 0.0 0.0
36 0.19 2k 112.0 0.0 0.0
37 1.16 17k 1624.0 1.0 1.0
38 0.43 8k 784.0 0.2 0.0
39 0.22 3k 0.0 0.0 0.0
40 0.21 4k 56.0 0.0 0.0
41 0.20 5k 280.0 0.1 0.0
42 0.23 8k 0.0 0.1 0.0
43 0.62 10k 168.0 0.1 0.0
44 0.22 7k 0.0 0.1 0.0
45 0.25 7k 56.0 0.1 0.0
46 0.24 7k 168.0 0.0 0.0
47 0.26 10k 2184.0 0.6 0.0
48 0.23 7k 56.0 0.0 0.0
49 0.25 7k 0.0 0.0 0.0
50 0.24 6k 56.0 0.1 0.0
51 0.22 6k 5544.0 0.0 0.0
52 0.22 5k 336.0 0.1 0.0
TIME 2017-10-23 09:49:14
CORE IPC MISSES LLC[KB] MBL[MB/s] MBR[MB/s]
0 0.81 25k 12320.0 13.7 3.0
1 0.21 5k 12320.0 13.7 3.0
2 0.19 5k 12320.0 13.7 3.0
3 0.19 5k 12320.0 13.7 3.0
4 1.11 11k 12320.0 13.7 3.0
5 0.19 3k 12320.0 13.7 3.0
6 0.19 2k 12320.0 13.7 3.0
7 0.19 3k 12320.0 13.7 3.0
8 0.20 4k 12320.0 13.7 3.0
9 0.21 4k 12320.0 13.7 3.0
10 0.20 4k 12320.0 13.7 3.0
11 0.20 4k 12320.0 13.7 3.0
12 0.21 7k 12320.0 13.7 3.0
13 0.19 4k 12320.0 13.7 3.0
14 0.29 17k 18592.0 4.0 16.5
15 0.24 12k 18592.0 4.0 16.5
16 0.24 11k 18592.0 4.0 16.5
17 0.27 12k 18592.0 4.0 16.5
18 0.24 10k 18592.0 4.0 16.5
19 0.24 11k 18592.0 4.0 16.5
20 0.25 11k 18592.0 4.0 16.5
21 0.25 16k 18592.0 4.0 16.5
22 0.25 9k 18592.0 4.0 16.5
23 0.25 12k 18592.0 4.0 16.5
24 0.25 7k 18592.0 4.0 16.5
25 0.24 10k 18592.0 4.0 16.5
26 0.24 8k 18592.0 4.0 16.5
27 0.27 9k 18592.0 4.0 16.5
28 0.21 6k 12320.0 13.7 3.0
29 0.20 4k 12320.0 13.7 3.0
30 0.19 6k 12320.0 13.7 3.0
31 0.20 5k 12320.0 13.7 3.0
32 0.21 3k 12320.0 13.7 3.0
33 0.24 5k 12320.0 13.7 3.0
34 0.20 4k 12320.0 13.7 3.0
35 0.20 3k 12320.0 13.7 3.0
36 0.20 3k 12320.0 13.7 3.0
37 0.20 4k 12320.0 13.7 3.0
38 0.20 7k 12320.0 13.7 3.0
39 0.19 4k 12320.0 13.7 3.0
40 0.20 5k 12320.0 13.7 3.0
41 0.19 6k 12320.0 13.7 3.0
42 0.25 10k 18592.0 4.0 16.5
43 0.59 14k 18592.0 4.0 16.4
44 0.27 10k 18592.0 4.0 16.4
45 0.26 9k 18592.0 4.0 16.4
46 0.41 25k 18592.0 4.0 16.4
47 0.53 26k 18592.0 4.0 16.4
48 0.26 10k 18592.0 4.0 16.4
49 0.26 9k 18592.0 4.0 16.4
50 0.25 8k 18592.0 4.0 16.4
51 0.30 10k 18592.0 4.0 16.4
52 0.30 10k 18592.0 4.0 16.4
NOTE: Mixed use of MSR and kernel interfaces to manage CAT or CMT & MBM may lead to unexpected behavior.
L3CA COS definitions for Socket 0:
L3CA COS0 => MASK 0xfffff
L3CA COS1 => MASK 0xfffff
L3CA COS2 => MASK 0xfffff
L3CA COS3 => MASK 0xfffff
L3CA COS4 => MASK 0xfffff
L3CA COS5 => MASK 0xfffff
L3CA COS6 => MASK 0xfffff
L3CA COS7 => MASK 0xfffff
L3CA COS8 => MASK 0xfffff
L3CA COS9 => MASK 0xfffff
L3CA COS10 => MASK 0xfffff
L3CA COS11 => MASK 0xfffff
L3CA COS12 => MASK 0xfffff
L3CA COS13 => MASK 0xfffff
L3CA COS14 => MASK 0xfffff
L3CA COS15 => MASK 0xfffff
L3CA COS definitions for Socket 1:
L3CA COS0 => MASK 0xfffff
L3CA COS1 => MASK 0xfffff
L3CA COS2 => MASK 0xfffff
L3CA COS3 => MASK 0xfffff
L3CA COS4 => MASK 0xfffff
L3CA COS5 => MASK 0xfffff
L3CA COS6 => MASK 0xfffff
L3CA COS7 => MASK 0xfffff
L3CA COS8 => MASK 0xfffff
L3CA COS9 => MASK 0xfffff
L3CA COS10 => MASK 0xfffff
L3CA COS11 => MASK 0xfffff
L3CA COS12 => MASK 0xfffff
L3CA COS13 => MASK 0xfffff
L3CA COS14 => MASK 0xfffff
L3CA COS15 => MASK 0xfffff
Core information for socket 0:
Core 0, L2ID 0, L3ID 0 => COS0
Core 1, L2ID 1, L3ID 0 => COS0
Core 2, L2ID 2, L3ID 0 => COS0
Core 3, L2ID 3, L3ID 0 => COS0
Core 4, L2ID 4, L3ID 0 => COS0
Core 5, L2ID 5, L3ID 0 => COS0
Core 6, L2ID 6, L3ID 0 => COS0
Core 7, L2ID 8, L3ID 0 => COS0
Core 8, L2ID 9, L3ID 0 => COS0
Core 9, L2ID 10, L3ID 0 => COS0
Core 10, L2ID 11, L3ID 0 => COS0
Core 11, L2ID 12, L3ID 0 => COS0
Core 12, L2ID 13, L3ID 0 => COS0
Core 13, L2ID 14, L3ID 0 => COS0
Core 28, L2ID 0, L3ID 0 => COS0
Core 29, L2ID 1, L3ID 0 => COS0
Core 30, L2ID 2, L3ID 0 => COS0
Core 31, L2ID 3, L3ID 0 => COS0
Core 32, L2ID 4, L3ID 0 => COS0
Core 33, L2ID 5, L3ID 0 => COS0
Core 34, L2ID 6, L3ID 0 => COS0
Core 35, L2ID 8, L3ID 0 => COS0
Core 36, L2ID 9, L3ID 0 => COS0
Core 37, L2ID 10, L3ID 0 => COS0
Core 38, L2ID 11, L3ID 0 => COS0
Core 39, L2ID 12, L3ID 0 => COS0
Core 40, L2ID 13, L3ID 0 => COS0
Core 41, L2ID 14, L3ID 0 => COS0
Core information for socket 1:
Core 14, L2ID 16, L3ID 1 => COS0
Core 15, L2ID 17, L3ID 1 => COS0
Core 16, L2ID 18, L3ID 1 => COS0
Core 17, L2ID 19, L3ID 1 => COS0
Core 18, L2ID 20, L3ID 1 => COS0
Core 19, L2ID 21, L3ID 1 => COS0
Core 20, L2ID 22, L3ID 1 => COS0
Core 21, L2ID 24, L3ID 1 => COS0
Core 22, L2ID 25, L3ID 1 => COS0
Core 23, L2ID 26, L3ID 1 => COS0
Core 24, L2ID 27, L3ID 1 => COS0
Core 25, L2ID 28, L3ID 1 => COS0
Core 26, L2ID 29, L3ID 1 => COS0
Core 27, L2ID 30, L3ID 1 => COS0
Core 42, L2ID 16, L3ID 1 => COS0
Core 43, L2ID 17, L3ID 1 => COS0
Core 44, L2ID 18, L3ID 1 => COS0
Core 45, L2ID 19, L3ID 1 => COS0
Core 46, L2ID 20, L3ID 1 => COS0
Core 47, L2ID 21, L3ID 1 => COS0
Core 48, L2ID 22, L3ID 1 => COS0
Core 49, L2ID 24, L3ID 1 => COS0
Core 50, L2ID 25, L3ID 1 => COS0
Core 51, L2ID 26, L3ID 1 => COS0
Core 52, L2ID 27, L3ID 1 => COS0
Core 53, L2ID 28, L3ID 1 => COS0
Core 54, L2ID 29, L3ID 1 => COS0
Core 55, L2ID 30, L3ID 1 => COS0
PID association information:
COS1 => (none)
COS2 => (none)
COS3 => (none)
COS4 => (none)
COS5 => (none)
COS6 => (none)
COS7 => (none)
COS8 => (none)
COS9 => (none)
COS10 => (none)
COS11 => (none)
COS12 => (none)
COS13 => (none)
COS14 => (none)
COS15 => (none)
Hi Rahul,
pqos-os & pqos-msr are just wrappers to make it clear what interface is being used. As mentioned in our documentation, core monitoring is not currently functional with the OS interface and reports invalid values. This will be fixed in a future kernel.
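One way to spot the invalid per-core values is to look for cores that report exactly the same LLC, MBL and MBR figures. In the second monitoring capture above (09:49:14), which appears to be the OS-interface run, every core on socket 0 shows LLC 12320.0 KB / MBL 13.7 / MBR 3.0, which looks like a single per-socket figure repeated per core. This Python sketch is a hypothetical helper, not a pqos feature; idle cores that all read 0.0 will also group together, so treat its output as a hint only.

```python
from collections import defaultdict

# A few rows copied from the second monitoring capture in this thread (2017-10-23 09:49:14).
SAMPLE = """\
TIME 2017-10-23 09:49:14
CORE IPC MISSES LLC[KB] MBL[MB/s] MBR[MB/s]
0 0.81 25k 12320.0 13.7 3.0
1 0.21 5k 12320.0 13.7 3.0
4 1.11 11k 12320.0 13.7 3.0
14 0.29 17k 18592.0 4.0 16.5
15 0.24 12k 18592.0 4.0 16.5
"""


def parse_rows(text: str) -> dict:
    """Map core id -> (llc_kb, mbl, mbr) for each data row of pqos monitor output."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 6 columns and start with a numeric core id;
        # the TIME line and the column header are skipped.
        if len(fields) == 6 and fields[0].isdigit():
            rows[int(fields[0])] = tuple(float(f) for f in fields[3:6])
    return rows


def identical_groups(rows: dict) -> dict:
    """Group cores that report exactly the same (LLC, MBL, MBR) triple."""
    groups = defaultdict(list)
    for core in sorted(rows):
        groups[rows[core]].append(core)
    return {key: cores for key, cores in groups.items() if len(cores) > 1}


if __name__ == "__main__":
    for (llc, mbl, mbr), cores in identical_groups(parse_rows(SAMPLE)).items():
        print(f"cores {cores} all report LLC={llc} KB, MBL={mbl}, MBR={mbr}")
```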
Can you try a simple CMT/CAT test with the MSR interface to make sure everything is working as it should:
umount resctrl
pqos-msr -R -r
pqos-msr -e llc:1=0x1
pqos-msr -a llc:1=1
taskset -c 1 memtester 100M
pqos-msr
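Here you should see core 1 with an LLC occupancy of ~1.75MB (the size of 1 LLC way). Then grow the allocation step by step, checking the occupancy with pqos-msr after each change:
pqos-msr -e llc:1=0xf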
pqos-msr -e llc:1=0xff
pqos-msr -e llc:1=0xfff
Something to note: we start with 1 way and increase the size because, when we start with access to a large portion of the cache and then decrease the size, we don't always see occupancy drop. This is because the data is still resident in the LLC. Occupancy will only decrease when another application (e.g. a noisy neighbor) causes the data to be evicted.
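For reference, the occupancy to expect at each step of this test can be computed up front. This Python sketch assumes 1792 KB per way (35840K L3 / 20 ways, from the lscpu output in this thread) and treats each value as a rough upper bound on the LLC[KB] column for core 1; since memtester's 100M buffer is larger than even the 12-way allocation, occupancy should approach each bound. The names are illustrative.

```python
WAY_KB = 35840 // 20  # 1792 KB per way (L3 cache 35840K / 20 ways)

STEPS = [0x1, 0xF, 0xFF, 0xFFF]  # masks applied to COS1 in the test above


def expected_max_occupancy_kb(mask: int) -> int:
    """Rough upper bound on LLC[KB] for a core whose COS uses this capacity mask."""
    return bin(mask).count("1") * WAY_KB


if __name__ == "__main__":
    for mask in STEPS:
        print(f"llc:1={mask:#x} -> up to ~{expected_max_occupancy_kb(mask)} KB")
```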
Let me know if this behaves as expected.
Regards, Marcel
I am trying to monitor a process and am getting the error below.
sudo pqos -I -V -s
INFO: Monitoring capability detected
INFO: CPUID.0x7.0: L3 CAT supported
INFO: CDP is enabled
INFO: L3 CAT details: CDP support=1, CDP on=1, #COS=8, #ways=20, ways contention bit-mask 0xc0000
INFO: L3 CAT details: cache size 36700160 bytes, way size 1835008 bytes
INFO: L3CA capability detected
INFO: CPUID 0x10.0: L2 CAT not supported!
INFO: L2CA capability not detected
INFO: CPUID 0x10.0: MBA not supported!
INFO: MBA capability not detected
INFO: resctrl not detected. Kernel version 4.10 or higher required
ERROR: OS interface selected but not supported
ERROR: discover_os_capabilities() error 1
Error initializing PQoS library!
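The verbose output also reports "ways contention bit-mask 0xc0000". As I understand Intel's RDT documentation (CPUID leaf 0x10, sub-leaf 1), set bits in that mask mark ways that agents outside the cores, such as I/O devices, may also allocate into, so a COS meant to be strongly isolated might avoid them. Treat that interpretation as an assumption; this Python sketch only decodes the mask reported above.

```python
FULL_MASK = 0xFFFFF        # "#ways=20": all 20 ways
CONTENTION_MASK = 0xC0000  # "ways contention bit-mask 0xc0000" from pqos -V


def ways(mask: int) -> list:
    """1-based cache ways set in a bitmask."""
    return [bit + 1 for bit in range(mask.bit_length()) if (mask >> bit) & 1]


# Ways 1-18 remain once the contended ways are cleared from the full mask.
uncontended = FULL_MASK & ~CONTENTION_MASK

if __name__ == "__main__":
    print("contended ways:", ways(CONTENTION_MASK))
    print(f"uncontended mask: {uncontended:#x}")
```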
root@oss7:~# sudo pqos -I -p all:19017
NOTE: Mixed use of MSR and kernel interfaces to manage CAT or CMT & MBM may lead to unexpected behavior.
ERROR: OS interface selected but not supported
ERROR: discover_os_capabilities() error 1
Error initializing PQoS library!
# Server Model: model name : Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz