intel / intel-cmt-cat

User space software for Intel(R) Resource Director Technology
http://www.intel.com/content/www/us/en/architecture-and-technology/resource-director-technology.html

Pqos always seems to need -r flag. Is that normal? #181

Closed krisoft-oxbotica closed 3 years ago

krisoft-oxbotica commented 3 years ago

intel-cmt-cat version: 4.1
OS: Ubuntu 18.04.5 LTS (running inside docker)
kernel version: 4.15.0-96-generic
CPU: Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz

We noticed that when we run the pqos CLI, we get an error and the CLI terminates:

$ sudo pqos -u csv -t 5
Monitoring start error on core(s) 0, status 3

But when we add the -r flag everything seems to work fine:

$ sudo pqos -r -u csv -t 5
NOTE:  Mixed use of MSR and kernel interfaces to manage
       CAT or CMT & MBM may lead to unexpected behavior.
CMT/MBM reset successful
Time,Core,IPC,LLC Misses,LLC[KB],MBL[MB/s],MBR[MB/s]
2021-04-08 16:21:31,"0",0.36,149,0.0,0.0,0.0
2021-04-08 16:21:31,"1",0.34,180,640.0,0.0,0.0
[...]

We believe this is the only pqos instance running on the computer.

Is this behaviour expected? If not, why might it be happening? Should we foresee problems if we just always run with the -r flag? Are there any traps/contraindications we should be aware of?

kmabbasi commented 3 years ago

Hi,

In general, this error occurs when the core is already being monitored. Running pqos -r resets all RMIDs to their defaults. pqos is not the only tool that supports monitoring; tools such as PCM can also be used to monitor these resources.

Also, if resctrl is mounted on the host, it might program RMIDs and cause this issue.

Thanks, Khawar
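
One way to rule out the resctrl case mentioned above is to look for a resctrl entry in the kernel's mount table. The following is a minimal sketch, not part of intel-cmt-cat: the function name `resctrl_mounted` is hypothetical, and it assumes the standard Linux `/proc/mounts` layout, where the third field of each line is the filesystem type.

```shell
# Hypothetical helper: succeed (exit 0) if a resctrl filesystem is listed
# in the given mounts table. Defaults to the standard /proc/mounts.
resctrl_mounted() {
    mounts_file="${1:-/proc/mounts}"
    # Field 3 of each line is the filesystem type.
    awk '$3 == "resctrl" { found = 1 } END { exit !found }' "$mounts_file"
}

if resctrl_mounted; then
    echo "resctrl is mounted"
else
    echo "resctrl is not mounted"
fi
```

Inside a container the mount table may differ from the host's, so it may be worth running the check on the host as well.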

krisoft-oxbotica commented 3 years ago

Thank you for the quick response Khawar.

Sorry for the slow response; we had to double-check things.

We don't have resctrl mounted, nor do we have PCM installed.

We have a metrics monitor that collects memory information based on vmstat, but that doesn't seem to use MSRs (at least from what I can tell).

Could we get this kind of error if a previous pqos monitoring run didn't shut down cleanly? Just wondering what else we can check.

kmabbasi commented 3 years ago

Yes, if nothing else is running in the background, then a previous pqos run probably wasn't shut down properly.

Thanks, Khawar
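
If stale RMIDs from an unclean shutdown are the cause, one option is to reset only when monitoring fails to start, rather than passing -r on every run. The wrapper below is a rough sketch under assumptions the thread does not confirm: `pqos_monitor` and the `PQOS_BIN` variable are hypothetical names, and it assumes pqos exits with a non-zero status when monitoring cannot start. Because -r resets all RMIDs, the retry could still disturb another monitoring tool if one is active.

```shell
# Hypothetical wrapper: try monitoring without a reset first, and retry
# with -r only if the first attempt fails.
# PQOS_BIN is an illustrative override, not a pqos feature.
# Assumption: pqos exits non-zero when monitoring fails to start.
PQOS_BIN="${PQOS_BIN:-pqos}"

pqos_monitor() {
    if "$PQOS_BIN" "$@"; then
        return 0
    fi
    echo "pqos failed; retrying with -r" >&2
    "$PQOS_BIN" -r "$@"
}

# Usage (as root), with the flags from the original report:
#   pqos_monitor -u csv -t 5
```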

kmabbasi commented 3 years ago

I am going to close this issue on the assumption that you got your answer.

Feel free to reopen it or create a new issue if needed.

Thanks, Khawar