intel / intel-cmt-cat

User space software for Intel(R) Resource Director Technology
http://www.intel.com/content/www/us/en/architecture-and-technology/resource-director-technology.html

pqos does not work on AMD Zen2/Rome parts since v23.08 #273

Closed iximeow closed 2 months ago

iximeow commented 5 months ago

building and running pqos on a Rome CPU (EPYC 7662 in particular) results in:

DEBUG: Detected core 123, socket 0, NUMAnode 14, L2 ID 59, L3 ID 14, APICID 119
DEBUG: Detected core 124, socket 0, NUMAnode 15, L2 ID 60, L3 ID 15, APICID 121
DEBUG: Detected core 125, socket 0, NUMAnode 15, L2 ID 61, L3 ID 15, APICID 123
DEBUG: Detected core 126, socket 0, NUMAnode 15, L2 ID 62, L3 ID 15, APICID 125
DEBUG: Detected core 127, socket 0, NUMAnode 15, L2 ID 63, L3 ID 15, APICID 127
ERROR: RDMSR failed for reg[0xca0] on lcore 0
ERROR: Error reading SNC information!
ERROR: Error encounter in monitoring discovery!
ERROR: discover_capabilities() error 1
Error initializing PQoS library!

for v23.08 i can build and run pqos without issue; v23.11 and later (including current master) yield the above. this seems to have been the case since SNC support was added: these cores do not support the PQOS_MSR_SNC_CFG MSR, so the attempt to discover SNC support by reading MSR 0xCA0 via msr_read in hw_cap_mon_snc_state fails (linux returns EIO for the pread64).

that failure causes an early return with PQOS_RETVAL_ERROR, which the caller hw_cap_mon_discover reports as "Error reading SNC information!"; the error then propagates upward until pqos exits.

i'd offer a patch, but i'm not immediately sure how to recover in the face of missing SNC support. for now i'm proceeding with v23.08 to work around this.
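
one possible direction, as a rough sketch only and not the library's code: distinguish "this MSR does not exist on this CPU" from a genuine read failure, so that discovery could continue without SNC instead of aborting. the names below (msr_probe, classify_msr_read, probe_msr) are hypothetical and do not exist in intel-cmt-cat, and treating EIO as "unsupported" is an assumption based on the pread64 behaviour described above; whether a missing MSR should then be interpreted as "SNC disabled" is a decision for the maintainers.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch; these are NOT the
 * library's PQOS_RETVAL_* values. */
enum msr_probe {
        MSR_PROBE_OK = 0,          /* full 8-byte value was read */
        MSR_PROBE_UNSUPPORTED = 1, /* read failed with EIO (assumed: MSR absent) */
        MSR_PROBE_ERROR = 2        /* any other failure, including short reads */
};

/* Classify the outcome of an 8-byte pread() on an MSR device node.
 * Assumption: EIO means the register is not implemented on this CPU. */
static enum msr_probe
classify_msr_read(ssize_t nread, int err)
{
        if (nread == (ssize_t)sizeof(uint64_t))
                return MSR_PROBE_OK;
        if (nread < 0 && err == EIO)
                return MSR_PROBE_UNSUPPORTED;
        return MSR_PROBE_ERROR;
}

/* Read register `reg` through an already-open descriptor for a
 * /dev/cpu/<N>/msr node. *value is written only on MSR_PROBE_OK. */
static enum msr_probe
probe_msr(int fd, uint32_t reg, uint64_t *value)
{
        uint64_t tmp = 0;
        ssize_t nread;
        enum msr_probe res;

        errno = 0;
        nread = pread(fd, &tmp, sizeof(tmp), (off_t)reg);
        res = classify_msr_read(nread, errno);
        if (res == MSR_PROBE_OK)
                *value = tmp;
        return res;
}
```

a caller probing 0xCA0 could then treat MSR_PROBE_UNSUPPORTED as "no SNC on this part" and carry on with monitoring discovery, while still failing on MSR_PROBE_ERROR.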

rkanagar commented 3 months ago

Hi @iximeow, please provide a patch. We will look into that. Thanks.