oneapi-src / level-zero

oneAPI Level Zero Specification Headers and Loader
https://spec.oneapi.com/versions/latest/elements/l0/source/index.html
MIT License

RAS counter information is inadequate #110

Closed (eero-t closed 1 year ago)

eero-t commented 1 year ago

While the current RAS counter information can be useful for driver developers, IMHO it does not really suffice for managing a cluster of devices.

This is because:


I think RAS counters should provide the following information...

A) What impact/fatality the issue has, i.e. what mitigations are required:

B) What caused the issue, i.e. what mitigations are required:

C) Counters for kernel level device usage issues:

eero-t commented 1 year ago

Moved this to the spec project instead.