oneapi-src / level-zero

oneAPI Level Zero Specification Headers and Loader
https://spec.oneapi.com/versions/latest/elements/l0/source/index.html
MIT License

Create Global Sysman API to collect GPU dump (logs) #106

Closed by saik-intel 1 year ago

saik-intel commented 1 year ago

Since L0 Sysman is an administrative interface, it should be flexible enough to provide an API that lets end users obtain a GPU dump/crash log and collect debug information and logs. For example, on a platform with multiple GPUs connected, if one GPU hangs, a user who wants to diagnose the hang could use the GPU dump logs for analysis.

This might be implemented at the spec 2.0 level. Sysman would have to be completely decoupled from L0 core, and this API should work without zeInit(), because zeInit() has the limitation of requiring a device handle, which may not be obtainable when a GPU is hung. Instead, Sysman would use the PCI BDF (bus/device/function) information to access sysfs directly and retrieve the detailed logs through the new API (which could be a global API).
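The proposal describes locating a GPU by its PCI BDF and reading sysfs directly, with no zeInit() call. As a rough illustration only, here is a minimal Python sketch of that idea on Linux. `collect_gpu_dump` is a hypothetical helper, not a Level Zero or Sysman API. The `vendor` and `device` attributes are standard PCI sysfs files; the `drm/card*/error` error-state path is an assumption modeled on the i915 driver and may differ for other kernel drivers.

```python
import re
from pathlib import Path

# Full PCI address: domain:bus:device.function, e.g. "0000:03:00.0".
BDF_RE = re.compile(r"^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$")


def collect_gpu_dump(bdf, sysfs_root="/sys"):
    """Hypothetical helper: gather debug files for one GPU by PCI BDF
    by reading sysfs only, without initializing any Level Zero driver."""
    if not BDF_RE.match(bdf):
        raise ValueError(f"not a full PCI BDF address: {bdf!r}")

    # Every PCI function has a directory named after its BDF.
    dev = Path(sysfs_root) / "bus" / "pci" / "devices" / bdf
    if not dev.is_dir():
        raise FileNotFoundError(f"no sysfs entry for {bdf} at {dev}")

    dump = {}
    # Standard PCI identification attributes.
    for name in ("vendor", "device"):
        path = dev / name
        if path.is_file():
            dump[name] = path.read_text().strip()

    # Assumption: a DRM error-state file such as the one i915 exposes.
    for err in sorted(dev.glob("drm/card*/error")):
        dump[str(err.relative_to(dev))] = err.read_text(errors="replace")

    return dump
```

For example, `collect_gpu_dump("0000:03:00.0")` returns a dict keyed by sysfs path relative to the device directory. The `sysfs_root` parameter exists only so the sketch can be exercised against a fake directory tree; on a real system, reading error-state files may require root privileges.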