intel / compute-runtime

Intel® Graphics Compute Runtime for oneAPI Level Zero and OpenCL™ Driver
MIT License
1.1k stars 229 forks source link

Releases with zesInit() implemented or unsupported #650

Closed bgoglin closed 2 months ago

bgoglin commented 1 year ago

zesInit() might be the right way to finally fix oneapi-src/level-zero#36 but current releases provide zesInit() without actually implementing it (it returns "unsupported feature" in 26032.30).

I know we are supposed to handle API errors, but this means we should do a runtime check at configure time before enabling zesInit() in our code. We can't do a runtime-check at runtime because it would be too late to fallback to setting ZES_ENABLE_SYSMAN=1 (has to be done very early in the process, ideally inside a library constructor).

When are you planning to publish a compute-runtime release with proper support for zesInit()? It seems to in the git tree already.

Could you explictly list which releases have a unsupported zesInit() ? We are going to tell users to avoid these releases. Either they use newer releases with zesInit(). Or they use older releases without zesInit() at all and we'll set ZES_ENABLE_SYSMAN=1 in the library constructor.

saik-intel commented 1 year ago

@bgoglin zesInitbased implementation is not fully complete and we are working on it and please let me know your test / reproduction step where you are seeing unsupported , will investigate and update you

bgoglin commented 1 year ago

I installed packages from the latest compute-runtime release (https://github.com/intel/compute-runtime/releases/tag/23.13.26032.30) on a Debian bookworm running Linux kernel 6.1. This is a laptop with an old Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz. Test program is just does zesInit(0) and prints the return value. I get 0x78000003 = ZE_RESULT_ERROR_UNSUPPORTED_FEATURE

saik-intel commented 1 year ago

thanks @bgoglin we will look into it and update you the fix

bgoglin commented 1 year ago

I just discovered that zesInit() returns 0 instead of 0x78000003 if ZES_ENABLE_SYSMAN=1 is set in the environment. If took me quite a bit of time to understand why things weren't failing anymore, so I am writing it here in case it helps you reproduce the issue.

bgoglin commented 8 months ago

Looks like zesInit() works now? Using these packages: intel-igc-core_1.0.14828.8_amd64.deb intel-igc-opencl_1.0.14828.8_amd64.deb intel-level-zero-gpu_1.3.26918.9_amd64.deb intel-opencl-icd_23.30.26918.9_amd64.deb libigdgmm12_22.3.0_amd64.deb It didn't with those: intel-igc-core_1.0.14062.11_amd64.deb intel-igc-opencl_1.0.14062.11_amd64.deb intel-level-zero-gpu_1.3.26516.18_amd64.deb intel-opencl-icd_23.22.26516.18_amd64.deb libigdgmm12_22.3.0_amd64.deb If you know which package and version first brought the fix, I'd like to know it so that we tell people avoid earlier releases.

bgoglin commented 8 months ago

From git log, I'd say it's this commit:

commit 73b40a49dcd1b320309cdefd24a2a2b23149ee01
Author: Kulkarni, Ashwin Kumar <ashwin.kumar.kulkarni@intel.com>
Date:   Wed Jun 7 10:10:13 2023 +0000

    fix(sysman): Enables zesInit flow without setting ZES_ENABLE_SYSMAN

    Related-To: LOCI-4458

which first appeared in tag 23.26.26690.12.

saik-intel commented 7 months ago

@bgoglin can we close this issue now? if zesInit is working for you ?

bgoglin commented 7 months ago

The question was rather: Could you explictly list which releases have a unsupported zesInit() ? We are going to tell users to avoid these releases. Either they use newer releases with zesInit(). Or they use older releases without zesInit() at all and we'll set ZES_ENABLE_SYSMAN=1 in the library constructor.

saik-intel commented 7 months ago

please start using new releases with zesInit ()

eero-t commented 4 months ago

The question was rather: Could you explictly list which releases have a unsupported zesInit() ? We are going to tell users to avoid these releases. Either they use newer releases with zesInit(). Or they use older releases without zesInit() at all and we'll set ZES_ENABLE_SYSMAN=1 in the library constructor.

please start using new releases with zesInit ()

@saik-intel compute-runtime project releases multiple versions in parallel, and customers may get old version of it from distros. You did not answer the question or the subject of this ticket; from which releases onward there's (full) zesInit() support?

eero-t commented 4 months ago

zesInit() stub was introduced on Feb 2023 in commit bf481e61034.

From git log, I'd say it's this commit:

commit 73b40a49dcd1b320309cdefd24a2a2b23149ee01
Author: Kulkarni, Ashwin Kumar <ashwin.kumar.kulkarni@intel.com>
Date:   Wed Jun 7 10:10:13 2023 +0000

    fix(sysman): Enables zesInit flow without setting ZES_ENABLE_SYSMAN

which first appeared in tag 23.26.26690.12.

When looking at last tag series preceding the 23.26.x one:

 $ for tag in $(git tag); do echo "$tag: $(git show --format=fuller $tag|grep ^CommitDate)"; done | grep 2023
...
23.22.26516.33: CommitDate: Tue Dec 12 18:45:47 2023 +0100
23.22.26516.34: CommitDate: Tue Dec 19 11:15:15 2023 +0100
23.22.26516.8: CommitDate: Wed Jun 14 11:01:51 2023 +0200
23.26.26690.12: CommitDate: Tue Jul 25 14:34:41 2023 +0200
23.26.26690.13: CommitDate: Tue Jul 25 16:26:36 2023 +0200
...

23.22.26516.34 does not contain the above indicated commit, only 23.26.x series and ones after it do.

=> anything with tag number smaller than 23.26.26690.12 needs to be avoided for zesInit() to work as expected.

eero-t commented 4 months ago

@bgoglin Does above answer your question i.e. can this be closed?

bgoglin commented 4 months ago

"anything with tag number smaller than 23.26.26690.12", do you also mean released in 2023 ? (equivalent to starting with "23." ?)

eero-t commented 4 months ago

Looking at the Git content... While L0 1.5 i.e. zesdInit() support is already in earlier versions, last tag in a preceding 23.22.x series (from Dec 2023) did not include 73b40a49dcd1b320309cdefd24a2a2b23149ee01 commit, and last tags in series before that, are all before that commit was made, and check the env variable.

saik-intel commented 2 months ago

@bgoglin can we close this issue now ?

eero-t commented 2 months ago

@saik-intel While relevant backend version is now known, i.e. this can be closed, it still requires manual setup outside the L0 application itself.

Proper support would require also L0 frontend (init return values) to be fixed, so that clients can differentiate between total backend failure (due to missing HW or wrong kernel driver), and missing backend functionality (due to too old backend version). See: https://github.com/oneapi-src/level-zero/issues/140

bgoglin commented 2 months ago

Agreed, this issue can be closed but user code is still hard to maintain.