HoppeMateusz opened 1 year ago
thanks @HoppeMateusz. There have been requests from customers to modify that behavior, so that multiple calls to zeInit actually work.
Of course, in terms of the L0 GPU driver, that would imply a complex refactoring of the code. But putting implementation details aside, I think the first question to answer here is: what is the best behavior for customers?
It is not even possible to refactor the code; you would need to change the whole specification. If you allow multiple zeInit calls with different variables, then you can have a scenario where, within one process, one library calls zeInit to use the GPU, then another library calls zeInit to use the VPU only, and this second call would invalidate all submissions in flight made under the first zeInit call. You would need to update all entry points to reflect that.
The reason you have a single initialization is to have a single point in time where you set up the driver and all associated classes. If you allow this step to happen multiple times, you create gigantic overhead, since you would need to introduce many checks for things that were previously immutable to see whether they changed. This would sacrifice a lot of L0 efficiency and would create a horribly complex driver implementation that wouldn't be maintainable in the long run.
The only way to really move forward and keep efficiency is to update the spec so that only the first initialization is valid and subsequent ones do not update anything.
thanks @MichalMrozek. The problem here is this:
The reason why you have single initialization is to have single point in time where you set up the driver and all associated classes.
In a multi-library application, there is no single point in time at which to call zeInit. Imagine an HPC application with the following libraries:
Each of these may call zeInit(), each with different requirements. For instance, the profiling tool may need tools and tracing, but if the zeInit from SYCL comes first, then tools and tracing may not be usable. Or you have the communication libraries or MPI using multiple ranks (processes), where some use the CPU and others the GPU, each initializing L0 differently. So the single point of entry actually becomes a data race, depending on which library loads first.
As you say, fully supporting that mode would impose enormous overhead, so maybe something in the middle could be provided. Maybe zeInit could allow incremental initialization (e.g., if zeInit has initialized a GPU, then later it can initialize a CPU, but not remove the GPU), or maybe we can find other alternatives.
That's why zeInit shouldn't have any parameters and always expose all devices.
Incremental initialization has the same problem: it carries enormous overhead, because you cannot assume that portions of the driver are already initialized and will not change in the future. If you must assume they may grow at any point in time, that is where the additional overhead comes from.
If you need to add some capability in the middle, like tracing, it should be via new APIs, not via zeInit, which is already heavily overloaded.
thanks @MichalMrozek. I agree with this:
this should be via new APIs, not via zeInit which is already heavily overloaded.
I think that instead of relying on environment variables and flags passed to zeInit, we can have explicit APIs, so each component initializes what it needs. zeInit would take care only of general initialization, while other things could be handled by extra APIs.
The current description allows calling zeInit() multiple times with different environment variables. https://spec.oneapi.io/level-zero/latest/core/api.html#zeinit
It should be stated that calling zeInit() with the same flags and different environment variables will have no effect on the driver, as the driver is initialized once.
And that spec-defined environment variables (https://spec.oneapi.io/level-zero/latest/core/PROG.html#environment-variables) will only be honored at the first initialization.