Expose sub-device exposed by ZE_AFFINITY_MASK as devices

jandres742 commented 2 years ago

From customer feedback:

Currently, with a device that has two sub-devices, the following mask exposes the root device and both sub-devices:

ZE_AFFINITY_MASK=0.0,0.1

Request is to have these exposed as two separate root devices. In other words, that each sub-device exposed in the mask is presented by Level Zero as a device, with no sub-devices.

jandres742 commented 2 years ago

When you use the affinity mask, we expose the parent device when at least two sub-devices are selected with the mask. See https://spec.oneapi.io/level-zero/latest/core/PROG.html?highlight=affinity#affinity-mask: for a system with four sub-devices per device, when the mask contains 1.3 and 1.0, we expose the root device and two sub-devices for it (see below).

The following examples demonstrate proper usage for a system configuration of two devices, each with four sub-devices:
• …
• 0.2, 1.3, 1.0, 0.3: both device 0 and 1 are reported; device 0 reports sub-devices 2 and 3 as sub-devices 0 and 1, respectively; device 1 reports sub-devices 0 and 3 as sub-devices 0 and 1, respectively; the order is unchanged.
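The renumbering in the quoted spec example can be modeled in a few lines of Python. This is only an illustrative sketch, not driver code: the function name `apply_affinity_mask` is hypothetical, and it assumes every mask entry has the form `device.subdevice` (entries that name a whole device are not handled).

```python
from collections import defaultdict
from typing import Dict, List


def apply_affinity_mask(mask: str) -> Dict[int, List[int]]:
    """Map each reported root device to the physical sub-devices it exposes.

    The position of a physical index in the list is the sub-device index
    the application sees after renumbering.
    Assumption: every entry is written as "device.subdevice".
    """
    selected = defaultdict(set)
    for entry in mask.split(","):
        device, subdevice = entry.strip().split(".")
        selected[int(device)].add(int(subdevice))
    # Sorting keeps the physical order, so renumbering never reorders sub-devices.
    return {dev: sorted(subs) for dev, subs in sorted(selected.items())}


reported = apply_affinity_mask("0.2, 1.3, 1.0, 0.3")
for device, subdevices in reported.items():
    for visible_index, physical_index in enumerate(subdevices):
        print(f"device {device}: sub-device {physical_index} "
              f"reported as sub-device {visible_index}")
```

For the mask in the quote, the mapping is `{0: [2, 3], 1: [0, 3]}`, which matches the spec text: device 0 reports sub-devices 2 and 3 as 0 and 1, and device 1 reports sub-devices 0 and 3 as 0 and 1.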

Now, the reason we do that, instead of exposing 1.3 and 1.0 as separate devices is threefold:

1. Flexibility: It exposes everything to the application and lets it decide what to use and what not to use. If the application wants to see each sub-device as a device, then the middleware library (DPC++, OpenMP) or the application can use the sub-device handles; if another application wants to use the hierarchy of root and sub-device handles, that is also available. Always exposing sub-devices as devices would limit applications that want to see the hierarchy.
2. Implicit scaling: By exposing the root device, we allow implicit scaling to be supported with a subset of tiles. In the spec example above, we would have implicit scaling with the two-out-of-four tiles 1.3 and 1.0. The application would then decide whether to use the root device with 2T implicit scaling or to use the tiles directly. If we exposed each sub-device as a device, implicit scaling wouldn't be possible with a subset of tiles.
3. Scalability: In the future we could have further levels in the device hierarchy, with sub-devices inside sub-devices. In that case, it would become difficult to decide what a device is. Imagine the case where you have this:

1 root device
- 2 tiles
- Each tile with 4 sub-sub-devices.

Now imagine the user passes this mask:

MASK=0.0,0.1.2,0.1.3

In this case, if we exposed each entry as a separate device, each device would have a different set of capabilities, which may further complicate things. However, by exposing the following hierarchy in this case

MASK=0.0,0.1.2,0.1.3 =>

root device handle 0
- sub-device handle 0: representing 0.0
- sub-device handle 1: representing 0.1
  - sub-device handle 0: representing 0.1.2
  - sub-device handle 1: representing 0.1.3

it is clear, and easier for the application, to traverse the device hierarchy and understand what each device handle represents.

In this case, if the application, DPC++, or OpenMP wants to see 0.0, 0.1.2, and 0.1.3 as separate devices, it can do so by selecting the right-most leaves in the tree; if another application wants to see the whole hierarchy and use implicit scaling, it can use whichever device handle it needs.
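The "right-most leaves" idea can be sketched as follows. This is a plain Python model of the hypothetical multi-level mask discussed here, not Level Zero code: `Node`, `build_hierarchy`, and `leaves` are illustrative names, and multi-level masks such as `0.1.2` are the future scenario described in this comment rather than something assumed to exist today.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    path: str  # physical position in the hierarchy, e.g. "0.1.2"
    children: Dict[int, "Node"] = field(default_factory=dict)


def build_hierarchy(mask: str) -> Dict[int, Node]:
    """Build the tree of device handles implied by a comma-separated mask."""
    roots: Dict[int, Node] = {}
    for entry in mask.split(","):
        level = roots
        prefix: List[str] = []
        for part in entry.strip().split("."):
            prefix.append(part)
            # Reuse the node if an earlier entry already created it.
            node = level.setdefault(int(part), Node(".".join(prefix)))
            level = node.children
    return roots


def leaves(nodes: Dict[int, Node]) -> List[str]:
    """Collect the right-most leaves, i.e. handles that have no children."""
    result: List[str] = []
    for index in sorted(nodes):
        node = nodes[index]
        result.extend(leaves(node.children) if node.children else [node.path])
    return result


hierarchy = build_hierarchy("0.0,0.1.2,0.1.3")
print(leaves(hierarchy))  # ['0.0', '0.1.2', '0.1.3']
```

An application that prefers the flat view would use only the paths returned by `leaves`, while one that wants implicit scaling would pick an inner node such as `hierarchy[0]` or `hierarchy[0].children[1]`.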

Now, one proposal from customers is to either change the meaning of the affinity mask, or to define a new one, like ZE_VISIBLE_DEVICES, which allows for this model.

servesh commented 2 years ago

@jandres742 Would it make sense to be more pragmatic in the way root devices are shown to the programming layer above?

The current issue seems to stem from, "we expose the parent device when at least 2 sub-devices are selected with the mask"

My thinking here is,

MASK=0.0,0.1.2,0.1.3=>

sub device handle 0: representing 0.0 (Default device 0 from programming layer, if chosen implicitly scale across 0.1.2 and 0.1.3. Memory allocations should be split across the closest domain to these devices, i.e 0.1's global memory)
- sub-device handle 0: representing 0.1.2
- sub-device handle 1: representing 0.1.3

MASK=0,0.0,0.1,0.1.2,0.1.3=>

root device handle 0 (Default device 0 from programming layer, if chosen implicitly scale across 0.1 and 0.2. Memory allocations should be split across the closest domain to these devices, i.e 0's global memory)
- sub device handle 0: representing 0.0
- sub-device handle 1: representing 0.1 (device 2 from programming layer, if chosen implicitly scale across 0.1.2 and 0.1.3. Memory allocations should be split across the closest domain to these devices, i.e 0.1's global memory)
  - sub-device handle 0: representing 0.1.2
  - sub-device handle 1: representing 0.1.3

And if the application chooses a device handle with subdevice, then implicitly scale the workload across its subdevices.

jandres742 commented 2 years ago

Thanks @servesh. I think what you are saying is the same as what I am saying, no? The way the affinity mask is defined lets users programmatically select the device handle in the hierarchy that fits their needs, depending on the mask passed. The behavior you showed in your example is exactly that. We would expose several device handles in the hierarchy, do implicit scaling, and color the allocations accordingly, and, as you say, the application can programmatically select the handle it wants.

MASK=0,0.0,0.1,0.1.2,0.1.3=>

root device handle 0 (Default device 0 from programming layer, if chosen implicitly scale across 0.1 and 0.2. Memory allocations should be split across the closest domain to these devices, i.e 0's global memory)
- sub device handle 0: representing 0.0
- sub-device handle 1: representing 0.1 (device 2 from programming layer, if chosen implicitly scale across 0.1.2 and 0.1.3. Memory allocations should be split across the closest domain to these devices, i.e 0.1's global memory)
  - sub-device handle 0: representing 0.1.2
  - sub-device handle 1: representing 0.1.3

If we instead exposed each of these comma-separated masks as a single device, then neither memory coloring nor implicit scaling would be possible.

That's why I don't think we should change the meaning of the mask, and the previous suggestion from your team about having a separate environment variable to say whether or not we want the comma-separated masks as devices might be more viable.

TApplencourt commented 2 years ago

That's why I don't think we should change the meaning of the mask, and the previous suggestion from your team about having a separate environment variable to say whether or not we want the comma-separated masks as devices might be more viable.

I agree. Having ZE_VISIBLE_DEVICES or another ENV may be more tractable. It looks like both behaviors (the visibility and the masking) are needed.

Some users definitely want the same behavior as ROCR_VISIBLE_DEVICES. So not giving a mask, just an "expose what I pass you as a device".

So having 2 different ENV seems to be a good idea!
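A rough Python sketch of how the two variables could differ is shown below. It is only a model of the proposal in this thread: `ZE_VISIBLE_DEVICES` is the hypothetical variable suggested above, the function name `exposed_root_devices` is made up, and giving the new variable precedence over `ZE_AFFINITY_MASK` is an assumption. The case where the mask selects a single sub-device of a root is not modeled.

```python
from typing import List, Mapping


def exposed_root_devices(env: Mapping[str, str]) -> List[str]:
    """Return the physical paths that would be presented as root devices."""
    if "ZE_VISIBLE_DEVICES" in env:  # hypothetical variable from this thread
        # Flat model: every comma-separated entry becomes its own device.
        return [entry.strip() for entry in env["ZE_VISIBLE_DEVICES"].split(",")]
    # Masking model: entries that share a root are grouped under that root,
    # and the selected sub-devices stay reachable as its sub-device handles.
    roots: List[str] = []
    for entry in env.get("ZE_AFFINITY_MASK", "").split(","):
        root = entry.strip().split(".")[0]
        if root and root not in roots:
            roots.append(root)
    # An unset mask means no filtering; this sketch has no hardware to
    # enumerate, so it simply returns an empty list in that case.
    return roots


print(exposed_root_devices({"ZE_AFFINITY_MASK": "0.0,0.1"}))    # ['0']
print(exposed_root_devices({"ZE_VISIBLE_DEVICES": "0.0,0.1"}))  # ['0.0', '0.1']
```

With the same value, the masking variable yields one root device with two sub-devices, while the visibility variable yields two devices with no sub-devices, which is the behavior requested at the top of this issue.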

jandres742 commented 2 years ago

Moved to the public spec repo we have now:

https://github.com/oneapi-src/level-zero-spec/issues/1

oneapi-src / level-zero

Expose sub-device exposed by ZE_AFFINITY_MASK as devices #86