This is for consistency with the pytato context and for performance: if all that's needed is a device scalar, there's no real need for the device/host/device round-trip.
Unfortunately, that's a rather larger compatibility break. One way to do this would be to introduce a flag on the CL array context that changes the behavior, and whose old default value is deprecated. The pytato array context kind of leads the charge here anyway: it requires that `to_numpy` be called on lazy scalars before they're "normally" usable.
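To make the flag idea concrete, here is a rough sketch of how the deprecation path might look. Everything in it is an assumption for illustration: `ToyArrayContext`, `DeviceScalar`, and the `return_device_scalars` flag are hypothetical stand-ins, not the actual CL array context or its API, and the "device" is just simulated in plain Python.

```python
import warnings


class DeviceScalar:
    """Hypothetical stand-in for a zero-dimensional array living on the device."""

    def __init__(self, value):
        self._value = value


class ToyArrayContext:
    """Hypothetical stand-in for the CL array context (not the real class)."""

    def __init__(self, return_device_scalars=None):
        if return_device_scalars is None:
            # Old default: keep handing back host scalars, but warn about it.
            warnings.warn(
                "returning host scalars is deprecated; "
                "pass return_device_scalars=True to opt in to the new behavior",
                DeprecationWarning, stacklevel=2)
            return_device_scalars = False
        self.return_device_scalars = return_device_scalars

    def to_numpy(self, ary):
        # Explicit device-to-host transfer (simulated here).
        if isinstance(ary, DeviceScalar):
            return ary._value
        return ary

    def sum(self, values):
        # Pretend the reduction ran on the device.
        result = DeviceScalar(sum(values))
        if self.return_device_scalars:
            # New behavior: no implicit transfer to the host.
            return result
        # Old behavior: implicit transfer to the host.
        return self.to_numpy(result)
```

With the flag set, callers that really need a host value would call `to_numpy` explicitly, which matches what the pytato array context already requires for lazy scalars.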
cc @kaushikcfd @matthiasdiener