#620 (11100df4f3d8a01b85d1667a3cc344176a98f814) causes operations between CL arrays and unit-sized numpy arrays to fall back to numpy to perform the elementwise loop. For example:

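```python
from time import time

import numpy as np
import pyopencl as cl
import pyopencl.array as cla

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
ary = cla.zeros(queue, (10**5,), "float64")

ary += 1.  # first run (warm-up, so one-time setup such as kernel compilation is not timed)

# Adding a Python scalar: handled by a kernel on the device.
s = time()
ary += 1.
e = time()
print(e-s)

# Adding a unit-sized numpy array: falls back to a numpy loop.
s = time()
ary += np.array([1.])
e = time()
print(e-s)
print(type(ary), type(ary[0]))
```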

This yields, e.g.,

```
5.054473876953125e-05
7.800684928894043
<class 'numpy.ndarray'> <class 'pyopencl.array.Array'>
```

That is, the unit-sized numpy array is added to each element of the CL array one at a time, which is of course extremely slow. Worse, the result turns `ary` into a numpy array of CL scalar arrays!
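The change of `type(ary)` follows from Python's in-place operator protocol: for `a += b`, Python first tries `a.__iadd__(b)`, then `a.__add__(b)`, then `b.__radd__(a)`, and rebinds `a` to whatever result it gets. Below is a pure-Python toy model of one plausible path. It is not a reading of the actual pyopencl or numpy code, and `DeviceArray` and `HostArray` are hypothetical stand-ins. It shows how a strict scalar check that returns `NotImplemented` hands the whole operation to the other operand's reflected method, which then loops elementwise and returns its own type.

```python
class DeviceArray:
    """Hypothetical stand-in for a device array (not pyopencl's Array)."""

    def __init__(self, data):
        self.data = list(data)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, i):
        # Indexing yields another (one-element) DeviceArray.
        return DeviceArray([self.data[i]])

    @staticmethod
    def _is_scalar(other):
        # Strict check, in the spirit of np.isscalar: only Python numbers pass.
        return isinstance(other, (int, float))

    def __iadd__(self, other):
        if self._is_scalar(other):
            self.data = [x + other for x in self.data]  # the fast "kernel" path
            return self
        return NotImplemented  # Python falls back to __add__, then other.__radd__

    def __add__(self, other):
        if self._is_scalar(other):
            return DeviceArray(x + other for x in self.data)
        return NotImplemented


class HostArray:
    """Hypothetical stand-in for a host-side, numpy-like array."""

    def __init__(self, items):
        self.items = list(items)

    def __radd__(self, other):
        # Treat `other` as a generic sequence and broadcast the single element
        # of `self` across it, one element at a time (the slow path).
        (value,) = self.items  # only size-1 broadcasting is modelled here
        return HostArray(other[i] + value for i in range(len(other)))


ary = DeviceArray([0.0] * 3)
ary += 1.0               # scalar operand: stays a DeviceArray
ary += HostArray([1.0])  # size-1 array operand: rebound to a HostArray
print(type(ary).__name__, type(ary.items[0]).__name__)  # HostArray DeviceArray
```

In the toy model, as in the report above, nothing fails: the operation completes, but on the slow path and with the container type silently swapped.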

Previously, unit-sized arrays were simply passed through to kernels. (I can't see precisely why that used to work, but it did!) I don't think it is ever desirable to fall back on numpy to perform the loops. I would rather have incompatible operations fail, even ungracefully: an array operation that would be valid if both operands were numpy arrays (or both CL arrays) should simply crash at kernel invocation when it tries to pass a numpy array pointer, rather than silently looping in numpy.
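The behaviour argued for above can be sketched as a hypothetical operand check; `as_kernel_scalar` is an illustrative name, not a pyopencl function. A Python number or a one-element host sequence is reduced to a scalar that could be handed to a kernel. Any other operand raises `TypeError` instead of returning `NotImplemented`, so Python never reaches a slow elementwise fallback.

```python
import numbers


def as_kernel_scalar(operand):
    """Hypothetical helper: reduce a scalar-like operand to a Python number.

    Python numbers pass through; a one-element host sequence is unwrapped to
    its single element; anything else raises TypeError rather than deferring
    to the other operand's (possibly slow) reflected method.
    """
    if isinstance(operand, numbers.Number):
        return operand
    try:
        items = list(operand)
    except TypeError:
        raise TypeError(
            f"unsupported operand type: {type(operand).__name__}"
        ) from None
    if len(items) == 1 and isinstance(items[0], numbers.Number):
        return items[0]
    raise TypeError(
        f"cannot combine a device array with a host operand of length {len(items)}"
    )


print(as_kernel_scalar(2.5))    # 2.5
print(as_kernel_scalar([1.0]))  # 1.0
```

Raising early like this gives a clearer error than a crash at kernel invocation; either way, the operation fails instead of silently changing the type of `ary`.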

inducer / pyopencl

New np.isscalar checks in array arithmetic break operations with unit-length arrays #663

620 (11100df4f3d8a01b85d1667a3cc344176a98f814) causes operations between CL arrays and unit-sized numpy arrays to fall back to numpy to perform the loop. For example: