Broadcasting rules distinguish between host scalar and device scalars

kaushikcfd commented 3 years ago

Consider this script:

actx = _acf()

@with_container_arithmetic(
        bcast_obj_array=True,
        bcast_numpy_array=True,
        rel_comparison=True,
        _cls_has_array_context_attr=True)
@dataclass_array_container
@dataclass(frozen=True)
class Foo:
    u: DOFArray

foo = Foo(DOFArray(actx, (actx.zeros(3, dtype=np.float64) + 41, )))

print(foo)                                 # prints: Foo(u=DOFArray((cl.Array([41., 41., 41.]),)))
print(foo + 1)                             # prints: Foo(u=DOFArray((cl.Array([42., 42., 42.]),)))
print(foo + actx.from_numpy(np.ones(())))  # ERROR! 😲

I don't see a good reason why we don't support this.

I think we should also go a step ahead and make sure that we broadcast every cl.Array to be added (if legal) with all the leaf arrays of the array container. One way I could see this working is if every ArrayContext comes with a ARRAY_TYPE static, frozen attribute which gives us a hint to broadcast to all successive arrays.

inducer commented 3 years ago

I agree that this is inconsistent, however there's an elephant of a practicality in the room: How do you propose we recognize scalars?

>>> import pyopencl as cl
>>> ctx = cl.create_some_context()
>>> queue = cl.CommandQueue(ctx)
>>> import pyopencl.array as clas
>>> import pyopencl.array as cla
>>> import numpy as np

>>> one = cla.to_device(queue, np.ones(()))
>>> np.isscalar(one)
False
>>> from numbers import Number
>>> isinstance(one, Number)
False

A shot-in-the-dark .shape == ()?

kaushikcfd commented 3 years ago

How do you propose we recognize scalars?

I think every ArrayContext providing a type for a leaf array's type isn't too bad. And we can accordingly broadcast the operation with all the other leaf arrays in the array container.

I.e. I believe even the below operation could also work with the approach.

foo = Foo(DOFArray(actx, (actx.zeros(3, dtype=np.float64) + 41, )))
foo + cla.zeros(cq, (3,))

inducer commented 3 years ago

Arithmetic is a pretty important code path, eager+strong-scaling-wise. (Hence all that generated code.) I'm hesitant about doing something expensive there.

Finding the array context to go with a container (so that we can ask it what its array class is) seems kind of difficult. There currently isn't a great way to do so, and what little way there is would require recursion, i.e. it's quite expensive.

We could also do a global registry of array classes... :shrug: not that I'm a fan of that solution either.

kaushikcfd commented 3 years ago

How about we take this code path iff getting the array context is cheap like here: https://github.com/inducer/arraycontext/blob/c0062855e1cdf07ebe1ed37d5bb14fb48af4d0f7/test/test_arraycontext.py#L506-L508

inducer commented 3 years ago

That wouldn't fix your example though (unless you add an .array_context attribute). with_arithmetic already has _cls_has_array_context_attr (which, it occurs to me, could have a smarter default):

https://github.com/inducer/arraycontext/blob/c0062855e1cdf07ebe1ed37d5bb14fb48af4d0f7/arraycontext/container/arithmetic.py#L136

I guess I'm not super opposed to this, though certainly not delighted. Want to put together a PR so we can see what it might look like?

kaushikcfd commented 3 years ago

Want to put together a PR so we can see what it might look like?

Yup. Thanks for considering this.

alexfikl commented 3 years ago

+1 for something like this on my side too. I was just playing with getting pytential working with force_device_scalars=True and it died pretty quicky because of expressions like

sym.Ones() * sym.mean(ambient_dim, dim, expr)

as it didn't know how to multiply a DOFArray (from Ones) with a device scalar (from mean).

kaushikcfd commented 3 years ago

There's actually another issue hiding there:

>>> a = cla.to_device(cq, np.ones(3))
>>> b = cla.to_device(cq, np.ones(()))
>>> a
cl.Array([1., 1., 1.])
>>> b
cl.Array(1.)
>>> a * b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/shared/home/kgk2/pack/virtual_envs/emirge/pyopencl/pyopencl/array.py", line 1161, in __mul__
    self._elwise_multiply(result, self, other))
  File "/shared/home/kgk2/pack/virtual_envs/emirge/pyopencl/pyopencl/array.py", line 183, in kernel_runner
    knl = kernel_getter(*args, **kwargs)
  File "/shared/home/kgk2/pack/virtual_envs/emirge/pyopencl/pyopencl/array.py", line 897, in _elwise_multiply
    assert out.shape == b.shape

I.e. PyOpenCL doesn't support broadcasting.

inducer commented 3 years ago

Let's take the PyOpenCL discussion here: https://github.com/inducer/pyopencl/issues/498

inducer commented 3 years ago

See #51 for work by @kaushikcfd towards solving this.

inducer / arraycontext

Broadcasting rules distinguish between host scalar and device scalars #49