Fix alignment of cdouble_t and cfloat_t

inducer / pyopencl

OpenCL integration for Python, plus shiny features

http://mathema.tician.de/software/pyopencl

Other

1.04k stars 239 forks source link

Fix alignment of cdouble_t and cfloat_t #654

Closed isuruf closed 1 year ago

isuruf commented 1 year ago

Can see a 10% speed boost in M2L kernel in sumpy.

alignment is extended when PYOPENCL_COMPLEX_ENABLE_EXTENDED_ALIGNMENT is defined

isuruf commented 1 year ago

@inducer, fixing the alignment only gives the performance benefit.

isuruf commented 1 year ago

It looks like the alignment of np.complex128 is 8 instead of 16. Is that going to be an issue?

inducer commented 1 year ago

It looks like the alignment of np.complex128 is 8 instead of 16. Is that going to be an issue?

Hmm, good question. Technically yes, in that it's quite easy to produce a misaligned array using the high-level Array interface: make an array of doubles a, a[1:].view(dtype=np.complex128).

Instead of promising alignment on the level of the type, perhaps a better approach is to do it per-pointer in loopy.

isuruf commented 1 year ago

Technically yes, in that it's quite easy to produce a misaligned array using the high-level Array interface: make an array of doubles a, a[1:].view(dtype=np.complex128).

Wouldn't that be possible in other cases as well? For example, an array of shorts and a view of it in int?

inducer commented 1 year ago

Wouldn't that be possible in other cases as well? For example, an array of shorts and a view of it in int?

That's a fair point. It's clear that this affects numpy as well. What does it do under those circumstances?

isuruf commented 1 year ago

Trying this script

import numpy as np
x = np.array([1, 2, 3, 4], dtype=np.int8)
print(x[0:].view(dtype=np.int16))
print(x[1:].view(dtype=np.int16))

gives me

[ 513 1027]
Traceback (most recent call last):
  File "/home/idf2/projects/test-view.py", line 4, in <module>
    print(x[1:].view(dtype=np.int16))
ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array.

isuruf commented 1 year ago

Nvm, that's because the sizes don't match

isuruf commented 1 year ago

Numpy has two code paths for when an array is aligned or not for algorithms. See https://numpy.org/devdocs/dev/alignment.html

I guess opencl buffers are aligned to the maximum data type which is long16 or int16 depending on the device.

inducer commented 1 year ago

I guess opencl buffers are aligned to the maximum data type which is long16 or int16 depending on the device.

They are, but we also support offset (specified in bytes, without alignment constraints). Also arrays backed by SVM have entirely user-specified alignment.

I think doing this per-pointer in loopy is still the better option.

isuruf commented 1 year ago

I'll send a PR to loopy

inducer commented 1 year ago

Closing as discussed.

isuruf commented 1 year ago

align_value attribute works only with pocl, so I feel like this is the best way forward. Loopy can define PYOPENCL_COMPLEX_ENABLE_EXTENDED_ALIGNMENT when generating pyopencl code.

inducer commented 1 year ago

Thanks!

Do you need a release of this right away or can this wait?

isuruf commented 1 year ago

Thanks. A new release can wait. Not in a hurry