Closed isuruf closed 1 year ago
@inducer, fixing the alignment only gives the performance benefit.
It looks like the alignment of np.complex128 is 8 instead of 16. Is that going to be an issue?
Hmm, good question. Technically yes, in that it's quite easy to produce a misaligned array using the high-level Array interface: make an array of doubles a, then take a[1:].view(dtype=np.complex128).
Instead of promising alignment on the level of the type, perhaps a better approach is to do it per-pointer in loopy.
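For reference, the misaligned view described above can be reproduced with plain numpy. This is only a sketch: it uses numpy arrays rather than pyopencl's Array, and the alignment numpy reports for complex128 depends on the platform.

```python
import numpy as np

# Five doubles, so that a[1:] holds four doubles, i.e. two complex128 values.
a = np.zeros(5, dtype=np.float64)
c = a[1:].view(dtype=np.complex128)

# The view starts one double (8 bytes) past the start of a, so a and the
# view cannot both sit on a 16-byte boundary.
byte_offset = c.ctypes.data - a.ctypes.data

print("alignment numpy reports for complex128:", np.dtype(np.complex128).alignment)
print("byte offset of the view:", byte_offset)
print("a is 16-byte aligned:", a.ctypes.data % 16 == 0)
print("view is 16-byte aligned:", c.ctypes.data % 16 == 0)
```

If the allocation for a happens to start on a 16-byte boundary, the view starts 8 bytes past one, which is exactly the case a type-level promise of 16-byte alignment would get wrong.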
Wouldn't that be possible in other cases as well? For example, an array of shorts and a view of it in int?
That's a fair point. It's clear that this affects numpy as well. What does it do under those circumstances?
Trying this script
import numpy as np
x = np.array([1, 2, 3, 4], dtype=np.int8)
print(x[0:].view(dtype=np.int16))
print(x[1:].view(dtype=np.int16))
gives me
[ 513 1027]
Traceback (most recent call last):
File "/home/idf2/projects/test-view.py", line 4, in <module>
print(x[1:].view(dtype=np.int16))
ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array.
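Never mind, that's because the sizes don't match: x[1:] is 3 bytes long, which is not a multiple of the 2-byte int16 itemsize, so this says nothing about alignment.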
NumPy has two code paths in its algorithms, one for aligned arrays and one for unaligned ones. See https://numpy.org/devdocs/dev/alignment.html
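A variant of the script above with matching sizes shows what numpy does with the misaligned view. This is a sketch: whether the view is reported as unaligned depends on where numpy happened to allocate x, although in practice the allocation starts on an even address.

```python
import numpy as np

# Five bytes, so that x[1:] holds four bytes, i.e. two int16 values.
x = np.array([1, 2, 3, 4, 5], dtype=np.int8)
y = x[1:].view(dtype=np.int16)

# numpy accepts the view and records its alignment status in the flags.
print("view is aligned:", y.flags.aligned)
print("address modulo int16 alignment:",
      y.ctypes.data % np.dtype(np.int16).alignment)

# np.require returns an aligned copy when the input is not aligned.
z = np.require(y, requirements=["ALIGNED"])
print("copy is aligned:", z.flags.aligned)
```

So numpy does not reject the view. It tracks alignment per array and picks its code path from that, which is close in spirit to tracking alignment per pointer.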
I guess opencl buffers are aligned to the maximum data type which is long16 or int16 depending on the device.
They are, but we also support offset (specified in bytes, without alignment constraints). Also arrays backed by SVM have entirely user-specified alignment.
I think doing this per-pointer in loopy is still the better option.
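To make the per-pointer idea concrete, here is a rough sketch of the arithmetic involved. The helper guaranteed_alignment is hypothetical and is not loopy's or pyopencl's API. It assumes the buffer's base alignment is known (128 bytes below, the size of long16) and that the byte offset is known when the code is generated.

```python
def guaranteed_alignment(base_alignment: int, byte_offset: int) -> int:
    """Largest power of two known to divide (base address + byte_offset).

    Hypothetical helper for illustration only.
    """
    if base_alignment <= 0 or base_alignment & (base_alignment - 1):
        raise ValueError("base_alignment must be a positive power of two")
    if byte_offset < 0:
        raise ValueError("byte_offset must be non-negative")
    if byte_offset == 0:
        return base_alignment
    # byte_offset & -byte_offset isolates the lowest set bit of the offset,
    # i.e. the largest power of two that divides it.
    return min(base_alignment, byte_offset & -byte_offset)


for offset in (0, 8, 16, 48):
    print(offset, guaranteed_alignment(128, offset))
```

With an offset of 8 bytes the pointer is only known to be 8-byte aligned, so a 16-byte alignment promise for complex128 would not hold for that pointer even though it holds for the buffer itself.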
I'll send a PR to loopy
Closing as discussed.
The align_value attribute works only with pocl, so I feel like this is the best way forward. Loopy can define PYOPENCL_COMPLEX_ENABLE_EXTENDED_ALIGNMENT when generating pyopencl code.
Thanks!
Do you need a release of this right away or can this wait?
Thanks. A new release can wait. Not in a hurry
I can see a 10% speed boost in the M2L kernel in sumpy.
Alignment is extended when PYOPENCL_COMPLEX_ENABLE_EXTENDED_ALIGNMENT is defined.