Open certik opened 10 months ago
What is the priority for this issue?
I added this issue here: https://github.com/lcompilers/lpython/issues/2258, but the relative priority is not clear yet. All the issues there are important.
This seems interesting. I would like to work on this along with other things in my bucket list.
Ok, here is an example of a vectorized Mandelbrot that we should try to compile using LPython and get maximum performance.
import numpy as np

MAX_ITERS = 100

# c: Annotated[c64[:], SIMD]
def mandelbrot_kernel2(c):
    z = np.empty(c.shape, dtype=np.complex128)
    z[:] = c[:]
    nv = np.zeros(c.shape, dtype=np.int8)
    # True if the point is in set, False otherwise
    mask = np.empty(c.shape, dtype=np.bool_)
    for i in range(MAX_ITERS):
        mask[:] = (abs(z) <= 2)
        if (all(mask == False)): break
        z[mask] *= z[mask]
        z[mask] += c[mask]
        nv[mask] += 1
    return nv

n = 8
height = 4096 // n
width = 4096 // n
min_x = -2.0
max_x = 0.47
min_y = -1.12
max_y = 1.12
scale_x = (max_x - min_x) / width
scale_y = (max_y - min_y) / height
simd_width = 512
assert simd_width <= width
output = np.empty((height,width), dtype=np.int8)
x = np.empty((simd_width), dtype=np.complex128)
for h in range(height):
    cy = min_y + h * scale_y
    for w0 in range(width // simd_width):
        w = np.arange(w0*simd_width, (w0+1)*simd_width, dtype=np.int32)
        cx = min_x + w * scale_x
        x[:] = cx + 1j*cy
        output[h,w] = mandelbrot_kernel2(x)
print(output)
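For checking the output of a compiled build, a scalar reference may be handy. The sketch below is not from this thread: `mandelbrot_scalar` is a hypothetical helper name, and it assumes the same escape test (`abs(z) <= 2`) and the same iteration cap of 100 as the kernel above.

```python
def mandelbrot_scalar(c: complex, max_iters: int = 100) -> int:
    """Count iterations for one point, mirroring the masked kernel's logic."""
    z = c
    nv = 0
    for _ in range(max_iters):
        # Same condition the vectorized kernel stores in `mask`.
        if abs(z) > 2:
            break
        z = z * z + c
        nv += 1
    return nv


# A few points whose counts are easy to verify by hand.
samples = [0j, -1 + 0j, 1 + 0j, 2 + 2j]
counts = [mandelbrot_scalar(c) for c in samples]
```

Each element of `mandelbrot_kernel2(x)` should match `mandelbrot_scalar(x[k])`, although counts could in principle differ for points very close to the boundary if complex multiplication is rounded differently between the two implementations.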
We'll use Annotated to mark the argument, as in the comment above mandelbrot_kernel2: c: Annotated[c64[:], SIMD].
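As a rough illustration of how an `Annotated` marker can be attached and read back in plain Python, here is a minimal sketch. The `SIMD` class, the `ComplexArray` alias (standing in for `c64[:]`), and `has_simd_marker` are all hypothetical names defined only for this example; they are not LPython's actual API.

```python
from typing import Annotated, get_args, get_origin, get_type_hints


class SIMD:
    """Hypothetical marker type, defined only for this sketch."""


# Stand-in for a 1-D complex array type such as c64[:] (an assumption).
ComplexArray = list[complex]


def kernel(c: Annotated[ComplexArray, SIMD]) -> list[int]:
    return [0 for _ in c]


def plain_kernel(c: ComplexArray) -> list[int]:
    return [0 for _ in c]


def has_simd_marker(func, param: str) -> bool:
    """Return True if `param` of `func` is annotated with the SIMD marker."""
    hint = get_type_hints(func, include_extras=True).get(param)
    if get_origin(hint) is not Annotated:
        return False
    _base, *metadata = get_args(hint)
    return SIMD in metadata
```

A frontend could make a similar check while lowering the argument type, then pick the array's physical type accordingly; how LPython actually does this is not shown here.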
In ASR we use the SIMDArray physical type, and then in the LLVM backend (or an ASR->ASR pass) we ensure all such arrays get vectorized; otherwise we give a compile-time error message. The conditions are: