lcompilers / lpython

Python compiler
https://lpython.org/

SIMD backend #2310

Open certik opened 10 months ago

certik commented 10 months ago
diff --git a/src/libasr/ASR.asdl b/src/libasr/ASR.asdl
index 26e60e172..d6a29ecef 100644
--- a/src/libasr/ASR.asdl
+++ b/src/libasr/ASR.asdl
@@ -420,6 +420,7 @@ array_physical_type
     = DescriptorArray
     | PointerToDataArray
     | FixedSizeArray
+    | SIMDArray
     | NumPyArray
     | ISODescriptorArray

We'll use Annotated:

from typing import Annotated
from lpython import f32, SIMD
x: Annotated[f32[64], SIMD]
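
As a rough illustration of how a frontend might pick up this annotation, the sketch below recovers the marker with the standard `typing` helpers. The `f32` and `SIMD` classes here are hypothetical stand-ins defined only for this example (they are not the real `lpython` objects), and `simd_annotated` is an illustrative helper name, not part of LPython.

```python
from typing import Annotated, get_args, get_origin, get_type_hints


class SIMD:
    """Hypothetical stand-in for the proposed lpython.SIMD marker."""


class f32:
    """Hypothetical stand-in for lpython.f32; f32[n] yields a sized subclass."""
    shape = ()

    def __class_getitem__(cls, n):
        # Build a new class so the result is still accepted as a type by Annotated.
        return type(f"f32[{n}]", (cls,), {"shape": (n,)})


def kernel(x: Annotated[f32[64], SIMD], y: f32[64]) -> None:
    """Example function: only `x` carries the SIMD marker."""


def simd_annotated(func):
    """Return {name: shape} for every annotation marked with SIMD."""
    found = {}
    # include_extras=True keeps the Annotated metadata instead of stripping it.
    for name, hint in get_type_hints(func, include_extras=True).items():
        if get_origin(hint) is Annotated:
            base, *metadata = get_args(hint)
            if SIMD in metadata:
                found[name] = base.shape
    return found


print(simd_annotated(kernel))
```

Only `x` is reported, since `y` has the same element type and size but no `SIMD` marker. This is the distinction the new physical type is meant to carry into ASR.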

In ASR we use the SIMDArray physical type, and then in the LLVM backend (or in an ASR->ASR pass) we ensure that all such arrays get vectorized; otherwise we give a compile-time error message. The conditions are:

Shaikh-Ubaid commented 10 months ago

What is the priority for this issue?

certik commented 10 months ago

I added this issue here: https://github.com/lcompilers/lpython/issues/2258, but the relative priority is not clear yet. All the issues there are important.

czgdp1807 commented 10 months ago

This seems interesting. I would like to work on this along with other things in my bucket list.

certik commented 10 months ago

Ok, here is an example of a vectorized Mandelbrot that we should try to compile using LPython and get maximum performance.

import numpy as np

MAX_ITERS = 100

# c: Annotated[c64[:], SIMD]
def mandelbrot_kernel2(c):
    z = np.empty(c.shape, dtype=np.complex128)
    z[:] = c[:]
    nv = np.zeros(c.shape, dtype=np.int8)
    # True while the point has not escaped (|z| <= 2), False otherwise
    mask = np.empty(c.shape, dtype=np.bool_)
    for i in range(MAX_ITERS):
        mask[:] = (abs(z) <= 2)
        if (all(mask == False)): break
        z[mask] *= z[mask]
        z[mask] += c[mask]
        nv[mask] += 1
    return nv

n = 8
height = 4096 // n
width = 4096 // n
min_x = -2.0
max_x = 0.47
min_y = -1.12
max_y = 1.12
scale_x = (max_x - min_x) / width
scale_y = (max_y - min_y) / height
simd_width = 512
assert simd_width <= width

output = np.empty((height,width), dtype=np.int8)

x = np.empty((simd_width), dtype=np.complex128)
for h in range(height):
    cy = min_y + h * scale_y
    for w0 in range(width // simd_width):
        w = np.arange(w0*simd_width, (w0+1)*simd_width, dtype=np.int32)
        cx = min_x + w * scale_x
        x[:] = cx + 1j*cy
        output[h,w] = mandelbrot_kernel2(x)

print(output)
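
When checking a compiled version of this kernel, a scalar reference can be handy. The following is a minimal sketch, not part of the original example: `mandelbrot_scalar` is a hypothetical helper that applies the same update rule to one point at a time, assuming the same escape test (`abs(z) <= 2`) and iteration cap as above.

```python
MAX_ITERS = 100


def mandelbrot_scalar(c, max_iters=MAX_ITERS):
    """Count the iterations for which |z| <= 2, starting from z = c."""
    z = c
    count = 0
    for _ in range(max_iters):
        if abs(z) > 2:
            # Matches the masked kernel: escaped points are no longer updated.
            break
        z = z * z + c
        count += 1
    return count


# A few points whose behaviour is easy to follow by hand.
for point in (0j, -1 + 0j, 2 + 0j, 3 + 0j):
    print(point, mandelbrot_scalar(point))
```

Comparing its output element by element against `mandelbrot_kernel2` should agree for most points, though points right at the escape boundary could differ if the two paths round complex multiplication differently.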