inducer / pyopencl

OpenCL integration for Python, plus shiny features
http://mathema.tician.de/software/pyopencl

Problem with GenericScanKernel #547

Closed yves-surrel closed 2 years ago

yves-surrel commented 2 years ago

After noticing some 'noise' in the output of GenericScanKernel in my application, I tested this very simple case, where the input expression is simply "i":

import numpy as np
from matplotlib.pyplot import plot, title
import pyopencl as cl
import pyopencl.array as cla
from pyopencl.scan import GenericScanKernel

platform, = cl.get_platforms()

d0, d1, d2 = platform.get_devices()

ctx = cl.Context([d2,])

queue = cl.CommandQueue(ctx)

out_buf = cla.empty(queue, (65536,), dtype=np.float32)

# Kernel to make running sum
knl = GenericScanKernel(
    queue.context, np.float32,
    arguments=" __global float *outBuf",
    input_expr="i",
    scan_expr="a+b", neutral="0",
    output_statement="outBuf[i] = item")

knl(out_buf, queue=queue)

out_buf = out_buf.get()

x = np.arange(out_buf.size)

# Plot deviation from theoretical result
plot(out_buf - x * (x+1) / 2)

title(ctx.devices[0].name)

The output sum should be the parabola S(i) = i(i + 1)/2. However, for large buffer sizes the result is wrong, as shown below for the three OpenCL devices in my MBP.

[Figures: Figure_0, Figure_1, Figure_2]

yves-surrel commented 2 years ago

Awfully sorry for this stupid statement. A single-precision float obviously cannot hold a sum up to 2.14751642e9 with unit resolution.
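
For anyone landing here with the same symptom, the precision limit can be checked on the CPU without OpenCL. The following is only a minimal sketch: `to_f32` is a hypothetical helper (not part of PyOpenCL) that rounds a value to float32 using the standard `struct` module, and the loop accumulates sequentially, so the individual deviations will not match the GPU plots, which depend on the order in which the parallel scan combines partial sums. The point is only that a float32 running sum of 0..65535 cannot stay exact once it passes 2**24.

```python
import struct


def to_f32(x):
    """Round a Python float to the nearest float32 value (hypothetical helper)."""
    return struct.unpack("f", struct.pack("f", x))[0]


n = 65536  # same buffer size as in the report

# float32 has a 24-bit significand, so 2**24 + 1 is not representable
# and rounds back to 2**24.
print(to_f32(2.0**24 + 1) == 2.0**24)

# Running sum accumulated in float32, compared with the exact S(i) = i(i + 1)/2.
acc = 0.0
first_inexact = None
max_dev = 0.0
for i in range(n):
    acc = to_f32(acc + i)      # one float32-rounded addition per element
    exact = i * (i + 1) // 2   # exact, arbitrary-precision Python integers
    dev = abs(acc - exact)
    if dev and first_inexact is None:
        first_inexact = i
    max_dev = max(max_dev, dev)

print("first index with a rounding error:", first_inexact)
print("largest deviation:", max_dev)
```

If exact sums are needed at this size, one option is to run the scan in a wider type (for example a 64-bit integer or double-precision dtype, where the device supports it) rather than np.float32.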