Add a CPU-GPU-Shared MemoryBuffer for systems that support it

RationalFragile commented 11 months ago

Hi. According to Intel, on devices where the CPU and the integrated GPU share the same physical memory, a buffer can be used from either the CPU or the GPU without copying it to a second buffer for reads and writes, which improves performance.

The method is to pass the CL_MEM_ALLOC_HOST_PTR flag to clCreateBuffer(), then call clEnqueueMapBuffer() to make the buffer accessible on the host side and clEnqueueUnmapMemObject() to hand it back to the device side.
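For illustration, the map/unmap sequence could be sketched in C# with raw P/Invoke bindings. This is only a minimal sketch, not part of ILGPU: the class name `CLZeroCopy`, the helper `WithHostAccess`, and the library name "OpenCL" are assumptions, while the `cl*` entry points and constant values follow the OpenCL headers. It assumes you already hold native command-queue and buffer handles.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper class; only the cl* functions and constants are real OpenCL.
static class CLZeroCopy
{
    // Assumed library name; it differs per platform (e.g. libOpenCL.so on Linux).
    const string OpenCLLib = "OpenCL";

    public const ulong CL_MEM_ALLOC_HOST_PTR = 1UL << 4;
    public const ulong CL_MAP_READ = 1UL << 0;
    public const ulong CL_MAP_WRITE = 1UL << 1;
    const uint CL_TRUE = 1;
    const int CL_SUCCESS = 0;

    [DllImport(OpenCLLib)]
    static extern IntPtr clEnqueueMapBuffer(
        IntPtr queue, IntPtr buffer, uint blockingMap, ulong mapFlags,
        UIntPtr offset, UIntPtr size,
        uint numEvents, IntPtr waitList, IntPtr evt, out int errcode);

    [DllImport(OpenCLLib)]
    static extern int clEnqueueUnmapMemObject(
        IntPtr queue, IntPtr memObj, IntPtr mappedPtr,
        uint numEvents, IntPtr waitList, IntPtr evt);

    [DllImport(OpenCLLib)]
    static extern int clFinish(IntPtr queue);

    static void Check(int err, string call)
    {
        if (err != CL_SUCCESS)
            throw new InvalidOperationException($"{call} failed with error {err}");
    }

    // Maps the whole buffer for host access (blocking), runs hostWork on the
    // mapped pointer, then unmaps so the device can use the buffer again.
    public static void WithHostAccess(
        IntPtr queue, IntPtr buffer, long lengthInBytes, Action<IntPtr> hostWork)
    {
        IntPtr mapped = clEnqueueMapBuffer(
            queue, buffer, CL_TRUE, CL_MAP_READ | CL_MAP_WRITE,
            UIntPtr.Zero, new UIntPtr((ulong)lengthInBytes),
            0, IntPtr.Zero, IntPtr.Zero, out int err);
        Check(err, nameof(clEnqueueMapBuffer));
        try
        {
            hostWork(mapped);
        }
        finally
        {
            Check(
                clEnqueueUnmapMemObject(queue, buffer, mapped, 0, IntPtr.Zero, IntPtr.Zero),
                nameof(clEnqueueUnmapMemObject));
            // Wait until the unmap has completed before the device touches the buffer.
            Check(clFinish(queue), nameof(clFinish));
        }
    }
}
```

A call might look like `CLZeroCopy.WithHostAccess(queue, buffer, bytes, p => Marshal.WriteInt32(p, 0, 42));`, where `queue` and `buffer` are native handles. How to obtain them from ILGPU objects is not covered here and would need to be checked against the ILGPU version in use.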

Could you please add a way to do this in ILGPU? For example, an option to enable this mode (so that the correct flag is used), plus a way to transition a MemoryBuffer to the host side, read from and write to it, and then transition it back to the device side. Thank you!

Here is the link to the Intel article: https://www.intel.com/content/www/us/en/developer/articles/training/getting-the-most-from-opencl-12-how-to-increase-performance-by-minimizing-buffer-copies-on-intel-processor-graphics.html

MoFtZ commented 11 months ago

Hi @RationalFragile.

You can create your own MemoryBuffer class for this purpose. ILGPU puts a few extra layers on top of MemoryBuffer, so you will need to create your own ArrayView instances, for example. But the functionality should be available.

using ILGPU;
using ILGPU.Runtime;
using ILGPU.Runtime.OpenCL;
using ILGPU.Util;
using System;

namespace CLHostMemory
{
    class Program
    {
        static void MyKernel(
            Index1D index,
            ArrayView<int> dataView,
            int constant)
        {
            dataView[index] = index + constant;
        }

        static void Main()
        {
            // Create main context
            using var context = Context.CreateDefault();
            using var accelerator = context.CreateCLAccelerator(0);

            var kernel = accelerator.LoadAutoGroupedStreamKernel<Index1D, ArrayView<int>, int>(MyKernel);

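            // Allocate the buffer through the custom extension method defined below,
            // which creates it with CL_MEM_ALLOC_HOST_PTR.
            using var buffer = accelerator.AllocateHostMemory<int>(1024);

            // Build the views by hand, since the custom buffer bypasses ILGPU's typed buffer layers.
            var bufferView = new ArrayView<int>(buffer, 0, buffer.Length);
            var bufferView1D = new ArrayView1D<int, Stride1D.Dense>(bufferView, bufferView.Extent, new Stride1D.Dense());

            kernel((int)buffer.Length, bufferView1D, 42);

            // Copy the results into a managed array and verify them.
            var data = bufferView1D.GetAsArray1D();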
            for (int i = 0, e = data.Length; i < e; ++i)
            {
                if (data[i] != 42 + i)
                    Console.WriteLine($"Error at element location {i}: {data[i]} found");
            }
        }
    }

    class CLHostMemoryBuffer<T> : MemoryBuffer
        where T : unmanaged
    {
        public CLHostMemoryBuffer(
            CLAccelerator accelerator,
            long length,
            int elementSize)
            : base(accelerator, length, elementSize)
        {
            CLException.ThrowIfFailed(
                CLAPI.CurrentAPI.CreateBuffer(
                    accelerator.NativePtr,
                    CLBufferFlags.CL_MEM_ALLOC_HOST_PTR,
                    new IntPtr(LengthInBytes),
                    IntPtr.Zero,
                    out IntPtr resultPtr));
            NativePtr = resultPtr;
        }

        protected override void DisposeAcceleratorObject(bool disposing)
        {
            if (disposing)
                CLException.ThrowIfFailed(CLAPI.CurrentAPI.ReleaseBuffer(NativePtr));
            NativePtr = IntPtr.Zero;
        }

        protected override void CopyFrom(
            AcceleratorStream stream,
            in ArrayView<byte> sourceView,
            in ArrayView<byte> targetView) =>
            CLMemoryBuffer.CLCopy(stream.AsNotNullCast<CLStream>(), sourceView, targetView);

        protected override void CopyTo(
            AcceleratorStream stream,
            in ArrayView<byte> sourceView,
            in ArrayView<byte> targetView) =>
            CLMemoryBuffer.CLCopy(stream.AsNotNullCast<CLStream>(), sourceView, targetView);

        protected override void MemSet(
            AcceleratorStream stream,
            byte value,
            in ArrayView<byte> targetView) =>
            CLMemoryBuffer.CLMemSet(stream.AsNotNullCast<CLStream>(), value, targetView);
    }

    static class CLHostMemoryExtensions
    {
        public static CLHostMemoryBuffer<T> AllocateHostMemory<T>(this CLAccelerator accelerator, long length)
            where T : unmanaged =>
            new CLHostMemoryBuffer<T>(accelerator, length, Interop.SizeOf<T>());
    }
}

Some semi-related posts include #794 and #826.

RationalFragile commented 11 months ago

Thank you very much 😊

m4rs-mt / ILGPU

Add a CPU-GPU-Shared MemoryBuffer for systems that support it #1130