RationalFragile closed this issue 11 months ago.
Hi @RationalFragile,
You can create your own `MemoryBuffer` class for this purpose. ILGPU puts a few extra layers on top of `MemoryBuffer`, so you will need to build some pieces yourself, such as the `ArrayView` instances. The underlying functionality should already be available, though. For example:
```csharp
using ILGPU;
using ILGPU.Runtime;
using ILGPU.Runtime.OpenCL;
using ILGPU.Util;
using System;

namespace CLHostMemory
{
    class Program
    {
        static void MyKernel(
            Index1D index,
            ArrayView<int> dataView,
            int constant)
        {
            dataView[index] = index + constant;
        }

        static void Main()
        {
            // Create the main context and an OpenCL accelerator
            using var context = Context.CreateDefault();
            using var accelerator = context.CreateCLAccelerator(0);

            var kernel = accelerator.LoadAutoGroupedStreamKernel<Index1D, ArrayView<int>, int>(MyKernel);

            // Allocate 1024 ints in a buffer created with CL_MEM_ALLOC_HOST_PTR
            using var buffer = accelerator.AllocateHostMemory<int>(1024);

            // The custom buffer bypasses ILGPU's typed buffer layers, so the views are built manually
            var bufferView = new ArrayView<int>(buffer, 0, buffer.Length);
            var bufferView1D = new ArrayView1D<int, Stride1D.Dense>(bufferView, bufferView.Extent, new Stride1D.Dense());

            kernel((int)buffer.Length, bufferView1D, 42);

            // Copy the results into a managed array and verify them
            var data = bufferView1D.GetAsArray1D();
            for (int i = 0, e = data.Length; i < e; ++i)
            {
                if (data[i] != 42 + i)
                    Console.WriteLine($"Error at element location {i}: {data[i]} found");
            }
        }
    }

    // Minimal MemoryBuffer that asks the OpenCL driver to allocate host-accessible memory
    class CLHostMemoryBuffer<T> : MemoryBuffer
        where T : unmanaged
    {
        public CLHostMemoryBuffer(
            CLAccelerator accelerator,
            long length,
            int elementSize)
            : base(accelerator, length, elementSize)
        {
            // accelerator.NativePtr is the OpenCL context handle
            CLException.ThrowIfFailed(
                CLAPI.CurrentAPI.CreateBuffer(
                    accelerator.NativePtr,
                    CLBufferFlags.CL_MEM_ALLOC_HOST_PTR,
                    new IntPtr(LengthInBytes),
                    IntPtr.Zero,
                    out IntPtr resultPtr));
            NativePtr = resultPtr;
        }

        protected override void DisposeAcceleratorObject(bool disposing)
        {
            if (disposing)
                CLException.ThrowIfFailed(CLAPI.CurrentAPI.ReleaseBuffer(NativePtr));
            NativePtr = IntPtr.Zero;
        }

        // Copies and memsets reuse ILGPU's existing OpenCL helpers
        protected override void CopyFrom(
            AcceleratorStream stream,
            in ArrayView<byte> sourceView,
            in ArrayView<byte> targetView) =>
            CLMemoryBuffer.CLCopy(stream.AsNotNullCast<CLStream>(), sourceView, targetView);

        protected override void CopyTo(
            AcceleratorStream stream,
            in ArrayView<byte> sourceView,
            in ArrayView<byte> targetView) =>
            CLMemoryBuffer.CLCopy(stream.AsNotNullCast<CLStream>(), sourceView, targetView);

        protected override void MemSet(
            AcceleratorStream stream,
            byte value,
            in ArrayView<byte> targetView) =>
            CLMemoryBuffer.CLMemSet(stream.AsNotNullCast<CLStream>(), value, targetView);
    }

    static class CLHostMemoryExtensions
    {
        // Convenience helper: allocates `length` elements of T in host-accessible OpenCL memory
        public static CLHostMemoryBuffer<T> AllocateHostMemory<T>(this CLAccelerator accelerator, long length)
            where T : unmanaged =>
            new CLHostMemoryBuffer<T>(accelerator, length, Interop.SizeOf<T>());
    }
}
```
Some semi-related posts include #794 and #826.
Thank you very much 😊
Hi, according to Intel, on devices where the CPU and the integrated GPU share the same physical memory, a buffer can simply be switched between host ("CPU") and device ("GPU") access without copying it into another buffer for reading and writing. This improves performance.
The method is to pass the flag `CL_MEM_ALLOC_HOST_PTR` when calling `clCreateBuffer()`, then call `clEnqueueMapBuffer()` to make the buffer host-side and `clEnqueueUnmapMemObject()` to make it device-side again.
Can you please add a way of doing this in ILGPU? For example, a way to enable this mode (so that the correct flag is used), plus a way to transition a `MemoryBuffer` to the host side, read from and write to it there, and then transition it back to the device side. Thank you!
Here is the link for the intel post: https://www.intel.com/content/www/us/en/developer/articles/training/getting-the-most-from-opencl-12-how-to-increase-performance-by-minimizing-buffer-copies-on-intel-processor-graphics.html
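For reference, the raw OpenCL sequence described in the request (allocate with `CL_MEM_ALLOC_HOST_PTR`, map to hand the buffer to the host, unmap to hand it back to the device) can be sketched in C# with P/Invoke. This is a minimal sketch, not an ILGPU API: the `ZeroCopyCL` class and its method names are hypothetical, it assumes the native library name `OpenCL` resolves to the installed OpenCL ICD loader, and it assumes you already hold valid context and in-order command-queue handles (the reply above passes `accelerator.NativePtr` as the context when creating its buffer).

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper sketching zero-copy map/unmap via raw OpenCL calls.
// Assumption: "OpenCL" resolves to the system's OpenCL ICD loader library.
public static class ZeroCopyCL
{
    // Flag values as defined in the OpenCL headers (cl.h)
    public const ulong CL_MEM_READ_WRITE = 1UL << 0;
    public const ulong CL_MEM_ALLOC_HOST_PTR = 1UL << 4;
    public const ulong CL_MAP_READ = 1UL << 0;
    public const ulong CL_MAP_WRITE = 1UL << 1;
    const uint CL_TRUE = 1;

    [DllImport("OpenCL")]
    static extern IntPtr clCreateBuffer(
        IntPtr context, ulong flags, UIntPtr size, IntPtr hostPtr, out int errcode);

    [DllImport("OpenCL")]
    static extern IntPtr clEnqueueMapBuffer(
        IntPtr queue, IntPtr buffer, uint blockingMap, ulong mapFlags,
        UIntPtr offset, UIntPtr size,
        uint numEventsInWaitList, IntPtr eventWaitList, IntPtr evt, out int errcode);

    [DllImport("OpenCL")]
    static extern int clEnqueueUnmapMemObject(
        IntPtr queue, IntPtr memObj, IntPtr mappedPtr,
        uint numEventsInWaitList, IntPtr eventWaitList, IntPtr evt);

    // Throw if an OpenCL call returned a non-zero (error) code
    static void Check(int err, string call)
    {
        if (err != 0)
            throw new InvalidOperationException($"{call} failed with error {err}");
    }

    // Allocate a buffer that the driver places in host-accessible memory
    public static IntPtr CreateHostBuffer(IntPtr context, long sizeInBytes)
    {
        IntPtr buffer = clCreateBuffer(
            context, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR,
            (UIntPtr)(ulong)sizeInBytes, IntPtr.Zero, out int err);
        Check(err, nameof(clCreateBuffer));
        return buffer;
    }

    // "Host side": blocking map that returns a pointer the CPU can read and write
    public static IntPtr MapForHost(IntPtr queue, IntPtr buffer, long sizeInBytes)
    {
        IntPtr ptr = clEnqueueMapBuffer(
            queue, buffer, CL_TRUE, CL_MAP_READ | CL_MAP_WRITE,
            UIntPtr.Zero, (UIntPtr)(ulong)sizeInBytes,
            0, IntPtr.Zero, IntPtr.Zero, out int err);
        Check(err, nameof(clEnqueueMapBuffer));
        return ptr;
    }

    // "Device side": enqueue the unmap; commands enqueued later on the same
    // in-order queue observe the host's writes
    public static void UnmapToDevice(IntPtr queue, IntPtr buffer, IntPtr mappedPtr)
    {
        Check(
            clEnqueueUnmapMemObject(queue, buffer, mappedPtr, 0, IntPtr.Zero, IntPtr.Zero),
            nameof(clEnqueueUnmapMemObject));
    }
}
```

Between `MapForHost` and `UnmapToDevice`, the host can access the mapped pointer directly (for example with `Marshal.ReadInt32`/`Marshal.WriteInt32`); kernels should only use the buffer while it is unmapped. The blocking map (`CL_TRUE`) keeps the sketch simple at the cost of stalling the host until the mapping is ready.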