Closed yuryGotham closed 9 months ago
Hi @yuryGotham,
ILGPU does not keep track of the memory range that is being locked - it is the responsibility of the calling application to manage this, and ensure that the same memory range is not locked twice.
Please note that ILGPU uses the cudaHostRegisterPortable flag when making the call to cudaHostRegister. This means that the device pointer "returned by this call will be considered as pinned memory by all CUDA contexts, not just the one that performed the allocation". Therefore, it is not necessary to call page lock on each accelerator.
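Since tracking locked ranges is left to the calling application, one way to avoid locking the same array twice is to cache the scope per array. The following is only a minimal sketch under stated assumptions: PageLockRegistry and GetOrCreate are hypothetical names that are not part of ILGPU, and the sketch tracks whole arrays by reference, so it does not detect overlapping sub-ranges.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (NOT part of ILGPU): remembers which host arrays have
// already been page-locked so the same array is never locked twice.
public sealed class PageLockRegistry<TScope> : IDisposable
    where TScope : class, IDisposable
{
    // Arrays do not override Equals/GetHashCode, so the dictionary keys
    // are compared by reference: one entry per array instance.
    private readonly Dictionary<Array, TScope> scopes =
        new Dictionary<Array, TScope>();

    // Returns the cached scope for 'array', or creates and caches one by
    // invoking 'createScope' (assumed to perform the actual page lock).
    public TScope GetOrCreate(Array array, Func<TScope> createScope)
    {
        if (!scopes.TryGetValue(array, out var scope))
        {
            scope = createScope();
            scopes.Add(array, scope);
        }
        return scope;
    }

    // Releases every cached scope and empties the registry.
    public void Dispose()
    {
        foreach (var scope in scopes.Values)
            scope.Dispose();
        scopes.Clear();
    }
}
```

Assuming the scope returned by CreatePageLockFromPinned is disposable, usage could look like registry.GetOrCreate(cpuShared, () => accelerator.CreatePageLockFromPinned(cpuShared)), with the factory invoked on one accelerator only and the cached scope reused afterwards.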
Hi @MoFtZ,
It looks like I'm able to get the pinned memory to work for the most part. One thing that I'm still unable to figure out is how to do a partial copy between MemoryBuffer and PageLockScope.
In my scenario PageLockScopes are created ahead of time and used many times (creating a scope is slow). The amount of memory that I need to copy from PageLockScope to MemoryBuffer varies during runtime. I can dynamically point to a region of MemoryBuffer by using View.SubView(), but I cannot point to a region of PageLockScope and creating a new PageLockScope each time is VERY slow.
Is it possible to add the functionality to point to a part of a PageLockScope? Even being able to point to the first X bytes of a PageLockScope would solve my specific problem.
@yuryGotham @m4rs-mt I have raised a PR to allow copying to/from a subset of a page locked array. Instead of adding lots of new methods, I have simplified the API to use the existing CopyFrom and CopyTo ArrayView methods.
The PageLockedScope class now has an ArrayView property to allow it to be used with the existing copy methods.
@yuryGotham now that #697 has been postponed, you can use this workaround in ILGPU v1.0.
using var memoryBuffer = CPUMemoryBuffer.Create(
    accelerators[accId],
    scopes[accId].AddrOfLockedObject,
    scopes[accId].Length,
    Interop.SizeOf<float>());
var arrayView = memoryBuffer.AsArrayView<float>(0L, memoryBuffer.Length);
This creates a memory buffer over the page-locked array, and from that memory buffer we get an array view. The array view can then be sliced and used in copy operations.
@MoFtZ how do I copy the arrayView to the GPU using a locked/async method? I'm currently calling gpuBuffer.View.CopyToPageLockedAsync(stream, pageLockedScope). CopyToPageLockedAsync does not accept an ArrayView as an argument. Is there a way to cast ArrayView to PageLockedArray?
@yuryGotham once the CPU buffer is page-locked, you can just copy to/from the buffer as normal.
Here is a sample program:
static void Main()
{
    using var context = Context.CreateDefault();
    using var accelerator = context.CreateCudaAccelerator(0);

    // Create a pinned CPU buffer.
    int size = 1024;
    var cpuShared = new float[size];
    var gcHandle = GCHandle.Alloc(cpuShared, GCHandleType.Pinned);
    try
    {
        // Page lock the pinned CPU buffer.
        using var pageLockScope = accelerator.CreatePageLockFromPinned(cpuShared);

        // Wrap CPU buffer in an array view.
        using var cpuMemoryBuffer = CPUMemoryBuffer.Create(
            accelerator,
            pageLockScope.AddrOfLockedObject,
            pageLockScope.Length,
            Interop.SizeOf<float>());
        var cpuArrayView = cpuMemoryBuffer.AsArrayView<float>(0, cpuMemoryBuffer.Length);

        // Allocate a GPU buffer.
        using var gpuBuffer = accelerator.Allocate1D<float>(256);

        // Copy a subset of the pinned/page-locked CPU view to the GPU.
        var sourceView = cpuArrayView.SubView(32, 256);
        var destinationView = gpuBuffer.View;

        // NOTE: CopyFrom does not have an implicit call to stream.Synchronize.
        destinationView.CopyFrom(sourceView);
        accelerator.Synchronize();
    }
    finally
    {
        // Unpin the managed array once the page-lock scope has been disposed.
        gcHandle.Free();
    }
}
Not sure if it is a bug or a missing feature, but it appears to be impossible for multiple GPUs to use the same pinned memory on host.
Code:
Output:
I suspect that CreatePageLockFromPinned() is trying to call cudaHostRegister a second time when it does not need to - it only needs to call cudaHostGetDevicePointer for subsequent accelerators.