m4rs-mt / ILGPU

ILGPU JIT Compiler for high-performance .Net GPU programs
http://www.ilgpu.net

Is it possible to reuse a dataset stored on the GPU again and again while sending extra data to the GPU, and even change the values of the stored dataset? #1127

Closed p66114129 closed 11 months ago

p66114129 commented 11 months ago

Hello developers! First, I'm new to GPU coding, and I found this amazing work. Thanks for your work!

Update: The following Stack Overflow question basically describes what I'm asking: https://stackoverflow.com/questions/20027403/can-different-calls-to-kernel-share-memory?rq=3

What I'm doing right now is producing processed images, where the image data is stored as arrays. Let's say I copy an image A to GPU memory and process it on the GPU. After the work, the processed image Z is stored as an array on the GPU. However, some of Z's data may be blank and need extra data to compensate. Maybe I can copy a check value back to the CPU to tell whether Z needs extra data? Then I can copy a second image B to the GPU and use it to fill in the blank values of Z. Finally, when Z is complete, I can copy it back to the CPU and store it on my disk.
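For what it's worth, this check-value pattern can be sketched roughly as follows. This is only a hypothetical fragment (the surrounding kernel and the flag layout are assumptions, not part of the original post), using standard ILGPU buffer calls:

```csharp
// Hypothetical sketch: keep Z resident on the GPU and copy back only a
// one-element "needs more data" flag after each processing pass.
using var flagBuffer = accelerator.Allocate1D<int>(1);  // 0 = complete, 1 = blanks remain

// ... launch a kernel here that writes into Z's device buffer and sets
// flagBuffer[0] to 1 whenever it encounters a blank value ...

accelerator.Synchronize();
int[] flag = flagBuffer.GetAsArray1D();  // a 4-byte transfer, not the whole image
if (flag[0] != 0)
{
    // Upload only the small compensation image B and launch a second
    // kernel that fills the blanks of Z in place; Z never leaves the GPU.
}
```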

I know that I can copy the unfinished Z back to the CPU and then copy it to the GPU together with image B to complete it, but it costs a lot of time to transfer back and forth. Besides, Z may need images C, D, or even more to compensate.

So I wonder if I can copy only the extra data I need to the GPU and finish Z there? When Z is finished, I can copy it back to the CPU. Besides, can I dynamically create Z on the GPU and copy that Z back to the CPU? I also wonder if I can keep image A on the GPU and use it to process many images, copying each of them to the GPU and back to the CPU individually, instead of copying everything all at once?

I've tried the SharedMemory sample code, but I can't figure out what it does or how to use it exactly.

Here's the code I tested without SharedMemory: I copy an array with values 1 to 100000000 to the GPU and multiply it by an integer I copy to the GPU. Later, I copy another value to the GPU to multiply the array again. However, the finished array copied back to the CPU only reflects what happened in the last loop iteration.

public void test()
{
    using var context = Context.Create(builder => builder.Default().EnableAlgorithms());
    var device = context.Devices[2];
    using var accelerator = device.CreateAccelerator(context);

    int[] array = Enumerable.Range(1, 100000000).ToArray();
    int[] mult = new int[1];

    //var config2 = SharedMemoryConfig.RequestDynamic<int>(<  >);
    MemoryBuffer1D<int, Stride1D.Dense> deviceData = accelerator.Allocate1D(array);

    for(int i = 2; i < 10; i += 2)
    {
        mult[0] = i;
        MemoryBuffer1D<int, Stride1D.Dense> mul = accelerator.Allocate1D(mult);

        Action<Index1D, ArrayView<int>, ArrayView<int>> loadedKernel =
                accelerator.LoadAutoGroupedStreamKernel<Index1D, ArrayView<int>, ArrayView<int>>(myKernel2);
        loadedKernel((int)deviceData.Length, deviceData.View, mul.View);

        accelerator.Synchronize();
        mul.Dispose();
    }

    int[] output = deviceData.GetAsArray1D();
    deviceData.Dispose();
}

static void myKernel2(Index1D i, ArrayView<int> data, ArrayView<int> mul)
{
    data[i] = i * mul[0];
}
p66114129 commented 11 months ago

Sorry for being dumb here. data[i] = i * mul[0]; should be data[i] += i * mul[0];. With that change, the accumulation across loop iterations happens on the GPU, and the final correct result can be copied back to the CPU at the end.
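Putting the pieces together, a minimal corrected version of the test might look like the sketch below (this is an assumption-laden rewrite, not code from the thread): the data buffer is allocated once and stays resident on the GPU, the kernel is loaded once outside the loop, only the one-element multiplier is uploaded each iteration, and the kernel accumulates with +=.

```csharp
using System;
using System.Linq;
using ILGPU;
using ILGPU.Runtime;

static class Demo
{
    // Accumulates into the device-resident buffer instead of overwriting it.
    static void AccumulateKernel(Index1D i, ArrayView<int> data, ArrayView<int> mul) =>
        data[i] += i * mul[0];

    public static void Test()
    {
        using var context = Context.CreateDefault();
        using var accelerator = context
            .GetPreferredDevice(preferCPU: false)
            .CreateAccelerator(context);

        int[] array = Enumerable.Range(1, 100_000_000).ToArray();

        // Allocated once; stays on the GPU across all kernel launches.
        using var deviceData = accelerator.Allocate1D(array);
        using var mul = accelerator.Allocate1D<int>(1);

        // Load the kernel once, outside the loop.
        var kernel = accelerator.LoadAutoGroupedStreamKernel<
            Index1D, ArrayView<int>, ArrayView<int>>(AccumulateKernel);

        for (int i = 2; i < 10; i += 2)
        {
            mul.CopyFromCPU(new[] { i });  // upload only 4 bytes per iteration
            kernel((int)deviceData.Length, deviceData.View, mul.View);
        }

        accelerator.Synchronize();
        int[] output = deviceData.GetAsArray1D();  // copy the finished result back once
    }
}
```

This keeps the heavy array on the device the whole time, which is exactly the "store once, launch many kernels" pattern the original question is about.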