m4rs-mt / ILGPU

ILGPU JIT Compiler for high-performance .Net GPU programs
http://www.ilgpu.net

Is it possible to reuse a dataset stored on the GPU again and again while sending extra data to the GPU, and even change the values of the stored dataset? #1127

Closed p66114129 closed 11 months ago

p66114129 commented 11 months ago

Hello developers! First, I'm new to GPU coding, and I found this amazing work. Thanks for your work!

Update: The following Stack Overflow question basically describes what I'm asking: https://stackoverflow.com/questions/20027403/can-different-calls-to-kernel-share-memory?rq=3

What I'm doing right now is producing processed images, where the image data is stored as arrays. Let's say I copy an image A to GPU memory and process it on the GPU. After the work, the processed image Z is stored as an array on the GPU. However, some of Z's data may be blank and need extra data to compensate. Maybe I can copy a check value back to the CPU to tell whether Z needs extra data? Then I can copy a second image B to the GPU and use it to fill in the blank values of Z. Finally, when Z is complete, I can copy it back to the CPU and store it on my disk.
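For what it's worth, this check-value pattern can be sketched roughly as follows. This is only a hypothetical fragment (the surrounding kernel and the flag layout are assumptions, not part of the original post), using standard ILGPU buffer calls:

```csharp
// Hypothetical sketch: keep Z resident on the GPU and copy back only a
// one-element "needs more data" flag after each processing pass.
using var flagBuffer = accelerator.Allocate1D<int>(1);  // 0 = complete, 1 = blanks remain

// ... launch a kernel here that writes into Z's device buffer and sets
// flagBuffer[0] to 1 whenever it encounters a blank value ...

accelerator.Synchronize();
int[] flag = flagBuffer.GetAsArray1D();  // a 4-byte transfer, not the whole image
if (flag[0] != 0)
{
    // Upload only the small compensation image B and launch a second
    // kernel that fills the blanks of Z in place; Z never leaves the GPU.
}
```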

I know that I can copy the unfinished Z back to the CPU and then copy it to the GPU together with image B to complete it, but it costs a lot of time to transfer back and forth. Besides, Z may need images C, D, or even more to compensate.

So I wonder if I can copy only the extra data I need to the GPU and finish Z there? When Z is finished, I can copy it back to the CPU. Besides, can I dynamically create Z on the GPU and copy that Z back to the CPU? I also wonder if I can keep image A on the GPU and use it to process many images, copying each of them to the GPU and back to the CPU individually, instead of copying everything all at once?

I've tried the SharedMemory sample code, but I can't figure out what it does or how to use it exactly.

Here's the code I tested without SharedMemory: I copy an array with values 1 to 100000000 to the GPU and multiply it by an integer I copy to the GPU. Later, I copy another value to the GPU to multiply the array again. However, the finished array copied back to the CPU only reflects what happened in the last loop iteration.

public void test()
{
    using var context = Context.Create(builder => builder.Default().EnableAlgorithms());
    var device = context.Devices[2];
    using var accelerator = device.CreateAccelerator(context);

    int[] array = Enumerable.Range(1, 100000000).ToArray();
    int[] mult = new int[1];

    //var config2 = SharedMemoryConfig.RequestDynamic<int>(<  >);
    MemoryBuffer1D<int, Stride1D.Dense> deviceData = accelerator.Allocate1D(array);

    for(int i = 2; i < 10; i += 2)
    {
        mult[0] = i;
        MemoryBuffer1D<int, Stride1D.Dense> mul = accelerator.Allocate1D(mult);

        Action<Index1D, ArrayView<int>, ArrayView<int>> loadedKernel =
                accelerator.LoadAutoGroupedStreamKernel<Index1D, ArrayView<int>, ArrayView<int>>(myKernel2);
        loadedKernel((int)deviceData.Length, deviceData.View, mul.View);

        accelerator.Synchronize();
        mul.Dispose();
    }

    int[] output = deviceData.GetAsArray1D();
    deviceData.Dispose();
}

static void myKernel2(Index1D i, ArrayView<int> data, ArrayView<int> mul)
{
    data[i] = i * mul[0];
}
p66114129 commented 11 months ago

Sorry for being dumb here. data[i] = i * mul[0]; should be data[i] += i * mul[0];. With that change, the accumulation across loop iterations happens on the GPU, and the final correct result can be copied back to the CPU at the end.
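Putting the pieces together, a minimal corrected version of the test might look like the sketch below (this is an assumption-laden rewrite, not code from the thread): the data buffer is allocated once and stays resident on the GPU, the kernel is loaded once outside the loop, only the one-element multiplier is uploaded each iteration, and the kernel accumulates with +=.

```csharp
using System;
using System.Linq;
using ILGPU;
using ILGPU.Runtime;

static class Demo
{
    // Accumulates into the device-resident buffer instead of overwriting it.
    static void AccumulateKernel(Index1D i, ArrayView<int> data, ArrayView<int> mul) =>
        data[i] += i * mul[0];

    public static void Test()
    {
        using var context = Context.CreateDefault();
        using var accelerator = context
            .GetPreferredDevice(preferCPU: false)
            .CreateAccelerator(context);

        int[] array = Enumerable.Range(1, 100_000_000).ToArray();

        // Allocated once; stays on the GPU across all kernel launches.
        using var deviceData = accelerator.Allocate1D(array);
        using var mul = accelerator.Allocate1D<int>(1);

        // Load the kernel once, outside the loop.
        var kernel = accelerator.LoadAutoGroupedStreamKernel<
            Index1D, ArrayView<int>, ArrayView<int>>(AccumulateKernel);

        for (int i = 2; i < 10; i += 2)
        {
            mul.CopyFromCPU(new[] { i });  // upload only 4 bytes per iteration
            kernel((int)deviceData.Length, deviceData.View, mul.View);
        }

        accelerator.Synchronize();
        int[] output = deviceData.GetAsArray1D();  // copy the finished result back once
    }
}
```

This keeps the heavy array on the device the whole time, which is exactly the "store once, launch many kernels" pattern the original question is about.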