Is it possible to reuse a dataset stored on the GPU again and again, throwing only extra data to the GPU as needed, and even change the values of the stored dataset? #1127
Hello developers! First, I'm new to GPU coding, and I found this amazing project. Thanks for your work!

Update: the following Stack Overflow question basically describes what I'm asking: https://stackoverflow.com/questions/20027403/can-different-calls-to-kernel-share-memory?rq=3

What I'm doing right now is producing processed images, and each image's data is stored as an array.
Let's say I copy an image A to GPU memory and process it on the GPU. After that work, the processed image Z is stored as an array on the GPU. However, some of Z's data may be blank and need extra data to fill it in. Maybe I can copy a check value back to the CPU to tell that Z needs extra data? Then I can copy a second image B to the GPU and use it to fill in the blank values of Z. Finally, when Z is complete, I can copy it back to the CPU and store it on my disk.
I know that I could copy the unfinished Z back to the CPU and then copy it to the GPU again together with image B to complete it, but transferring back and forth costs a lot of time. Besides, Z may need image C, D, or even more images to fill in the blanks.
So I wonder: can I copy only the extra data I need to the GPU and finish Z there? When Z is finished, I can copy it back to the CPU.
Besides, can I dynamically create Z on the GPU and copy that Z back to the CPU?
I also wonder if I can keep image A on the GPU and use it to process a lot of images, copying each of them to the GPU (and their results back to the CPU) individually, instead of copying everything at once.
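To make it concrete, here is a rough sketch of the flow I have in mind, using the accelerator from my test code further below. FillBlanks, extraImages, imageLength, and treating 0 as "blank" are placeholders I made up to illustrate the idea, not real code from my project:

// Z stays allocated on the GPU for the whole loop; only small pieces move across PCIe.
using var zBuffer = accelerator.Allocate1D<int>(imageLength);      // result image Z
using var patchBuffer = accelerator.Allocate1D<int>(imageLength);  // current extra image (B, C, ...)
using var flagBuffer = accelerator.Allocate1D<int>(1);             // check value: does Z still have blanks?

var fillKernel = accelerator.LoadAutoGroupedStreamKernel<
    Index1D, ArrayView<int>, ArrayView<int>, ArrayView<int>>(FillBlanks);

foreach (int[] extraImage in extraImages)       // B, C, D, ... on the CPU side
{
    patchBuffer.CopyFromCPU(extraImage);        // copy only the extra data, never Z
    flagBuffer.CopyFromCPU(new int[] { 0 });    // reset the check value

    fillKernel((int)zBuffer.Length, zBuffer.View, patchBuffer.View, flagBuffer.View);
    accelerator.Synchronize();

    // Copy the single check value back to decide whether Z needs more data.
    if (flagBuffer.GetAsArray1D()[0] == 0)
        break;
}

// Z comes back to the CPU exactly once, when it is finished.
int[] finishedZ = zBuffer.GetAsArray1D();

// Placeholder kernel: fill blank pixels of Z from the patch and report whether any blanks remain.
static void FillBlanks(Index1D i, ArrayView<int> z, ArrayView<int> patch, ArrayView<int> stillBlank)
{
    if (z[i] == 0)                  // "blank" is just 0 in this sketch
    {
        z[i] = patch[i];
        if (z[i] == 0)
            Atomic.Add(ref stillBlank[0], 1);
    }
}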
I've tried SharedMemory in the sample code, but I can't figure out what it does or how exactly to use it.
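For reference, my current understanding from the samples is that SharedMemory is fast scratch memory shared by the threads of one group during a single kernel launch; it does not persist between launches, so it doesn't look like the tool for keeping Z on the GPU. Below is a minimal sketch of how it seems to be used with an explicitly grouped kernel, given an accelerator like the one in my test code below (the names and sizes are mine, not from the samples):

const int GroupSize = 256;

static void SharedMemKernel(ArrayView<int> data)
{
    // One block of shared memory per group, alive only for the duration of this launch.
    ArrayView<int> tile = SharedMemory.Allocate<int>(GroupSize);

    int globalIndex = Grid.GlobalIndex.X;
    int localIndex = Group.IdxX;

    tile[localIndex] = data[globalIndex];

    // Wait until every thread in the group has written its value.
    Group.Barrier();

    // Each thread can now read its neighbour's value from shared memory.
    data[globalIndex] = tile[(localIndex + 1) % GroupSize];
}

// Explicitly grouped launch: (number of groups, threads per group).
// Assumes the buffer length is a multiple of GroupSize.
using var buffer = accelerator.Allocate1D<int>(1 << 20);
var sharedKernel = accelerator.LoadStreamKernel<ArrayView<int>>(SharedMemKernel);
sharedKernel(((1 << 20) / GroupSize, GroupSize), buffer.View);
accelerator.Synchronize();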
Here's the code I tested without SharedMemory. I copy an array with values 1 to 100,000,000 to the GPU and multiply it by an integer that I also copy to the GPU. Later, I copy another value to the GPU to multiply the array again. However, the finished array copied back to the CPU only reflects what happened in the last loop iteration.
using System;
using System.Linq;
using ILGPU;
using ILGPU.Algorithms;
using ILGPU.Runtime;

public void test()
{
    using var context = Context.Create(builder => builder.Default().EnableAlgorithms());
    var device = context.Devices[2];
    using var accelerator = device.CreateAccelerator(context);

    int[] array = Enumerable.Range(1, 100000000).ToArray();
    int[] mult = new int[1];
    //var config2 = SharedMemoryConfig.RequestDynamic<int>(< >);

    // The big array is copied to the GPU once and stays resident for the whole loop.
    MemoryBuffer1D<int, Stride1D.Dense> deviceData = accelerator.Allocate1D(array);

    for (int i = 2; i < 10; i += 2)
    {
        mult[0] = i;

        // Only the single multiplier value is transferred in each iteration.
        MemoryBuffer1D<int, Stride1D.Dense> mul = accelerator.Allocate1D(mult);

        Action<Index1D, ArrayView<int>, ArrayView<int>> loadedKernel =
            accelerator.LoadAutoGroupedStreamKernel<Index1D, ArrayView<int>, ArrayView<int>>(myKernel2);

        loadedKernel((int)deviceData.Length, deviceData.View, mul.View);
        accelerator.Synchronize();
        mul.Dispose();
    }

    // Copy the result back to the CPU once, after all launches are done.
    int[] output = deviceData.GetAsArray1D();
    deviceData.Dispose();
}

static void myKernel2(Index1D i, ArrayView<int> data, ArrayView<int> mul)
{
    data[i] = i * mul[0];
}
Sorry for being dumb here:
data[i] = i * mul[0];
should be
data[i] += i * mul[0];
Then the calculation accumulates on the GPU across launches, and the final, correct result can be copied back to the CPU at the end.
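In context, only the kernel body changes; deviceData still stays resident on the GPU for the whole loop, and only the one-element mult buffer is transferred per iteration:

static void myKernel2(Index1D i, ArrayView<int> data, ArrayView<int> mul)
{
    // Accumulate into the GPU-resident buffer instead of overwriting it,
    // so every launch in the loop contributes to the final result.
    data[i] += i * mul[0];
}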