tugrul512bit / Cekirdekler

Multi-device OpenCL kernel load balancer and pipeliner API for C#. Uses a shared-distributed memory model to keep GPUs updated quickly while running the same kernel on all devices (for simplicity).
GNU General Public License v3.0

Is there an example of generating a Unity Texture? #53

Open mfagerlund opened 5 years ago

mfagerlund commented 5 years ago

I'd like to use Cekirdekler to generate textures from kernels that I generate at runtime. Is there a minimal demo showing how to do this?

cheers, m

tugrul512bit commented 5 years ago

Sorry for the late response.

If you are asking about pure OpenGL textures, I don't have much experience with them. But if you just need an array in a specific format, I can help. Even the hello-world example can be adjusted to generate an array in a given format. I will add some examples here.

tugrul512bit commented 5 years ago

For example, the code below should generate two 4096-element float arrays filled with zeros by initializing them as float4 (4-element float vectors) inside the kernel. It uses 1024 work items spread across all selected GPUs; each work item initializes one float4 of a (host array f) and one float4 of b (host array g).

ClPlatforms platforms = ClPlatforms.all();
var selectedDevices = platforms.devicesWithMostComputeUnits().gpus(true);
selectedDevices.logInfo();
ClNumberCruncher gpu = new ClNumberCruncher(selectedDevices, @"
     __kernel void algorithmTest(__global float4 * a, __global float4 * b)
     { 
          int i=get_global_id(0);
          a[i]=(float4)(0.0f,0.0f,0.0f,0.0f);
          b[i]=(float4)(0.0f,0.0f,0.0f,0.0f);
     }
");
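ClArray<float> f = new ClArray<float>(4096);
f.numberOfElementsPerWorkItem = 4; // each work item handles 4 floats (one float4)
f.writeOnly = true;                // the kernel only writes to this buffer
f.zeroCopy = true;                 // let devices use host memory instead of a separate copy
f.write = true;                    // copy results back to the host array after compute
f.read = false;                    // no need to upload host contents before compute
f.readOnly = false;
ClArray<float> g = new ClArray<float>(4096);
g.numberOfElementsPerWorkItem = 4;
g.writeOnly = true;
g.zeroCopy = true;
g.write = true;
g.read = false;
g.readOnly = false;
gpu.performanceFeed = true;        // print per-device performance info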

// 4096/4 = 1024 work items (one float4 each); 64 is the smallest number of work items
// that can be traded between GPUs when balancing the workload
f.nextParam(g).compute(gpu, 1, "algorithmTest", 4096/4, 64);
// afterwards, every f[index] and g[index] should be zero

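Adapting this toward a texture could look like the following sketch (an assumption, not code from the thread): give each work item one pixel and let it write one float4 (RGBA) into a float array of width*height*4 elements, the same way the kernel above writes float4 values. The plain C# below mirrors that layout on the CPU so the indexing can be checked without a GPU or Cekirdekler; `TextureLayout`, `ShadePixel` and the gradient are hypothetical stand-ins for a kernel generated at runtime.

```csharp
using System;

// CPU mirror of a hypothetical per-pixel kernel: work item i shades pixel i
// and writes one RGBA float4 to floats [4*i .. 4*i+3] of a flat buffer.
public static class TextureLayout
{
    // Hypothetical stand-in for a runtime-generated "fragment shader":
    // red grows with x, green grows with y, blue is 0, alpha is 1.
    public static void ShadePixel(int x, int y, int width, int height, float[] rgba, int offset)
    {
        rgba[offset + 0] = width > 1 ? x / (float)(width - 1) : 0f;
        rgba[offset + 1] = height > 1 ? y / (float)(height - 1) : 0f;
        rgba[offset + 2] = 0f;
        rgba[offset + 3] = 1f;
    }

    // One loop iteration per work item, as a global range of width*height would give.
    public static float[] Fill(int width, int height)
    {
        float[] rgba = new float[width * height * 4];
        for (int i = 0; i < width * height; i++)
        {
            int x = i % width;   // column, like get_global_id(0) % width in a kernel
            int y = i / width;   // row, like get_global_id(0) / width in a kernel
            ShadePixel(x, y, width, height, rgba, 4 * i);
        }
        return rgba;
    }
}
```

In an OpenCL kernel the same mapping would presumably be `int i = get_global_id(0); int x = i % width; int y = i / width;`, with a global range of width*height passed to `compute` and `numberOfElementsPerWorkItem = 4` on the float array as above; that GPU side is untested here.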
tugrul512bit commented 5 years ago

Also, here is how I generated vertices (as a height map over a sphere):

https://github.com/tugrul512bit/unityTestMeshDeformation/blob/master/gpgpu_test/Assets/Kamera.cs (the kernel is on lines 236 to 245)

There, to pass Unity structs such as

public Vector3[] verticesBase

you can use

xyzGPU = ClArray<byte>.wrapArrayOfStructs(verticesBase);

It wraps the whole struct array; inside the kernel you can treat the items as floats or whatever element type you like. Using the byte type only makes it a raw data-transfer buffer on the host side. In the kernel, you can access the elements by casting to a different type at properly aligned offsets. Don't dereference an address that is not a multiple of 4 as a float or an int.
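The alignment rule can be illustrated without a GPU. The sketch below (plain C#, no Cekirdekler calls) views an array of a Vector3-like struct as raw bytes, roughly what a byte-typed wrapper hands over, and reads floats back only at byte offsets that are multiples of 4. `Vec3`, `StructBytes` and its methods are hypothetical; `Vec3` stands in for Unity's `Vector3` (three sequential floats, 12 bytes), and `MemoryMarshal.AsBytes` assumes .NET Core 2.1+ or the System.Memory package.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-in for UnityEngine.Vector3: three floats, 12 bytes, no padding.
[StructLayout(LayoutKind.Sequential)]
public struct Vec3
{
    public float x, y, z;
    public Vec3(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }
}

public static class StructBytes
{
    public const int BytesPerVec3 = 12; // 3 floats * 4 bytes

    // Copies the struct array into a raw byte buffer (host-side transfer view).
    public static byte[] ToBytes(Vec3[] items) =>
        MemoryMarshal.AsBytes(items.AsSpan()).ToArray();

    // Byte offset of component c (0 = x, 1 = y, 2 = z) of item i.
    public static int ComponentOffset(int i, int c) => i * BytesPerVec3 + c * sizeof(float);

    // Reads a float at a byte offset, refusing offsets that are not multiples of 4,
    // mirroring the rule that such addresses must not be dereferenced as float/int.
    public static float ReadFloatAt(byte[] raw, int byteOffset)
    {
        if (byteOffset % sizeof(float) != 0)
            throw new ArgumentException("misaligned float offset: " + byteOffset);
        return BitConverter.ToSingle(raw, byteOffset);
    }
}
```

Inside a kernel, the equivalent would presumably be casting a `__global uchar*` parameter to `__global float*` only at such offsets; an offset like 2 would be a misaligned float read.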

mfagerlund commented 5 years ago

Thanks, that's awesome! My goal is to evolve procedurally generated fragment shaders, so I'd like to bind the output to a texture and just blit it to a flat quad. Unity won't let me compile fragment shaders on the fly, but this seems like it could work!

I'll get back to you if I make it work!

cheers, m

On Thu, 21 Mar 2019 at 20:16, Hüseyin Tuğrul BÜYÜKIŞIK <notifications@github.com> wrote:

For example, below code should generate 4096 element arrays with zero on all elements by initializing them as float4 (4-element float vectors) inside kernel.



-- Mattias Fagerlund Carretera AB
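For the step of binding kernel output to a texture, one route that seems plausible (an assumption, not tested in this thread) is to pack the float RGBA output into 8-bit RGBA on the host, then load it into a Unity `Texture2D` created with `TextureFormat.RGBA32` via `LoadRawTextureData`, followed by `Apply()`. Only the packing step is sketched here; `Rgba32Packer` is a hypothetical helper.

```csharp
using System;

public static class Rgba32Packer
{
    // Packs floats in [0,1] (4 per pixel: R, G, B, A) into bytes in [0,255].
    // Out-of-range values are clamped, NaN becomes 0, rounding is to nearest (ties up).
    public static byte[] Pack(float[] rgba)
    {
        if (rgba.Length % 4 != 0)
            throw new ArgumentException("length must be a multiple of 4 (RGBA)");
        byte[] bytes = new byte[rgba.Length];
        for (int i = 0; i < rgba.Length; i++)
        {
            float v = rgba[i];
            if (!(v > 0f)) v = 0f;        // negatives and NaN
            else if (v > 1f) v = 1f;      // values above 1
            bytes[i] = (byte)(v * 255f + 0.5f);
        }
        return bytes;
    }
}
```

In Unity this would roughly be `var tex = new Texture2D(width, height, TextureFormat.RGBA32, false); tex.LoadRawTextureData(Rgba32Packer.Pack(floats)); tex.Apply();`. Unity treats the first row of raw texture data as the bottom row, so a kernel using `y = i / width` would show y growing upward. A float format such as `TextureFormat.RGBAFloat` would avoid the conversion at the cost of 4× the memory.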