Closed Randrianasulu closed 11 months ago
Hello, I can tell you that it can be done because I did it, although not on the open source branch. My implementation runs on CUDA and has been tested on an RTX A2000, a GTX 1080 and an RTX 3080. If you want to write your own, you could take the threaded plug-in as a basis. It is important to note that you have to upload the bitmap to the GPU to get decent throughput; otherwise all the bandwidth is spent on memory copies. The tetrahedral interpolation code can run on the GPU almost unchanged, and this is what you need. Don't waste time on the matrix part. Good luck!
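For readers unfamiliar with the technique, here is a minimal sketch of tetrahedral interpolation through a 3D lookup table for one RGB pixel. It is not the lcms2 code and not the CUDA implementation mentioned above: the `Lut3D` struct, its memory layout, and the names `lut_node` and `tetra_eval` are hypothetical, chosen only for illustration. The function reads only the table and its own pixel, which is why this kind of code tends to port to a per-pixel GPU kernel with few changes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout (an assumption, not lcms2's): n grid points per axis,
   3 output channels, stored as data[((r*n + g)*n + b)*3 + channel]. */
typedef struct {
    int n;              /* grid points per axis, must be >= 2 */
    const float *data;  /* n*n*n*3 floats */
} Lut3D;

static const float *lut_node(const Lut3D *lut, int r, int g, int b)
{
    return lut->data + (((size_t)r * lut->n + g) * lut->n + b) * 3;
}

/* Interpolate one pixel; inputs are clamped to [0, 1]. */
void tetra_eval(const Lut3D *lut, const float in[3], float out[3])
{
    int idx[3];    /* lower corner of the enclosing grid cell */
    float f[3];    /* fractional position inside that cell */
    int ord[3] = {0, 1, 2};
    int k, s, tmp;

    assert(lut->n >= 2);
    for (k = 0; k < 3; k++) {
        float v = in[k] < 0.0f ? 0.0f : (in[k] > 1.0f ? 1.0f : in[k]);
        float p = v * (float)(lut->n - 1);
        int i = (int)p;
        if (i > lut->n - 2) i = lut->n - 2;  /* keep i+1 inside the grid */
        idx[k] = i;
        f[k] = p - (float)i;
    }

    /* Sort the axes by descending fraction; the order selects which of the
       six tetrahedra of the cell contains the point. */
    if (f[ord[0]] < f[ord[1]]) { tmp = ord[0]; ord[0] = ord[1]; ord[1] = tmp; }
    if (f[ord[1]] < f[ord[2]]) { tmp = ord[1]; ord[1] = ord[2]; ord[2] = tmp; }
    if (f[ord[0]] < f[ord[1]]) { tmp = ord[0]; ord[0] = ord[1]; ord[1] = tmp; }

    /* Barycentric weights of the four vertices along the path
       (0,0,0) -> +largest axis -> +middle axis -> (1,1,1). */
    {
        float w[4];
        w[0] = 1.0f - f[ord[0]];
        w[1] = f[ord[0]] - f[ord[1]];
        w[2] = f[ord[1]] - f[ord[2]];
        w[3] = f[ord[2]];

        out[0] = out[1] = out[2] = 0.0f;
        for (s = 0; s < 4; s++) {
            const float *c = lut_node(lut, idx[0], idx[1], idx[2]);
            out[0] += w[s] * c[0];
            out[1] += w[s] * c[1];
            out[2] += w[s] * c[2];
            if (s < 3) idx[ord[s]]++;  /* step to the next vertex */
        }
    }
}
```

Only four table nodes are read per pixel, compared with eight for trilinear interpolation. With an identity table the function returns its input, which makes a convenient sanity check.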
So, the Rust-based Mesa OpenCL state tracker has started to work on my GeForce 710, and thus I am looking for various pieces of software to test it with.
Right now lcms2 does not support OpenCL (?), but I wonder how much work it would take to add, for example using the plug-in infrastructure?
Can lcms2 reuse some existing OpenCL matrix multiplication libraries?
Considering that OpenCL devices can have relatively small limits on maximum buffer/image dimensions compared to new 100 MP images, does this mean such support must tile the input in the generic case?
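To make the tiling question concrete, here is a small sketch of how an image could be split into rectangles that each fit a device limit. The `Tile` struct and the functions `tile_count` and `tile_at` are hypothetical names, and the limits passed in are assumed to come from a device query such as `clGetDeviceInfo` with `CL_DEVICE_IMAGE2D_MAX_WIDTH` and `CL_DEVICE_IMAGE2D_MAX_HEIGHT`. Because a colour transform works on each pixel independently, the tiles need no overlap.

```c
#include <assert.h>
#include <stddef.h>

/* One rectangular piece of the image, in pixels. */
typedef struct { size_t x, y, w, h; } Tile;

static size_t ceil_div(size_t a, size_t b) { return (a + b - 1) / b; }

/* Number of tiles needed when no tile may exceed max_w x max_h. */
size_t tile_count(size_t img_w, size_t img_h, size_t max_w, size_t max_h)
{
    assert(max_w > 0 && max_h > 0);
    return ceil_div(img_w, max_w) * ceil_div(img_h, max_h);
}

/* Rectangle of the tile with the given index, in row-major tile order.
   Tiles on the right and bottom edges are cropped to the image. */
Tile tile_at(size_t index, size_t img_w, size_t img_h,
             size_t max_w, size_t max_h)
{
    size_t cols;
    Tile t;

    assert(index < tile_count(img_w, img_h, max_w, max_h));
    cols = ceil_div(img_w, max_w);
    t.x = (index % cols) * max_w;
    t.y = (index / cols) * max_h;
    t.w = (img_w - t.x < max_w) ? img_w - t.x : max_w;
    t.h = (img_h - t.y < max_h) ? img_h - t.y : max_h;
    return t;
}
```

As an illustration with made-up numbers, an 11648 x 8736 image (about 100 MP) against an assumed 8192 x 8192 limit needs four tiles, the last of which is 3456 x 544 pixels.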
According to clinfo, even my puny card can do up to 200 GFLOPS of floating-point calculations, with a maximum transfer rate of about 3.5 GB/s. But it has just 2 GB of VRAM...
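A rough back-of-the-envelope sketch of what those figures imply, under loud assumptions: the transfer rate is read as 3.5 gigabytes per second, the image is uploaded once and downloaded once, and the device holds separate input and output buffers of the same size. The function names `transfer_seconds` and `fits_in_vram` are hypothetical helpers, not part of any library.

```c
#include <assert.h>

/* Time to copy the image to the device and the result back,
   assuming the same pixel format in both directions. */
double transfer_seconds(double pixels, double bytes_per_pixel,
                        double bytes_per_second)
{
    assert(bytes_per_second > 0.0);
    return 2.0 * pixels * bytes_per_pixel / bytes_per_second;
}

/* Nonzero if an input buffer plus an equally sized output buffer
   fit in the given amount of device memory. */
int fits_in_vram(double pixels, double bytes_per_pixel, double vram_bytes)
{
    return 2.0 * pixels * bytes_per_pixel <= vram_bytes;
}
```

For a 100 MP image in 8-bit RGB (3 bytes per pixel), `transfer_seconds(100e6, 3.0, 3.5e9)` comes to roughly 0.17 s for the round trip, and both buffers fit in 2 GB. With 32-bit float RGB (12 bytes per pixel) the two buffers would need 2.4 GB, so under these assumptions the image would have to be tiled.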