Closed KarelPeeters closed 3 years ago
Hi Karel, I'm sorry to say that it looks like your Nvidia graphics card does not support SVM
which is a key OpenCL 2.0 feature.
Nvidia's support for OpenCL has always been poor, as this 5 year old article shows: https://forums.developer.nvidia.com/t/opencl-2-x-support-plans/44215. Many people believe that it is a deliberate strategy by Nvidia to encourage developers to use their (proprietary) CUDA platform.
The OpenCL 3.0 standard allows device manufacturers like Nvidia to claim support for OpenCL 3.0, without providing OpenCL 2.0 features like SVM
. Maybe Khronos having an Nvidia employee as it's president has something to do with it. AMD and Intel have much better support for OpenCL, as the example working on your Intel integrated graphics device shows.
In short, it is highly unlikely that you will be able to use SVM
on your Nvidia graphics card anytime soon. Therefore, I recommend that you follow the test_opencl_1_2_example
in the integration_test.rs file, which uses OpenCL 1.2 buffers instead of SVM
to work on OpenCL 1.2 hardware like Nvidia.
Ah that's unfortunate, and indeed test_opencl_1_2_example
does seem to work, I'll try to continue with the older OpenCL features then. I'm switching from CUDA to OpenCL specifically to avoid the lock-in, so at least I'm doing my part 🙂 .
It's strange that this check inside SvmVec
passes, even though actually using it doesn't work. Is that still a bug in this crate or should I check this stuff myself before trying to instantiate an SvmVec
?
https://github.com/kenba/opencl3/blob/71c8b8594b49f2bf43d88b1f1b4fe2552bd52273/src/svm.rs#L45-L49
I'm sorry Karel, but my previous answer was incorrect: your Nvidia graphics card does support SVM.
It supports CL_DEVICE_SVM_COARSE_GRAIN_BUFFER
, see CL_DEVICE_SVM_CAPABILITIES.
Therefore, you can still use an SvmVec
, however you must also use clEnqueueMapBuffer
and clEnqueueUnmapMemObject
as described in the OpenCL 3.0 API above.
The example in the readme
is for CL_DEVICE_SVM_FINE_GRAIN_BUFFER
(as supported by your Intel graphics device) which does not require calls to clEnqueueMapBuffer
and clEnqueueUnmapMemObject
.
Note: you can call clEnqueueMapBuffer
and clEnqueueUnmapMemObject
via the CommandQueue
methods: enqueue_map_buffer
and enqueue_unmap_mem_object
respectively.
Finally, the check inside SvmVec
is correct. However, you should always "check this stuff yourself before trying to instantiate an SvmVec
", because otherwise the assert will panic and crash your program! ;)
Are you sure it's possible to use enqueue_map_buffer
for this? It takes a &Buffer<T>
, which I don't have. I tried it by manually calling cl3::command_queue::enqueue_map_buffer
, which looks like this:
// Copy input data into an OpenCL SVM vector
let mut test_values = SvmVec::<cl_int>::with_capacity(&context, svm_capability, ARRAY_SIZE);
let mut buffer_ptr = std::ptr::null_mut();
let status = cl3::command_queue::enqueue_map_buffer(
queue.get(),
test_values.as_ptr() as cl_mem,
CL_TRUE,
CL_MAP_WRITE,
0,
test_values.len() * std::mem::size_of::<cl_int>(),
&mut buffer_ptr,
0,
std::ptr::null(),
)?;
println!("{:?}", buffer_ptr);
println!("{:?}", status);
for &val in value_array.iter() {
test_values.push(val);
}
//TODO unmap again
But that yields ClError(-34)
, ie. CL_INVALID_CONTEXT
, I assume because the Svm
is not actually a buffer. It's not super important since I can just use actual Buffer
s like in the other example.
Sorry Karel but I'm not sure that it's possible to use enqueue_map_buffer
.
I took that from the OpenCL 3.0 API, which goes into more detail about it here: SVM sharing granularity.
Unfortunately, I've not used coarse grained SVM and I think that you are correct that clEnqueueMapBuffer
does not work because SvmVec
is not actually an SVM buffer. You may have to use the value returned from clSVMAlloc
directly, instead of in the wrapper provided by SvmVec
.
The main benefits of SVM come with fine grained sharing where you don't need to call the map
and unmap
functions.
I'll just continue with Buffers for now, it's not that big of a deal.
Is there any situation it which it makes sense to instantiate an SvmVec
on a system that does not support fine-grained SVM? Maybe the CL_DEVICE_SVM_COARSE_GRAIN_BUFFER |
part in the assert I linked above can just be removed?
@KarelPeeters yes I'm sure that there are many situations where you may want to instantiate an SvmVec
on a system that does not support fine-grained SVM.
Please see #33 and test_opencl_svm_example
in integration_test.rs for an example on how to use coarse grained SVM in your system. I shall update the documentation in due course.
@KarelPeeters I've just incorporated changes to fix this issue in version 0.5 of the library on crates.io.
If you are satisfied with the changes then please close this issue.
I think there's a small issue in the readme now, ExecuteKernel::new
takes a &Kernel
and the example is missing the &
.
Otherwise everything looks great to me now, thanks a lot for all the help and for improvements!
You are correct, that "ExecuteKernel::new
takes a &Kernel
and the example is missing the &
".
However, all the integration tests are missing the &
and compile and run fine without any clippy
warnings, so it's not a problem.
Hmm that's strange I only made this issue because when running the example in the readme myself I got this error:
error[E0308]: mismatched types
--> main.rs:136:32
|
136 | ExecuteKernel::new(kernel)
| ^^^^^^
| |
| expected `&Kernel`, found struct `Kernel`
| help: consider borrowing here: `&kernel`
And it looks like the integration tests do use &kernel
, see eg.
Sorry Karel, I was wrong. All the Kernel
references in the opencl2_kernel_test.rs
integration test are missing the &
because they are references. You are quite right, all the Kernel
references in the integration_test.rs
file have the &
.
PR #36 will fix this issue as well as making the example a working program.
Great, then I think this issue is done.
I'm getting started with OpenCL, and I'm trying out this crate. I started with the example in the readme. I had to add some extra code at the top to pick a device and define
svm_capability
, I hope I did that correctly. I also print a few bits of debug info about the device, in the end this is the full code that I run:https://gist.github.com/KarelPeeters/74a864d42af4b1be5f8d8b4a5aed675d
The output of this code is:
When I change the platform to Intel, I get this:
Clearly there are a bunch of differences between both platforms, and unfortunately the result is wrong on my actual GPU. Are there other things I need to change in the example to get it to work correctly? I tried changing the kernel to just fill the output with
1
s, but the output was still all zeros, so I assume there is some SVM issue.As I said I'm pretty new to OpenCL, so I really have no idea where to start. Feel free to ask for extra information if I forgot anything!