gpu / JOCL

Java bindings for OpenCL
http://www.jocl.org

How to use jocl on windows #47

Closed canhongluori closed 1 year ago

canhongluori commented 1 year ago

I have installed the NVIDIA OpenCL SDK and have the DLL file, but I cannot use it in IDEA.

gpu commented 1 year ago

The easiest way to use JOCL should be to set up a Maven project, and add a dependency to JOCL as

<dependency>
    <groupId>org.jocl</groupId>
    <artifactId>jocl</artifactId>
    <version>2.0.4</version>
</dependency>

If this does not help, maybe you can provide further information about your setup, how you try to use JOCL, or what does not work exactly.

canhongluori commented 1 year ago

I have done this, but when I try to run the test HistogramNVIDIA, the test fails.

gpu commented 1 year ago

What exactly is the error message? Or does it just crash? Or does it just say "TEST FAILED"? (It's difficult. Try to imagine you had to respond to such an issue ...)

canhongluori commented 1 year ago

Sorry sir, I forgot to include my test output. Here it is:

Starting...

Initializing data...
Initializing OpenCL...
Allocating OpenCL memory...

Initializing 256-bin OpenCL histogram...
...loading Histogram256.cl
...creating histogram256 program
...building histogram256 program
...creating histogram256 kernels
...allocating internal histogram256 buffer
Running 256-bin OpenCL histogram for 1048576 bytes...
Validating OpenCL results...
...reading back OpenCL results
...histogram256CPU()
256-bin histograms do not match!!!

Shutting down 256-bin OpenCL histogram...

TEST FAILED !!!
Shutting down...
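For context, the "do not match" line in this output comes from comparing the GPU result against a CPU reference (the `histogram256CPU()` step in the log). The following is only a minimal sketch of what a 256-bin byte histogram conceptually computes, assuming one bin per byte value. The class `CpuHistogramSketch` and its method are hypothetical names, and the sample's actual reference implementation may be structured differently.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical helper for illustration; not part of JOCL or the NVIDIA sample.
class CpuHistogramSketch {

    // Counts how often each byte value (0..255) occurs in the input.
    static int[] histogram256(byte[] data) {
        int[] bins = new int[256];
        for (byte b : data) {
            bins[b & 0xFF]++; // mask so that the signed Java byte is treated as 0..255
        }
        return bins;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1 << 20];   // 1048576 bytes, the size named in the log
        new Random(0).nextBytes(data);     // arbitrary seed; NOT the sample's input data
        int[] bins = histogram256(data);
        long total = Arrays.stream(bins).asLongStream().sum();
        System.out.println("Sum of all bins: " + total + ", input length: " + data.length);
    }
}
```

A useful property of such a reference is that the bins always add up to the number of input bytes, which gives a quick sanity check for any other implementation.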

canhongluori commented 1 year ago

Besides, when I try to run JOCLSample_2_0_SVM, it also fails. Here is the output:

Using device NVIDIA GeForce RTX 3060 Laptop GPU, version 3.0
At 0 got 0.0, setting 0.0
At 1 got 1.0, setting 2.0
At 2 got 4.0, setting 4.0
At 3 got 9.0, setting 6.0
At 4 got 16.0, setting 8.0
At 5 got 25.0, setting 10.0
At 6 got 36.0, setting 12.0
At 7 got 49.0, setting 14.0
At 8 got 64.0, setting 16.0
At 9 got 81.0, setting 18.0
Exception in thread "main" org.jocl.CLException: CL_INVALID_KERNEL_ARGS
    at org.jocl.CL.checkResult(CL.java:825)
    at org.jocl.CL.clEnqueueNDRangeKernel(CL.java:20954)
    at org.jocl.samples.JOCLSample_2_0_SVM.main(JOCLSample_2_0_SVM.java:109)

gpu commented 1 year ago

The JOCLSample_2_0_SVM does not work for me either. It did work once, so I'll have to re-read the specification, and maybe see whether something in OpenCL changed that might explain this. Maybe SVM is not or no longer supported by NVIDIA...?

Regarding the HistogramNVIDIA sample: There doesn't seem to be any crash or so. So I could only make guesses until now.

Do other samples work in general? (I'd just like to know whether there's something fundamentally wrong, or whether NVIDIA just messed up something in that example...)

What happens when you add the output

      System.out.println("CPU: " + Arrays.toString(h_HistogramCPU));
      System.out.println("GPU: " + Arrays.toString(h_HistogramGPU));

before the line https://github.com/gpu/JOCLSamples/blob/40d5f8c2cc49017d7f5b13b882a7386fe4ddee2a/src/main/java/org/jocl/samples/HistogramNVIDIA.java#L117 ?
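Two raw 256-element arrays are hard to compare by eye, so a small summary may help. The sketch below is one possible way to do that, not something from the samples: `HistogramDiffSketch`, `total` and `difference` are hypothetical names, and the arrays in `main` are made-up placeholder values standing in for `h_HistogramCPU` and `h_HistogramGPU`.

```java
import java.util.Arrays;

// Hypothetical diagnostic helper; not part of JOCL or the JOCLSamples project.
class HistogramDiffSketch {

    // Sum of all bins. For a histogram over N input bytes this should equal N.
    static long total(int[] histogram) {
        long sum = 0;
        for (int count : histogram) {
            sum += count;
        }
        return sum;
    }

    // Per-bin difference cpu[i] - gpu[i]; a positive entry means the GPU bin is short.
    static int[] difference(int[] cpu, int[] gpu) {
        if (cpu.length != gpu.length) {
            throw new IllegalArgumentException("Histograms must have the same length");
        }
        int[] diff = new int[cpu.length];
        for (int i = 0; i < cpu.length; i++) {
            diff[i] = cpu[i] - gpu[i];
        }
        return diff;
    }

    public static void main(String[] args) {
        // Placeholder values for illustration only; use the real arrays in the sample.
        int[] cpu = {10, 20, 30};
        int[] gpu = {10, 18, 29};
        System.out.println("CPU total: " + total(cpu));
        System.out.println("GPU total: " + total(gpu));
        System.out.println("CPU - GPU: " + Arrays.toString(difference(cpu, gpu)));
    }
}
```

If the GPU total turns out smaller than the number of input bytes, some increments were lost somewhere in the kernel. If the totals agree but individual bins differ, counts probably ended up in the wrong bins. Either observation only narrows down where to look; it does not identify the cause.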

canhongluori commented 1 year ago

I have added this code after “System.out.println(PassFailFlag != 0 ? "256-bin histograms match\n" : "256-bin histograms do not match!!!\n" );”. The results are as follows:

Starting...

Initializing data...
Initializing OpenCL...
Allocating OpenCL memory...

Initializing 256-bin OpenCL histogram...
...loading Histogram256.cl
...creating histogram256 program
...building histogram256 program
...creating histogram256 kernels
...allocating internal histogram256 buffer
Running 256-bin OpenCL histogram for 1048576 bytes...
Validating OpenCL results...
...reading back OpenCL results
...histogram256CPU()
256-bin histograms do not match!!!

CPU: [4147, 4058, 4022, 4268, 4098, 4113, 4113, 4075, 4072, 4178, 4008, 4155, 4014, 4015, 4088, 4179, 4055, 4074, 4136, 4071, 4007, 4053, 4110, 4077, 4147, 4046, 4193, 4163, 4026, 4073, 4075, 4129, 4117, 4035, 4063, 4080, 4053, 4173, 4106, 4122, 4116, 4094, 4121, 4037, 4242, 4266, 4103, 4056, 4100, 4132, 4142, 4109, 4161, 4156, 4057, 4103, 4113, 4094, 4058, 4006, 4004, 4220, 4030, 4025, 4035, 4062, 4111, 4117, 4061, 4132, 4032, 4030, 4014, 4114, 4021, 3974, 4136, 4084, 4140, 4083, 4094, 4030, 4015, 4071, 4255, 4074, 4106, 4003, 4112, 3968, 4117, 4034, 4114, 4032, 4165, 4108, 4088, 4151, 4099, 4116, 4194, 3936, 4157, 4133, 4173, 4135, 4167, 4156, 4017, 4118, 3925, 4152, 4006, 4108, 4129, 4041, 4202, 4204, 4107, 4061, 4142, 4078, 4061, 4112, 4192, 4142, 4137, 4121, 4111, 4223, 4135, 4043, 3968, 4178, 4089, 4089, 4040, 4121, 4125, 4077, 4146, 4074, 4109, 4169, 4011, 4035, 4064, 4150, 4101, 4113, 4073, 4106, 4050, 4044, 4041, 3970, 4148, 4063, 4060, 4071, 4132, 4160, 4100, 4065, 4020, 3908, 4137, 4069, 4181, 4142, 4120, 4109, 4051, 4123, 4140, 4082, 4078, 3982, 3997, 4061, 4058, 4076, 4074, 4143, 4112, 4116, 4068, 4128, 4080, 4073, 4075, 3987, 4142, 4195, 4271, 4060, 3969, 4137, 4168, 4087, 4112, 4047, 4159, 4099, 4041, 4040, 4175, 4100, 4125, 4106, 4063, 3990, 4114, 4100, 4140, 4174, 4114, 4078, 4131, 4157, 4115, 4061, 4110, 4062, 4162, 4072, 4050, 4138, 4117, 4128, 4056, 4158, 4030, 4209, 4116, 4190, 4052, 4149, 4052, 4151, 4133, 4113, 4139, 4156, 4158, 4055, 4011, 4106, 4008, 4072, 4030, 4169, 4158, 4003, 4152, 4061] GPU: [4088, 4006, 3979, 4223, 4047, 4076, 4059, 4043, 4020, 4138, 3973, 4108, 3976, 3977, 4043, 4143, 4011, 4032, 4082, 4023, 3964, 4014, 4068, 4026, 4103, 4002, 4150, 4125, 3997, 4033, 4035, 4077, 4066, 3993, 4018, 4031, 4006, 4130, 4074, 4085, 4073, 4051, 4076, 4003, 4191, 4209, 4059, 4017, 4049, 4093, 4088, 4065, 4112, 4110, 4028, 4059, 4065, 4051, 4012, 3968, 3969, 4182, 3983, 3978, 3984, 4006, 4065, 4071, 4024, 4083, 3994, 3979, 3972, 4069, 3990, 
3929, 4095, 4048, 4096, 4046, 4041, 3991, 3978, 4021, 4210, 4025, 4064, 3956, 4071, 3928, 4077, 3993, 4073, 3993, 4127, 4064, 4050, 4112, 4062, 4075, 4141, 3905, 4111, 4095, 4128, 4086, 4126, 4103, 3966, 4069, 3892, 4108, 3963, 4069, 4090, 4004, 4161, 4158, 4050, 4021, 4107, 4025, 4016, 4070, 4153, 4093, 4098, 4079, 4059, 4174, 4095, 4005, 3935, 4135, 4051, 4042, 3998, 4085, 4079, 4024, 4095, 4032, 4075, 4115, 3984, 3980, 4024, 4096, 4052, 4072, 4039, 4066, 4015, 4000, 3996, 3934, 4112, 4027, 4013, 4016, 4082, 4115, 4060, 4020, 3985, 3864, 4095, 4026, 4140, 4093, 4083, 4075, 3999, 4067, 4099, 4031, 4041, 3932, 3966, 4023, 4016, 4015, 4038, 4094, 4073, 4083, 4029, 4076, 4037, 4030, 4032, 3950, 4094, 4156, 4223, 4023, 3929, 4096, 4128, 4047, 4061, 3997, 4104, 4052, 4003, 3995, 4133, 4078, 4076, 4060, 4006, 3939, 4059, 4048, 4093, 4124, 4066, 4034, 4081, 4113, 4068, 4019, 4070, 4014, 4112, 4030, 4005, 4100, 4076, 4085, 4010, 4111, 3992, 4165, 4077, 4144, 4020, 4107, 4018, 4106, 4089, 4062, 4090, 4105, 4105, 4021, 3974, 4067, 3979, 4027, 3990, 4123, 4111, 3962, 4100, 4012] Shutting down 256-bin OpenCL histogram...

TEST FAILED !!! Shutting down...

canhongluori commented 1 year ago

One interesting thing is that my computer has an NVIDIA RTX 3060 Laptop GPU, but when I try to run HistogramAMD, the test passes.

canhongluori commented 1 year ago

Besides JOCLSample_2_0_SVM and HistogramNVIDIA, JOCLSimpleLWJGL also failed. Here is the error message:

Exception in thread "AWT-EventQueue-0" java.lang.UnsatisfiedLinkError: no lwjgl64 in java.library.path: [C:\Program Files\Java\jdk-11.0.15\bin, C:\WINDOWS\Sun\Java\bin, C:\WINDOWS\system32, C:\WINDOWS, F:\Tecplot\Tecplot 360 EX 2020 R1\bin, C:\Program Files\Microsoft MPI\Bin\, C:\Program Files\Common Files\Oracle\Java\javapath, C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin, C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\libnvvp, C:\Windows\system32, C:\Windows, C:\Windows\System32\Wbem, C:\Windows\System32\WindowsPowerShell\v1.0\, C:\Windows\System32\OpenSSH\, C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common, C:\Program Files\NVIDIA Corporation\NVIDIA NvDLISR, C:\Program Files\NVIDIA Corporation\Nsight Compute 2022.1.1\, F:\MATLAB\R2022b\runtime\win64, F:\MATLAB\R2022b\bin, C:\WINDOWS\system32, C:\WINDOWS, C:\WINDOWS\System32\Wbem, C:\WINDOWS\System32\WindowsPowerShell\v1.0\, C:\WINDOWS\System32\OpenSSH\, F:\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.33.31629\bin\Hostx64\x64, C:\Program Files\CMake\bin, F:\Git\cmd, C:\Users\sibong\AppData\Local\Microsoft\WindowsApps, F:\Microsoft VS Code\bin, C:\Program Files\Java\jdk-11.0.15\bin, C:\Program Files\Java\jdk-11.0.15\jre\bin, C:\Users\sibong\AppData\Local\JetBrains\Toolbox\scripts, D:\apache-maven-3.8.6\bin, C:\Users\sibong\AppData\Local\Microsoft\WindowsApps, .]
    at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2662)
    at java.base/java.lang.Runtime.loadLibrary0(Runtime.java:827)
    at java.base/java.lang.System.loadLibrary(System.java:1871)
    at org.lwjgl.Sys$1.run(Sys.java:72)
    at java.base/java.security.AccessController.doPrivileged(Native Method)
    at org.lwjgl.Sys.doLoadLibrary(Sys.java:66)
    at org.lwjgl.Sys.loadLibrary(Sys.java:87)
    at org.lwjgl.Sys.(Sys.java:117)
    at org.lwjgl.opengl.AWTGLCanvas.(AWTGLCanvas.java:87)
    at org.jocl.samples.JOCLSimpleLWJGL.createCanvas(JOCLSimpleLWJGL.java:339)
    at org.jocl.samples.JOCLSimpleLWJGL.(JOCLSimpleLWJGL.java:303)
    at org.jocl.samples.JOCLSimpleLWJGL$1.run(JOCLSimpleLWJGL.java:47)
    at java.desktop/java.awt.event.InvocationEvent.dispatch(InvocationEvent.java:313)
    at java.desktop/java.awt.EventQueue.dispatchEventImpl(EventQueue.java:770)
    at java.desktop/java.awt.EventQueue$4.run(EventQueue.java:721)
    at java.desktop/java.awt.EventQueue$4.run(EventQueue.java:715)
    at java.base/java.security.AccessController.doPrivileged(Native Method)
    at java.base/java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:85)
    at java.desktop/java.awt.EventQueue.dispatchEvent(EventQueue.java:740)
    at java.desktop/java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:203)
    at java.desktop/java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:124)
    at java.desktop/java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:113)
    at java.desktop/java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:109)
    at java.desktop/java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
    at java.desktop/java.awt.EventDispatchThread.run(EventDispatchThread.java:90)

gpu commented 1 year ago

Regarding LWJGL: You have to add the right LWJGL DLL to your library path. That's how LWJGL worked back then. An aside: LWJGL also has OpenCL support. Maybe you'll find that easier to use.
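To see whether the native library is actually reachable, one could check each `java.library.path` entry for the platform-specific file name. This is a rough sketch using only the Java standard library; `LibraryPathCheckSketch` and `find` are hypothetical names, and the library name `lwjgl64` is taken from the error message in this thread.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical helper for illustration; not part of JOCL or LWJGL.
class LibraryPathCheckSketch {

    // Returns the first file in searchPath that matches the platform-specific
    // name of the given library (for example lwjgl64.dll on Windows).
    static Optional<Path> find(String libraryName, String searchPath) {
        String fileName = System.mapLibraryName(libraryName);
        for (String dir : searchPath.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            try {
                Path candidate = Paths.get(dir, fileName);
                if (Files.isRegularFile(candidate)) {
                    return Optional.of(candidate);
                }
            } catch (InvalidPathException e) {
                // Skip entries that are not valid paths on this platform.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String searchPath = System.getProperty("java.library.path", "");
        Optional<Path> hit = find("lwjgl64", searchPath);
        System.out.println(hit.map(p -> "Found: " + p)
                .orElse("lwjgl64 not found on java.library.path"));
    }
}
```

If nothing is found, the usual approach is to start the JVM with `-Djava.library.path=` pointing at the directory that contains the LWJGL DLL. In IDEA this would go into the VM options of the run configuration.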

Regarding the histogram: Look how cleverly designed the NVIDIA histogram kernel is: https://github.com/gpu/JOCLSamples/blob/master/src/main/resources/kernels/Histogram256.cl With constants. And inlined functions. And __attribute__ modifiers for the kernel. And some volatile sprinkled in here and there.

I have no idea what that kernel is doing, but apparently, it contains a bug. Depending on what you want to do, you could just ignore that, or create a pull request to fix it. I'm not going to try and debug code that was written by someone at NVIDIA.

canhongluori commented 1 year ago

At present, I want to use JOCL to do GPU parallel computing. Is there any problem with that?

gpu commented 1 year ago

Can you compile and execute the "OpenCL 64-bin and 256-bin Histogram" sample from https://developer.nvidia.com/opencl and say whether it passes on your GPU?

canhongluori commented 1 year ago

I haven't compiled the "OpenCL 64-bin and 256-bin Histogram" sample. Can you tell me how to compile it? Should I use cmake-gui or VS2019?

gpu commented 1 year ago

You can compile it with Visual Studio 2008 (or newer).

I'll attach the compiled binary here. This is an EXE file. You should not execute EXE files that someone just posts on the internet. You should not trust anybody. But I'll attach it here, and leave it to you.

Beyond that: When you want to use JOCL for GPU-based computations, then just do this. The fact that there may be a bug in a kernel that was taken from the NVIDIA samples does not matter at all.

Or to put it this way:

How should I fix this bug? Could this issue be marked as "resolved" when I just delete that sample? Just drop me a note, I can delete it at any time. I don't care.

oclHistogram_compiled_win64.zip

canhongluori commented 1 year ago

Sorry for the late reply. I have executed the EXE that you gave me, but the HistogramNVIDIA test still failed.

gpu commented 1 year ago

The output of that EXE should basically be the same as for the "HistogramNVIDIA" example.

If the output of the EXE says that the computation "failed", then this shows that there is a bug in the NVIDIA sample, and you should send a bug report to NVIDIA.

If the output of the EXE says that the test "passed", then something is wrong in the HistogramNVIDIA sample. But I don't have any way (and certainly no time) to analyze and fix that right now.

I'd say that you should ignore the fact that the Histogram sample does not work. If you cannot ignore that, then I'd recommend that you use https://www.lwjgl.org/ instead of JOCL. It also has support for OpenCL, and you can do GPU-based computations with LWJGL. Look at their samples to get started: https://github.com/LWJGL/lwjgl3/tree/master/modules/samples/src/test/java/org/lwjgl/demo/opencl

canhongluori commented 1 year ago

Thank you for helping me. I checked. I really don't need the histogram production function in JOCL. There may be a problem with NVIDIA's OpenCL API. The executable you gave me didn't find any gaps in the success of the output test

gpu commented 1 year ago

"The executable you gave me didn't find any gaps in the success of the output test"

Does this mean that the EXE computes the correct result?

If so, I'll have to look whether they changed something in the kernel. The results should be equal for the EXE and for JOCL.

canhongluori commented 1 year ago

Here is the log:

clGetPlatformID...
Get the Device info and select Device...
  # of Devices Available = 1
  Using Device 0: NVIDIA GeForce RTX 3060 Laptop GPU
  # of Compute Units = 30
D:\chrome down load\oclHistogram_compiled_win64\oclHistogram.exe Starting...

Initializing data...
Initializing OpenCL...
Allocating OpenCL memory...

Initializing 64-bin OpenCL histogram...
...loading Histogram64.cl from file

So as you can see, the log does not tell me whether it succeeded or not.

gpu commented 1 year ago

Oh dear. Apparently it tries to load the .CL files at runtime. (And... does not print an error message when it does not find them... 🙄 )

Can you add the attached files into the same directory as the .EXE file, and try it again?

oclHistogramKernels.zip

The output should be something like this:

[oclHistogram.exe] starting...

clGetPlatformID...
Get the Device info and select Device...
  # of Devices Available = 1
  Using Device 0: NVIDIA GeForce RTX 2070 SUPER
  # of Compute Units = 40
oclHistogram.exe Starting...

Initializing data...
Initializing OpenCL...
Allocating OpenCL memory...

Initializing 64-bin OpenCL histogram...
...loading Histogram64.cl from file
...creating histogram64 program
...building histogram64 program
...creating histogram64 kernels
...allocating internal histogram64 buffer

Writing ptx to separate file: Histogram64.ptx ...

Running 64-bin OpenCL histogram for 67108864 bytes...

Validating 64-bin histogram OpenCL results...
 ...reading back OpenCL results
 ...histogram64CPU()
 ...comparing the results
 ...64-bin histograms match

Shutting down 64-bin OpenCL histogram

Initializing 256-bin OpenCL histogram...
...loading Histogram256.cl
...creating histogram256 program
...building histogram256 program
...creating histogram256 kernels
...allocating internal histogram256 buffer

Writing ptx to separate file: Histogram256.ptx ...

Running 256-bin OpenCL histogram for 67108864 bytes...

Validating 256-bin histogram OpenCL results...
 ...reading back OpenCL results
 ...histogram256CPU()
 ...comparing the results
 ...256-bin histograms match

Shutting down 256-bin OpenCL histogram

Shutting down...
[oclHistogram.exe] test results...
PASSED

> exiting in 3 seconds: 3...2...1...done!

The point is: When it says 'FAILED', then ... there's a bug in the sample code. When it says PASSED for you, then I'll have to check why it does not work with JOCL.

(In the best case, it's just a bug in the kernel, and I "only" have to replace https://github.com/gpu/JOCLSamples/blob/master/src/main/resources/kernels/Histogram256.cl with the latest one from the sample, maybe with minor adjustments. But if that's not the reason, I'd have to spend more time analyzing that, and ... that has low priority for me right now...)

canhongluori commented 1 year ago

The test failed, oh No

gpu commented 1 year ago

That's good news for me. So the reason for the test failure is not in JOCL, but in the kernel that was provided by NVIDIA.

What that means for you ... I'm not sure. But I'm a bit curious now: When you run the test in JOCL (from the comment above) multiple times, does it always print the same values in the

GPU: [4088, 4006, 3979, 4223, ...

output, or are these values different each time that you run it?

canhongluori commented 1 year ago

The values are the same each time I run it.

gpu commented 1 year ago

One could try to dive deeper into this. Maybe some of the magic constants (WARP_COUNT, WARP_SIZE...) in the NVIDIA Kernel are not suitable for a "new" GPU like the RTX 3060? In any case: When the problem also appears in the .EXE that was compiled from the official NVIDIA sample, then this is not an issue of JOCL.