sbrunk / storch

GPU accelerated deep learning and numeric computing for Scala 3.
https://storch.dev
Apache License 2.0

Unsatisfied link error: jnicudart.dll #33

Open marcelluethi opened 1 year ago

marcelluethi commented 1 year ago

Hello

I've been playing around with storch for the past few days. It has been a very pleasant experience so far - thanks a lot for this great effort.

Today I updated to the latest version and got the following error on Windows:

Caused by: java.lang.UnsatisfiedLinkError: C:\Users\marce\.javacpp\cache\cuda-12.1-8.9-1.5.9-windows-x86_64.jar\org\bytedeco\cuda\windows-x86_64\jnicudart.dll: Can't find dependent libraries

I am using the following dependencies:

//> using dep "dev.storch::core:0.0-eb0bfa1-SNAPSHOT"
//> using dep "dev.storch::vision:0.0-eb0bfa1-SNAPSHOT"
//> using dep "org.bytedeco:pytorch-platform-gpu:2.0.1-1.5.9"
//> using dep "org.bytedeco:cuda-platform-redist:12.1-8.9-1.5.9"

It works well on Linux, but neither on Windows nor on WSL. Everything still works well with version #89c4d5f, but it starts breaking with #bfcaab4.

davoclavo commented 1 year ago

Hi @marcelluethi!

Thanks a lot for reporting this issue. I sadly don't have a Windows machine handy to replicate the bug, but searching around I stumbled upon these relevant discussions. Perhaps there is something in them that could help sort out the problem.

Regarding the regression on bfcaab4 - that commit bumped the version of cuda to 12.1-8.9, so perhaps it is related to a mismatch with the drivers installed on your machine.

Could you perhaps share the output of nvidia-smi to take a look at what cuda version your system is using?
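
For anyone who wants to check this from Scala before picking a dependency set, here is a minimal sketch using only the standard library. The helper names `commandOutput` and `hasNvidiaDriver` are made up for illustration and are not part of Storch. It assumes that a working `nvidia-smi` on the PATH is a reasonable hint that an NVIDIA driver is installed; it does not guarantee that the CUDA libraries will actually load.

```scala
import scala.sys.process.*
import scala.util.Try

// Hypothetical helper (not part of Storch): runs a command and returns its
// standard output, or None if the command is missing or exits non-zero.
def commandOutput(cmd: Seq[String]): Option[String] =
  // `!!` throws if the command cannot be started or returns a non-zero exit
  // code; the ProcessLogger silently discards anything written to stderr.
  Try(cmd.!!(ProcessLogger(_ => ()))).toOption

// Assumption: if `nvidia-smi` runs successfully, an NVIDIA driver is present.
def hasNvidiaDriver: Boolean =
  commandOutput(Seq("nvidia-smi")).isDefined
```

On a machine without a dedicated GPU, like the Windows one described later in this thread, `hasNvidiaDriver` would be expected to return `false`, which is a hint to leave out the CUDA dependencies.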

marcelluethi commented 1 year ago

Thanks for the prompt and helpful reply. My Windows machine does not have a dedicated GPU and cuda is not installed. On my Linux machine, it is. This turned out to be the root of the problem. In previous versions (until #89c4d5f) I could have org.bytedeco:cuda-platform-redist:12.1-8.9-1.5.9 as a dependency, even though I did not have the cuda libraries installed. In the latest version this seems no longer to be the case. But it works again if I remove this dependency.

Knowing this solves the issue for me. I just remove the dependency when I only have the CPU available.

Thanks again.

sbrunk commented 1 year ago

Thanks for your interest and apologies @marcelluethi, this is probably due to an issue that came with the update to cuda 12 https://github.com/bytedeco/javacpp-presets/issues/1376

I have a workaround that should get cuda working again, but didn't get to document it properly yet.

The workaround is to add cuda-platform in addition to cuda-platform-redist and the latest storch snapshot, as shown below:

//> using dep "dev.storch::core:0.0-3e0f9b1-SNAPSHOT"
//> using dep "org.bytedeco:pytorch-platform-gpu:2.0.1-1.5.9"
//> using dep "org.bytedeco:cuda-platform-redist:12.1-8.9-1.5.9"
//> using dep "org.bytedeco:cuda-platform:12.1-8.9-1.5.9"

marcelluethi commented 1 year ago

Unfortunately, adding cuda-platform in addition to cuda-platform-redist does not resolve the issue. However, just commenting out the cuda part when no cuda is installed does. For me, this is a workaround I can happily live with.

For a project that still is in a relatively early stage of development, it is already super useful and it's easy to get started. Very much appreciated.

sbrunk commented 1 year ago

Thanks for trying. I definitely need to do more testing on Windows, although having the same behavior on WSL suggests it might always happen if there's no cuda available. I'll try to investigate.

Just to clarify: did you switch to the CPU-only pytorch-platform build on your Windows machine without a dedicated GPU, or did pytorch-platform-gpu work (after removing cuda-platform-redist)?

BTW I just saw your scaltair project, which looks great. I was just creating a vega-lite heatmap from a Storch tensor manually and I think this could have been much easier with scaltair. Perhaps it even makes sense to have a small integration that uses a Storch tensor as a data source at some point.

marcelluethi commented 1 year ago

It works by just removing cuda-platform-redist, but keeping pytorch-platform-gpu. While it works, it prints out the following warning:

Warning: Loading nvfuser library failed with: error in LoadLibrary for nvfuser_codegen.dll. WinError 126: The specified module could not be found.
 (function LoadingNvfuserLibrary)

The warning disappears when switching to pytorch-platform.
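
For reference, a CPU-only setup along these lines might look like the sketch below. The storch versions are the ones from my first post. The pytorch-platform version is an assumption: I simply reuse the version string of pytorch-platform-gpu, so double-check it against what is actually published.

```scala
// CPU-only setup: no cuda-platform-redist and no pytorch-platform-gpu.
// Assumption: pytorch-platform is published under the same version string
// as pytorch-platform-gpu (2.0.1-1.5.9).
//> using dep "dev.storch::core:0.0-eb0bfa1-SNAPSHOT"
//> using dep "dev.storch::vision:0.0-eb0bfa1-SNAPSHOT"
//> using dep "org.bytedeco:pytorch-platform:2.0.1-1.5.9"
```

On a machine with a working cuda installation, the last line would be swapped back to pytorch-platform-gpu, with cuda-platform-redist added again.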

Regarding scaltair: It is a small project that I started some time ago, mainly to scratch my own itch. I think the combination of vega-lite and a small DSL can be very powerful and might make it possible to develop a plotting library in Scala that is relatively complete without becoming a huge maintenance nightmare. As I plan to do more work with deep learning this summer, I can use the opportunity to try adding an integration for storch tensors.