Closed petebankhead closed 9 months ago
I think this now works (at least on my computers).
You should run:
gradlew clean jpackage
gradlew clean build --refresh-dependencies
(because of snapshots)
@Rylern @alanocallaghan @finglis Please check if you can, while I update the docs here.
The GPU is still not detected on my computer.
When I install PyTorch, I get this warning:
PyTorch works fine for me and finds my CUDA, but then it found it fine before as well.
For TensorFlow, confusingly, I get an error, and then it claims to be available. Stacktrace:
11:25:50.464 [JavaFX Application Thread] [ERROR] qupath.ext.djl.ui.DjlEngineCommand - Error updating engine version: Failed to download Tensorflow native library
java.lang.IllegalStateException: Failed to download Tensorflow native library
    at ai.djl.tensorflow.engine.javacpp.LibUtils.downloadTensorFlow(LibUtils.java:207)
    at ai.djl.tensorflow.engine.javacpp.LibUtils.findLibraryInClasspath(LibUtils.java:91)
    at ai.djl.tensorflow.engine.javacpp.LibUtils.getLibName(LibUtils.java:66)
    at ai.djl.tensorflow.engine.TfEngine.toString(TfEngine.java:177)
    at qupath.ext.djl.ui.DjlEngineCommand.updateVersionFromStatus(DjlEngineCommand.java:271)
    at qupath.ext.djl.ui.DjlEngineCommand.lambda$init$6(DjlEngineCommand.java:236)
    at com.sun.javafx.binding.ExpressionHelper$Generic.fireValueChangedEvent(ExpressionHelper.java:360)
    at com.sun.javafx.binding.ExpressionHelper.fireValueChangedEvent(ExpressionHelper.java:80)
    at javafx.beans.property.ObjectPropertyBase.fireValueChangedEvent(ObjectPropertyBase.java:106)
    at javafx.beans.property.ObjectPropertyBase.markInvalid(ObjectPropertyBase.java:113)
    at javafx.beans.property.ObjectPropertyBase.set(ObjectPropertyBase.java:147)
    at qupath.ext.djl.ui.DjlEngineCommand.updateStatus(DjlEngineCommand.java:363)
    at qupath.ext.djl.ui.DjlEngineCommand.lambda$updateStatus$8(DjlEngineCommand.java:365)
    at com.sun.javafx.application.PlatformImpl.lambda$runLater$10(PlatformImpl.java:456)
    at java.base/java.security.AccessController.doPrivileged(Unknown Source)
    at com.sun.javafx.application.PlatformImpl.lambda$runLater$11(PlatformImpl.java:455)
    at com.sun.glass.ui.InvokeLaterDispatcher$Future.run(InvokeLaterDispatcher.java:95)
    at com.sun.glass.ui.gtk.GtkApplication._runLoop(Native Method)
    at com.sun.glass.ui.gtk.GtkApplication.lambda$runLoop$11(GtkApplication.java:316)
    at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: Offline model is enabled.
    at ai.djl.util.Utils.openUrl(Utils.java:486)
    at ai.djl.util.Utils.openUrl(Utils.java:472)
    at ai.djl.tensorflow.engine.javacpp.LibUtils.downloadTensorFlow(LibUtils.java:176)
    ... 19 common frames omitted
I need more info... not sure which platforms you're running on, or how you're assessing that CUDA was found. CUDA can be detected and shown in the Manage engines dialog, even when it's completely incompatible with both engines.
I can replicate the TensorFlow problem on Windows - looking into it now.
Running on Linux. I know that CUDA was found because this laptop can't do 70 tiles/s on the CPU.
Ignore what I said, it was picking up my existing DJL download. If I make a launcher script just pointing to a conda environment, DJL just ignores it and downloads PyTorch to the normal place. If I also specify the path to torch, then I get:
11:59:49.658 [JavaFX Application Thread] [INFO ] ai.djl.pytorch.jni.LibUtils - Downloading jni https://publish.djl.ai/pytorch/1.13.1/jnilib/0.24.0/linux-x86_64/cu123-precxx11/libdjl_torch.so to cache ...
11:59:49.659 [JavaFX Application Thread] [ERROR] q.ext.wsinfer.ui.PytorchManager - Cannot download jni files: https://publish.djl.ai/pytorch/1.13.1/jnilib/0.24.0/linux-x86_64/cu123-precxx11/libdjl_torch.so
ai.djl.engine.EngineException: Cannot download jni files: https://publish.djl.ai/pytorch/1.13.1/jnilib/0.24.0/linux-x86_64/cu123-precxx11/libdjl_torch.so
I can't get it to pick up anything bar its preferred version/location, but then its preferred location works fine on my box, so...
Yeah, I think if it works for its preferred download & location that's enough. Maybe we should remove the other option, although I think I've seen it work at some point in the past.
The TensorFlow thing is a bit maddening, but I think I'm getting closer.
Basically the Engine can be valid, but simply calling toString() on it is enough to prompt a download here.
This is because it looks for the library name based upon what it expects to have to match with the current CUDA, even if that doesn't exist and isn't actually the library name of the engine. Rather, it finds a placeholder here and so wants to attempt a download, but can't because we're in offline mode.
The wrong CUDA/library name is corrected at the download stage (i.e. it recognizes that it should fall back to cpu), but only if it's allowed to proceed (i.e. we're not in offline mode).
So basically it looks like we'd need to turn off offline mode in order for it to start figuring out what to download, then realize that it doesn't need to download anything.
Not sure how to fix it satisfyingly... but the conclusion is that it should work as long as there is either 1) no CUDA, or 2) a compatible CUDA found, or 3) we're not enforcing offline.
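To make the three cases above concrete, here is a toy Java model of the lookup. It is only a sketch of the behaviour as described in this thread: `resolveLibrary`, `expectedToWork` and the "cpu"/"gpu" labels are hypothetical, not DJL methods or real library names, and the actual logic lives in DJL's `LibUtils`.

```java
import java.io.IOException;

/** Toy model of the behaviour described above; none of these names are DJL's. */
public class OfflineLookupSketch {

    /** Hypothetical stand-in for the real library-name lookup. */
    public static String resolveLibrary(boolean cudaPresent, boolean cudaCompatible, boolean offline)
            throws IOException {
        // 1) No CUDA: the CPU library is what is expected, so nothing needs downloading.
        if (!cudaPresent) {
            return "cpu";
        }
        // 2) Compatible CUDA: the expected GPU library is the one that is available.
        if (cudaCompatible) {
            return "gpu";
        }
        // Incompatible CUDA: only a placeholder is found, so a download is attempted.
        if (offline) {
            throw new IOException("offline: download not permitted");
        }
        // 3) Not offline: the download stage notices the mismatch and falls back to CPU.
        return "cpu";
    }

    /** The three conditions from the comment above, as a single predicate. */
    public static boolean expectedToWork(boolean cudaPresent, boolean cudaCompatible, boolean offline) {
        return !cudaPresent || cudaCompatible || !offline;
    }

    public static void main(String[] args) {
        boolean[] flags = {false, true};
        for (boolean cuda : flags) {
            for (boolean compatible : flags) {
                for (boolean offline : flags) {
                    String outcome;
                    try {
                        outcome = resolveLibrary(cuda, compatible, offline);
                    } catch (IOException e) {
                        outcome = "fails (" + e.getMessage() + ")";
                    }
                    System.out.printf("cuda=%b compatible=%b offline=%b -> %s%n",
                            cuda, compatible, offline, outcome);
                }
            }
        }
    }
}
```

In this model the only failing combination is an incompatible CUDA together with enforced offline mode, which matches the TensorFlow stack trace earlier in the thread.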
Add a new command to generate a launch script that can set environment variables and system properties.
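As a rough illustration of what such a command could emit, here is a minimal Java sketch that builds a bash launch script from a map of environment variables and a map of system properties. This is not the extension's actual implementation: the class, the method names and the example keys are hypothetical, and passing the `-D` flags through `JAVA_TOOL_OPTIONS` (which the JVM reads at startup) is an assumption made for the sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical sketch of a launch-script generator; not the extension's code. */
public class LaunchScriptSketch {

    /** Wraps a value in single quotes for a POSIX shell, escaping embedded quotes. */
    public static String quote(String value) {
        return "'" + value.replace("'", "'\\''") + "'";
    }

    /** Builds the script text: env exports, then -D flags, then the launch line. */
    public static String build(String executable,
                               Map<String, String> env,
                               Map<String, String> props) {
        StringBuilder sb = new StringBuilder("#!/usr/bin/env bash\n");
        // One export line per environment variable.
        env.forEach((k, v) ->
                sb.append("export ").append(k).append('=').append(quote(v)).append('\n'));
        if (!props.isEmpty()) {
            // Assumption: system properties are handed to the JVM via JAVA_TOOL_OPTIONS.
            // Property values containing spaces are not handled in this sketch.
            String flags = props.entrySet().stream()
                    .map(e -> "-D" + e.getKey() + "=" + e.getValue())
                    .collect(Collectors.joining(" "));
            sb.append("export JAVA_TOOL_OPTIONS=").append(quote(flags)).append('\n');
        }
        // Replace the shell with the application, forwarding any arguments.
        sb.append("exec ").append(quote(executable)).append(" \"$@\"\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> env = new LinkedHashMap<>();
        env.put("EXAMPLE_LIBRARY_PATH", "/path/to/native/libs"); // placeholder name and value
        Map<String, String> props = new LinkedHashMap<>();
        props.put("example.offline", "false"); // placeholder property
        System.out.print(build("/path/to/app/bin/App", env, props));
    }
}
```

Using insertion-ordered maps keeps the generated script stable between runs, which makes it easier to compare scripts when debugging which engine settings were actually applied.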