qupath / qupath-extension-djl

A QuPath extension for working with Deep Java Library (https://djl.ai)
Apache License 2.0

New command to generate launch script #13

Closed by petebankhead 9 months ago

petebankhead commented 9 months ago

Add a new command to generate a launch script that can set environment variables and system properties.
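The thread does not show the generated script, so the following is only a minimal sketch of how such a script could be assembled for a POSIX shell. The class and method names are hypothetical. The executable path, the `PYTORCH_VERSION` variable and the `offline` property are placeholder inputs. Passing system properties through `JAVA_TOOL_OPTIONS` is an assumption for illustration, not necessarily what the extension does.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of qupath-extension-djl.
class LaunchScriptSketch {

    /** Quote a value for a POSIX shell by wrapping it in single quotes. */
    static String shellQuote(String value) {
        return "'" + value.replace("'", "'\\''") + "'";
    }

    /**
     * Build the text of a bash launch script.
     * Environment variables become 'export' lines; system properties are
     * joined as -Dkey=value flags and exported via JAVA_TOOL_OPTIONS
     * (assumption: the launched JVM reads that variable at startup).
     * Property values containing spaces are not handled by this sketch.
     */
    static String buildScript(String executable,
                              Map<String, String> env,
                              Map<String, String> props) {
        StringBuilder sb = new StringBuilder("#!/usr/bin/env bash\n");
        for (Map.Entry<String, String> e : env.entrySet()) {
            sb.append("export ").append(e.getKey()).append('=')
              .append(shellQuote(e.getValue())).append('\n');
        }
        if (!props.isEmpty()) {
            String opts = props.entrySet().stream()
                    .map(e -> "-D" + e.getKey() + "=" + e.getValue())
                    .collect(Collectors.joining(" "));
            sb.append("export JAVA_TOOL_OPTIONS=").append(shellQuote(opts)).append('\n');
        }
        // Forward any command-line arguments to the application.
        sb.append("exec ").append(shellQuote(executable)).append(" \"$@\"\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> env = new LinkedHashMap<>();
        env.put("PYTORCH_VERSION", "1.13.1");   // placeholder value
        Map<String, String> props = new LinkedHashMap<>();
        props.put("offline", "true");           // placeholder property
        System.out.print(buildScript("/path/to/QuPath", env, props));
    }
}
```

Keeping the script as a plain string makes it easy to preview in a dialog before writing it to disk.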

petebankhead commented 9 months ago

I think this now works (at least on my computers).

You should:

@Rylern @alanocallaghan @finglis Please check if you can, while I update the docs here.

Rylern commented 9 months ago

The GPU is still not detected on my computer.

When I install PyTorch, I get this warning: [image not preserved]

alanocallaghan commented 9 months ago

PyTorch works fine for me and finds my CUDA, but then it found it fine before as well.

For TensorFlow, confusingly, I get an error, and then it claims to be available. Stack trace:

11:25:50.464 [JavaFX Application Thread] [ERROR] qupath.ext.djl.ui.DjlEngineCommand - Error updating engine version: Failed to download Tensorflow native library
java.lang.IllegalStateException: Failed to download Tensorflow native library
    at ai.djl.tensorflow.engine.javacpp.LibUtils.downloadTensorFlow(LibUtils.java:207)
    at ai.djl.tensorflow.engine.javacpp.LibUtils.findLibraryInClasspath(LibUtils.java:91)
    at ai.djl.tensorflow.engine.javacpp.LibUtils.getLibName(LibUtils.java:66)
    at ai.djl.tensorflow.engine.TfEngine.toString(TfEngine.java:177)
    at qupath.ext.djl.ui.DjlEngineCommand.updateVersionFromStatus(DjlEngineCommand.java:271)
    at qupath.ext.djl.ui.DjlEngineCommand.lambda$init$6(DjlEngineCommand.java:236)
    at com.sun.javafx.binding.ExpressionHelper$Generic.fireValueChangedEvent(ExpressionHelper.java:360)
    at com.sun.javafx.binding.ExpressionHelper.fireValueChangedEvent(ExpressionHelper.java:80)
    at javafx.beans.property.ObjectPropertyBase.fireValueChangedEvent(ObjectPropertyBase.java:106)
    at javafx.beans.property.ObjectPropertyBase.markInvalid(ObjectPropertyBase.java:113)
    at javafx.beans.property.ObjectPropertyBase.set(ObjectPropertyBase.java:147)
    at qupath.ext.djl.ui.DjlEngineCommand.updateStatus(DjlEngineCommand.java:363)
    at qupath.ext.djl.ui.DjlEngineCommand.lambda$updateStatus$8(DjlEngineCommand.java:365)
    at com.sun.javafx.application.PlatformImpl.lambda$runLater$10(PlatformImpl.java:456)
    at java.base/java.security.AccessController.doPrivileged(Unknown Source)
    at com.sun.javafx.application.PlatformImpl.lambda$runLater$11(PlatformImpl.java:455)
    at com.sun.glass.ui.InvokeLaterDispatcher$Future.run(InvokeLaterDispatcher.java:95)
    at com.sun.glass.ui.gtk.GtkApplication._runLoop(Native Method)
    at com.sun.glass.ui.gtk.GtkApplication.lambda$runLoop$11(GtkApplication.java:316)
    at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: Offline model is enabled.
    at ai.djl.util.Utils.openUrl(Utils.java:486)
    at ai.djl.util.Utils.openUrl(Utils.java:472)
    at ai.djl.tensorflow.engine.javacpp.LibUtils.downloadTensorFlow(LibUtils.java:176)
    ... 19 common frames omitted

petebankhead commented 9 months ago

I need more info... not sure which platforms you're running on, or how you're assessing that CUDA was found. CUDA can be detected and shown in the Manage engines dialog, even when it's completely incompatible with both engines.

I can replicate the TensorFlow problem on Windows - looking into it now.

alanocallaghan commented 9 months ago

> I need more info... not sure which platforms you're running on, or how you're assessing that CUDA was found. CUDA can be detected and shown in the Manage engines dialog, even when it's completely incompatible with both engines.

I'm running on Linux. I know that CUDA was found because this laptop can't do 70 tiles/s on the CPU.

Ignore what I said, it was picking up my existing DJL download. If I make a launcher script just pointing to a conda environment, DJL just ignores it and downloads PyTorch to the normal place. If I also specify the path to torch, then I get:

11:59:49.658 [JavaFX Application Thread] [INFO ] ai.djl.pytorch.jni.LibUtils - Downloading jni https://publish.djl.ai/pytorch/1.13.1/jnilib/0.24.0/linux-x86_64/cu123-precxx11/libdjl_torch.so to cache ...
11:59:49.659 [JavaFX Application Thread] [ERROR] q.ext.wsinfer.ui.PytorchManager - Cannot download jni files: https://publish.djl.ai/pytorch/1.13.1/jnilib/0.24.0/linux-x86_64/cu123-precxx11/libdjl_torch.so
ai.djl.engine.EngineException: Cannot download jni files: https://publish.djl.ai/pytorch/1.13.1/jnilib/0.24.0/linux-x86_64/cu123-precxx11/libdjl_torch.so

I can't get it to pick up anything bar its preferred version/location, but then its preferred location works fine on my box, so...

petebankhead commented 9 months ago

Yeah, I think if it works for its preferred download & location that's enough. Maybe we should remove the other option, although I think I've seen it work at some point in the past.

The TensorFlow thing is a bit maddening, but I think I'm getting closer.

Basically, the Engine can be valid, but simply calling toString() on it is enough to prompt a download here.

This is because it looks up the library name it expects to need for the current CUDA version, even if that library doesn't exist and isn't actually the library name of the engine. Instead, it finds a placeholder here and so wants to attempt a download, but can't because we're in offline mode.

The wrong CUDA/library name is corrected at the download stage (i.e. it recognizes that it should fall back to cpu), but only if it's allowed to proceed (i.e. we're not in offline mode).

So basically it looks like we'd need to turn off offline mode in order for it to start figuring out what to download, then realize that it doesn't need to download anything.

Not sure how to fix it satisfactorily... but the conclusion is that it should work as long as there is either 1) no CUDA, or 2) a compatible CUDA found, or 3) we're not enforcing offline.
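The conclusion above can be written down as a small predicate, shown here together with one possible defensive measure that is not proposed in the thread: guarding the description call so that a failed download attempt does not surface as an error. This is only a sketch. The class, method and parameter names are hypothetical, and a plain `Supplier<String>` stands in for the engine's `toString()`, so no DJL classes are used.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of qupath-extension-djl or DJL.
class EngineStatusSketch {

    /**
     * Encodes the three conditions from the comment above: the engine is
     * expected to work if no CUDA is found, or the CUDA found is
     * compatible, or offline mode is not enforced.
     */
    static boolean expectedToWork(boolean cudaFound, boolean cudaCompatible, boolean offline) {
        return !cudaFound || cudaCompatible || !offline;
    }

    /**
     * Calls a description supplier that may throw (as toString() does in the
     * stack trace earlier in this thread) and returns a fallback instead.
     */
    static String describeSafely(Supplier<String> description, String fallback) {
        try {
            return description.get();
        } catch (RuntimeException e) {
            return fallback + " (" + e.getMessage() + ")";
        }
    }

    public static void main(String[] args) {
        // Incompatible CUDA with offline mode enforced: the failing case.
        System.out.println(expectedToWork(true, false, true));
        // Simulate a description call that fails like the TensorFlow engine did.
        System.out.println(describeSafely(() -> {
            throw new IllegalStateException("Failed to download Tensorflow native library");
        }, "TensorFlow"));
    }
}
```

The guard only hides the symptom. It does not change which native library is looked up.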