nextcloud / recognize

👁 👂 Smart media tagging for Nextcloud: recognizes faces, objects, landscapes, music genres
https://apps.nextcloud.com/apps/recognize
GNU Affero General Public License v3.0
519 stars 42 forks

Add Intel GPU support #1014

Open cromefire opened 8 months ago

cromefire commented 8 months ago

Describe the feature you'd like to request

Since many people probably have a server with something like an old Intel processor with an iGPU (at least I do), it'd be cool to have support for those iGPUs (and, for free, also for the Arc A series) to take some burden off the CPU.

Describe the solution you'd like

To work with Intel GPUs in TensorFlow, you need the https://github.com/intel/intel-extension-for-tensorflow package; normally it should just pick up the Intel GPU as an accelerator device if everything is installed correctly. iGPUs aren't quite first-class citizens, but they should work reasonably well (see here), and support seems to go back to Skylake.

It'll probably need to be custom-built anyway to support certain GPU families, so there's probably not much you need to do on your side, other than providing some guidance on how to build and where to install the result, verifying that it picks up accelerators other than NVIDIA, and maybe renaming the NVIDIA GPU support toggle to something like "GPU support".

AMD support would also be nice to have for full coverage, but AMD doesn't support its GPUs as an extension to TensorFlow, only as a replacement package, so that's probably a lot harder to realize.

Describe alternatives you've considered

Running it on the CPU. It's passable, but of course it isn't particularly fast, especially if you throw a bunch of large images at it at once, and it chews up quite some resources that the rest of the system could use.

Depends on: https://github.com/tensorflow/tfjs/issues/8040

github-actions[bot] commented 8 months ago

Hello :wave:

Thank you for taking the time to open this issue with recognize. I know it's frustrating when software causes problems. You have made the right choice to come here and open an issue to make sure your problem gets looked at and if possible solved. I try to answer all issues and if possible fix all bugs here, but it sometimes takes a while until I get to it. Until then, please be patient.

Note also that GitHub is a place where people meet to make software better together. Nobody here is under any obligation to help you, solve your problems or deliver on any expectations or demands you may have, but if enough people come together we can collaborate to make this software better. For everyone. Thus, if you can, you could also look at other issues to see whether you can help other people with your knowledge and experience. If you have coding experience it would also be awesome if you could step up to dive into the code and try to fix the odd bug yourself. Everyone will be thankful for extra helping hands!

One last word: If you feel, at any point, like you need to vent, this is not the place for it; you can go to the forum, to twitter or somewhere else. But this is a technical issue tracker, so please make sure to focus on the tech and keep your opinions to yourself. (Also see our Code of Conduct. Really.)

I look forward to working with you on this issue Cheers :blue_heart:

bugsyb commented 8 months ago

@marcelklehr, if the above request comes down to injecting Intel TensorFlow into the package, I'm happy to try it with the build I've prepped (https://github.com/bugsyb/recognize_docker).

Since my knowledge of this is fresh, it will be easier to approach it now rather than in 2-3 months, if possible. At the same time, I don't have a unit where I could run tests, so I'd have to work blindly. @cromefire, if you could prep a Debian-based container with access to your GPU and figure out how to install all the dependencies, that would help with a quick start.

cromefire commented 8 months ago

> if you could prep container with access to your GPU, Debian based and would find out how to install all dependencies it will help with quick start.

Sure, it should be pretty easy. I don't think I have time for it today, but I might be able to take a look at getting it up and running next week, since Intel stuff usually runs quite well in Docker (AMD and Intel don't need any specific runtimes, unlike NVIDIA). Also, I think there's a TensorFlow base image for Intel, so maybe just swapping out the NVIDIA base image already works.

Is there any way to manually test it other than just setting it up with Nextcloud and uploading an image?

cromefire commented 8 months ago

The result of the investigation: it works fine in Docker and worked immediately:

~$ sudo docker run --rm --device /dev/dri -v /dev/dri/by-path:/dev/dri/by-path -it local/nextcloud:27.1.3-apache-intel-custom /usr/bin/python -c "import tensorflow as tf;print(tf.config.list_physical_devices('XPU'))"
2023-10-28 23:16:24.855397: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-10-28 23:16:28.764013: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-10-28 23:16:28.845009: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-10-28 23:16:41.633581: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2023-10-28 23:16:54.954401: I itex/core/wrapper/itex_cpu_wrapper.cc:52] Intel Extension for Tensorflow* AVX2 CPU backend is loaded.
2023-10-28 23:18:00.466358: I itex/core/wrapper/itex_gpu_wrapper.cc:35] Intel Extension for Tensorflow* GPU backend is loaded.
2023-10-28 23:18:00.651197: W itex/core/ops/op_init.cc:58] Op: _QuantizedMaxPool3D is already registered in Tensorflow
2023-10-28 23:18:01.077444: I itex/core/devices/gpu/itex_gpu_runtime.cc:129] Selected platform: Intel(R) Level-Zero
2023-10-28 23:18:01.077490: I itex/core/devices/gpu/itex_gpu_runtime.cc:154] number of sub-devices is zero, expose root device.
[PhysicalDevice(name='/physical_device:XPU:0', device_type='XPU')]

But someone decided to use TensorFlow.js for this project, and it doesn't yet support anything but NVIDIA:

> const tf = require('@tensorflow/tfjs-node-gpu')
2023-10-28 22:15:33.044215: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2023-10-28 22:15:33.044232: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2023-10-28 22:15:33.068766: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-10-28 22:15:33.070460: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2023-10-28 22:15:33.070471: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2023-10-28 22:15:33.070492: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (4dd675eac5ca): /proc/driver/nvidia/version does not exist
undefined
> tf.backend()
NodeJSKernelBackend {
  binding: {},
  isGPUPackage: true,
  isUsingGpuDevice: false,
  tensorMap: DataStorage {
    backend: [Circular],
    dataMover: Engine {
      ENV: [Environment],
      registry: [Object],
      registryFactory: [Object],
      pendingBackendInitId: 0,
      state: [EngineState],
      backendName: 'tensorflow',
      backendInstance: [Circular],
      profiler: [Profiler]
    },
    data: WeakMap { <items unknown> },
    dataIdsCount: 0
  }
}

and I lack the C and node-gyp skills to call the C API and fix it (tensorflow/tfjs#8040), so this doesn't help at all right now. For someone who knows their way around it, though, it'd probably be reasonably easy, as it's only a single API call.

Also, the nvidia-tensor-based/custom-addons/Dockerfile seems to be quite broken, as it doesn't properly build recognize (no Node.js installed, which is easy to fix; no PHP Composer installed, which is harder to fix as it has dependency issues, at least on the Ubuntu-based Intel container; it also doesn't actually copy recognize into the target container).

bugsyb commented 8 months ago

While I'm not the code author: there's a switch for CPU/GPU, and on that basis the code chooses whether to try the GPU; likewise, when it fails on the GPU, it seems to fall back to the CPU. These are the two places in the code I'd focus on to add another branch that checks for the XPU/Intel part.

In terms of the Dockerfile, it seems like you're referring to https://github.com/bugsyb/recognize_docker/tree/main/nvidia-tensor-based/custom-addons. If that's the case, please raise an issue there and report the errors, etc.

cromefire commented 8 months ago

> fails on GPU, it seems to try with CPU - these are the two places in the code I'd focus on to make another deviation to check on XPU/Intel portion?

Yeah, it just doesn't find non-CUDA devices, obviously, because it can't load the Python driver, and it also has no option to load the C/libtensorflow driver. I know exactly which function has to be called; I just lack the skills with the Node.js-to-C interface to implement it.

bugsyb commented 8 months ago

Trying to help: it doesn't look like it would require that much knowledge, as the heavy lifting is already done.

Have a look at output of this search: https://github.com/search?q=repo%3Anextcloud%2Frecognize+RECOGNIZE_GPU&type=code

The majority of the output shows where, I believe, the change would need to be made:

    try {
        if (process.env.RECOGNIZE_GPU === 'true') {
            tf = require('@tensorflow/tfjs-node-gpu')
        } else {
            tf = require('@tensorflow/tfjs-node')
        }
    } catch (e) {
        // (remainder of the snippet truncated in the search result)
    }

For just a PoC, I'd probably replace tfjs-node-gpu with the Intel one and see if it just works (most probably it will, given the above discussion).

Another point that will most probably arise is the amount of RAM available to the XPU, as the current model requires just shy of 4 GB. One of the systems I tested it against had 2 GB on the GPU, and it failed to use the GPU due to a memory issue.

Should you succeed with the simple replacement above, it might then be worth the work to add admin-level settings and adjust the rest of the code.

I can't help much with testing, as I don't have access to an appropriate system, unless a 7th-gen NUC counts.

If the simple switch doesn't work, it will require amendments around where the GPU is searched for, with an option to accept an XPU.

cromefire commented 8 months ago

> I'd probably replace the tfjs-node-gpu with the intel one

Well, that's what I'm trying to say: there is no Intel one... @tensorflow/tfjs-node-gpu first needs to support PluggableDevices before we can do anything on the recognize side, see tensorflow/tfjs#8040. Yes, it works in TensorFlow for Python and in libtensorflow, so getting it to work in tfjs-node-gpu just requires them to make the right call into libtensorflow, but it's beyond me to implement that. Once we're fully in JavaScript land, I can probably do it myself if I find all the right places.

SoTHISIsFitness commented 3 months ago

Was there ever any update here? Maybe someone knows the missing piece to help button this up? (That person is not me, sorry; just poking to see if anyone's thought about this in the last 4 or 5 months.)

cromefire commented 3 months ago

No one has taken up the blocking issue, so no; until that's done, nothing can really be done... And for that, we need someone with node-gyp knowledge.

bendschs commented 3 months ago

+1

lukemovement commented 1 month ago

> No one has taken up the blocking issue, so no until that's done, nothing can be done really... And for that we need someone with node-gyp knowledge.

It might be worth putting this on the TensorFlow GitHub, as opposed to the TFJS GitHub, given this is in regard to embedded languages?

cromefire commented 1 month ago

> It might be worth putting this on the TensorFlow GitHub, as opposed to the TFJS GitHub, given this is in regards to embedded languages?

The C API in libtensorflow for embedded languages already exists (as an experimental API); the JS API just doesn't expose it for use by clients.

Further reading material: https://www.tensorflow.org/install/gpu_plugins https://intel.github.io/intel-extension-for-tensorflow/latest/docs/install/install_for_cpp.html#load

The function in question is TF_LoadPluggableDeviceLibrary from tensorflow/c/c_api_experimental.h.
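For anyone picking this up, here is a minimal sketch of what that call looks like against the experimental C API. Assumptions: libtensorflow and its headers are installed, the plugin file name is just an example of what ITEX ships, and error handling is abbreviated; a real tfjs-node binding would wrap roughly this logic via node-gyp.

```c
#include <stdio.h>

#include "tensorflow/c/c_api.h"
#include "tensorflow/c/c_api_experimental.h"

int main(void) {
    TF_Status* status = TF_NewStatus();

    /* Load a PluggableDevice plugin. The path is an example;
     * the Intel extension ships a shared object such as libitex_gpu.so. */
    TF_Library* plugin =
        TF_LoadPluggableDeviceLibrary("libitex_gpu.so", status);

    if (TF_GetCode(status) != TF_OK) {
        fprintf(stderr, "Plugin load failed: %s\n", TF_Message(status));
    } else {
        /* From here on, newly created sessions can place ops
         * on the plugged-in XPU device. */
        printf("PluggableDevice plugin loaded\n");
    }

    if (plugin != NULL) {
        TF_DeletePluggableDeviceLibraryHandle(plugin);
    }
    TF_DeleteStatus(status);
    return 0;
}
```

The missing piece on the tfjs-node side is essentially exposing this one call through its native binding so JS clients can opt into PluggableDevices.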

marcelklehr commented 1 month ago

Ideally, this could be tackled when moving the classification to a Docker-based ExApp, as suggested here: https://github.com/nextcloud/recognize/issues/73. In Docker we could then use Python, which makes everything much easier. If you would like to see this, you can upvote via GitHub reactions over there as well.