Use GPU / system onnxruntime for inference on Arch Linux

luc-caspar commented 2 months ago

Following the instructions in the README, I compiled the plugin from source and installed the necessary files in the required locations. OBS recognizes the plugin, and I can add it as a filter for video or audio sources. However, whenever I do so, the logs indicate that both models (transcription and translation) run inference on the CPU. Given the limited compute capacity of my computer, this means I only get a new caption line every ~15 seconds. Is there a way to force the plugin to use the GPU instead? As a workaround I tried to use the system's onnxruntime, but the cmake configuration step keeps failing, even when I manually provide the path to the onnxruntime include/lib folder. Any help with this issue would be greatly appreciated.

royshil commented 2 months ago

Which OS are you on? Do you have a GPU?

luc-caspar commented 2 months ago

I am using Arch Linux. I do have a GPU, although not a powerful one, with the NVIDIA drivers installed.

jitingcn commented 2 months ago

Regarding the compilation options for Linux: the GGML_CUDA=1 build flag is missing, which leads to whisper.cpp being built without GPU support. After adding the parameter and recompiling, I ran into a linking error:

/usr/bin/ld: Whispercpp_Build-prefix/lib/static/libwhisper.a(ggml-cuda.cu.o): warning: relocation against `_ZNSt3mapISt5arrayIfLm16EE24ggml_backend_buffer_typeSt4lessIS1_ESaISt4pairIKS1_S2_EEED1Ev' in read-only section `.text'
/usr/bin/ld: Whispercpp_Build-prefix/lib/static/libwhisper.a(ggml-cuda.cu.o): relocation R_X86_64_PC32 against symbol `stdout@@GLIBC_2.2.5' can not be used when making a shared object; recompile with -fPIC
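
For anyone reading the log above: the second message is the actionable one. It says that ggml-cuda.cu.o inside the static libwhisper.a was compiled without position-independent code, yet is being linked into a shared object. CMake's general switch for this is CMAKE_POSITION_INDEPENDENT_CODE; whether and how this plugin's whisper.cpp sub-build picks it up is an assumption that has not been verified here. When the build log is long, a small script can list which archive members the linker wants rebuilt. The following is only a sketch: the function name `objects_needing_fpic` and the regular expression are illustrative, and the pattern is written against the GNU ld wording quoted in this thread.

```python
import re

# Matches GNU ld errors of the form
#   /usr/bin/ld: <archive>(<member>): relocation <TYPE> against ... recompile with -fPIC
# The pattern is an assumption based on the two log lines quoted in this thread.
FPIC_ERROR = re.compile(
    r"ld: (?P<archive>[^\s(]+)\((?P<member>[^)]+)\): "
    r"relocation \S+ against .* recompile with -fPIC"
)


def objects_needing_fpic(log_text):
    """Return sorted (archive, member) pairs that ld asks to be rebuilt with -fPIC."""
    found = set()
    for line in log_text.splitlines():
        match = FPIC_ERROR.search(line)
        if match:
            found.add((match.group("archive"), match.group("member")))
    return sorted(found)


# The two linker messages from the comment above, copied verbatim.
LOG = """\
/usr/bin/ld: Whispercpp_Build-prefix/lib/static/libwhisper.a(ggml-cuda.cu.o): warning: relocation against `_ZNSt3mapISt5arrayIfLm16EE24ggml_backend_buffer_typeSt4lessIS1_ESaISt4pairIKS1_S2_EEED1Ev' in read-only section `.text'
/usr/bin/ld: Whispercpp_Build-prefix/lib/static/libwhisper.a(ggml-cuda.cu.o): relocation R_X86_64_PC32 against symbol `stdout@@GLIBC_2.2.5' can not be used when making a shared object; recompile with -fPIC
"""

for archive, member in objects_needing_fpic(LOG):
    print(f"{member} in {archive} needs to be compiled with -fPIC")
```

The first log line is only a warning and is deliberately not matched; only lines that end in the "recompile with -fPIC" hint are reported.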

locaal-ai / obs-localvocal, issue #131