litongjava / whisper-cpp-server

whisper-cpp-server: real-time speech recognition server and C/C++ port of OpenAI's Whisper model

Illegal instruction (core dumped) #8

Open Jonny-Burkholder opened 2 months ago

Jonny-Burkholder commented 2 months ago

I'm running the whisper-cpp-server Docker image in a Kubernetes cluster as a microservice. I'm attaching the models folder by extending the Docker image like this:

FROM litongjava/whisper-cpp-server:1.0.0

ADD models/ models/

which seems to work pretty well. Then, in my Kubernetes deployment, I'm translating the docker run command provided in the docs:

      containers:
        - name: whisper
          image: jonnyburkholder/whisper
          stdin: true
          tty: true
          command: ["/app/whisper_http_server_base_httplib"]
          args: ["-m", "$(MODEL_PATH)"]
          ports:
            - containerPort: 8080
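Here $(MODEL_PATH) is expanded by Kubernetes from an environment variable on the container, defined along these lines (the path matches the model baked into the image, as the log below shows):

          env:
            - name: MODEL_PATH
              value: "models/ggml-tiny.en-q5_1.bin"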

This loads the model, then exits immediately with the following output:

 - dev:deployment/whisper: container whisper terminated with exit code 132
    - dev:pod/whisper-c994db857-vp7t8: container whisper terminated with exit code 132
      > [whisper-c994db857-vp7t8 whisper] whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-tiny.en-q5_1.bin'
      > [whisper-c994db857-vp7t8 whisper] whisper_model_load: loading model
      > [whisper-c994db857-vp7t8 whisper] whisper_model_load: n_vocab       = 51864
      > [whisper-c994db857-vp7t8 whisper] whisper_model_load: n_audio_ctx   = 1500
      > [whisper-c994db857-vp7t8 whisper] whisper_model_load: n_audio_state = 384
      > [whisper-c994db857-vp7t8 whisper] whisper_model_load: n_audio_head  = 6
      > [whisper-c994db857-vp7t8 whisper] whisper_model_load: n_audio_layer = 4
      > [whisper-c994db857-vp7t8 whisper] whisper_model_load: n_text_ctx    = 448
      > [whisper-c994db857-vp7t8 whisper] whisper_model_load: n_text_state  = 384
      > [whisper-c994db857-vp7t8 whisper] whisper_model_load: n_text_head   = 6
      > [whisper-c994db857-vp7t8 whisper] whisper_model_load: n_text_layer  = 4
      > [whisper-c994db857-vp7t8 whisper] whisper_model_load: n_mels        = 80
      > [whisper-c994db857-vp7t8 whisper] whisper_model_load: ftype         = 9
      > [whisper-c994db857-vp7t8 whisper] whisper_model_load: qntvr         = 1
      > [whisper-c994db857-vp7t8 whisper] whisper_model_load: type          = 1 (tiny)
      > [whisper-c994db857-vp7t8 whisper] whisper_model_load: adding 1607 extra tokens
      > [whisper-c994db857-vp7t8 whisper] whisper_model_load: n_langs       = 99
 - dev:deployment/whisper failed. Error: container whisper terminated with exit code 132.

I assumed the container exits because the Docker image is run in the background and isn't actively doing work in the container. So to keep it from exiting early, I modified the command to tail /dev/null:

      containers:
        - name: whisper
          image: jonnyburkholder/whisper
          stdin: true
          tty: true
          command: ["/bin/sh", "-c"]
          args: ["/app/whisper_http_server_base_httplib -m $(MODEL_PATH); tail -f /dev/null"]
          ports:
            - containerPort: 8080

This keeps the container alive (the shell runs tail after the server dies, so the pod stays Running), but gives me the error "Illegal instruction (core dumped)". This output pops up in the logs every few seconds:

[whisper] whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-tiny.en-q5_1.bin'
[whisper] whisper_model_load: loading model
[whisper] whisper_model_load: n_vocab       = 51864
[whisper] whisper_model_load: n_audio_ctx   = 1500
[whisper] whisper_model_load: n_audio_state = 384
[whisper] whisper_model_load: n_audio_head  = 6
[whisper] whisper_model_load: n_audio_layer = 4
[whisper] whisper_model_load: n_text_ctx    = 448
[whisper] whisper_model_load: n_text_state  = 384
[whisper] whisper_model_load: n_text_head   = 6
[whisper] whisper_model_load: n_text_layer  = 4
[whisper] whisper_model_load: n_mels        = 80
[whisper] whisper_model_load: ftype         = 9
[whisper] whisper_model_load: qntvr         = 1
[whisper] whisper_model_load: type          = 1 (tiny)
[whisper] whisper_model_load: adding 1607 extra tokens
[whisper] whisper_model_load: n_langs       = 99
[whisper] Illegal instruction (core dumped)

I'm using the ggml-tiny.en-q5_1 model. I've tested it with the docker run command locally, and it works fine there, so I'm having trouble understanding why it isn't working in my cluster. Any insight would be appreciated. Thanks!

litongjava commented 2 months ago

The issue with your container exiting and the "Illegal instruction (core dumped)" error you encountered is likely due to the CPUs in some nodes of your Kubernetes cluster not supporting the instruction set with which the Docker image was compiled. This situation usually occurs when the Docker image is built on a machine optimized for a specific CPU instruction set (such as AVX2), while some CPUs in the cluster do not support these instructions.
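For reference, exit code 132 is 128 + 4, and signal 4 is SIGILL ("illegal instruction"), which matches both symptoms. A quick way to check what a given node's CPU actually supports is to inspect /proc/cpuinfo on the node (or from a debug pod scheduled there); a minimal sketch:

# List the SIMD extensions the CPU advertises. If avx2/fma are missing
# but the image was compiled with them, the server dies with SIGILL
# (exit code 132).
grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | grep -wE 'avx|avx2|avx512f|fma|f16c'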

The model you are using is ggml-tiny.en-q5_1.bin. I have not tested this model; please provide its address so I can test it. In the meantime, you might want to try the following two commands. The model is already packaged in the container:

docker run -dit --name whisper-server -p 8080:8080 litongjava/whisper-cpp-server:1.0.0-base-en

docker run -dit --name whisper-server -p 8080:8080 litongjava/whisper-cpp-server:1.0.0-large-v3

I don't know the CPU architecture of your target platform. If the instruction set is the issue, you will need to recompile and rebuild the Docker image on the target platform.
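If you do rebuild, whisper.cpp's CMake build exposes options to disable the offending instruction sets; the exact option names vary between whisper.cpp versions, so treat this as a sketch and check the CMakeLists.txt you are building against:

# Build on (or for) the target node so the compiled kernels match its CPU.
# The WHISPER_NO_* options exist in whisper.cpp 1.5.x; newer releases
# renamed them, so verify against your CMakeLists.txt before relying on these.
cmake -B build -DWHISPER_NO_AVX=ON -DWHISPER_NO_AVX2=ON -DWHISPER_NO_FMA=ON -DWHISPER_NO_F16C=ON
cmake --build build --config Release

Alternatively, if only some nodes lack AVX2, pinning the pod to capable nodes (for example with a nodeSelector on a Node Feature Discovery label such as feature.node.kubernetes.io/cpu-cpuid.AVX2) avoids rebuilding entirely.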

Jonny-Burkholder commented 2 months ago

You're right, that's exactly what it is. Thanks!

Jonny-Burkholder commented 2 months ago

The model is one of the ones available on HuggingFace from ggerganov. My Kubernetes environment is pretty resource-limited, and I'm planning to use this for voice commands, so I wanted to use the smallest model I could and then implement something like a Levenshtein distance algorithm to match against the commands in case of a mis-transcribed word.

Here's a link to the model: https://huggingface.co/ggerganov/whisper.cpp/blob/main/ggml-tiny.en-q5_1.bin

litongjava commented 2 months ago

Here's my CPU info:

root@ping-Inspiron-3458:~/code/whisper-cpp-server# lscpu
Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Byte Order:                         Little Endian
Address sizes:                      39 bits physical, 48 bits virtual
CPU(s):                             4
On-line CPU(s) list:                0-3
Thread(s) per core:                 2
Core(s) per socket:                 2
Socket(s):                          1
NUMA node(s):                       1
Vendor ID:                          GenuineIntel
CPU family:                         6
Model:                              69
Model name:                         Intel(R) Core(TM) i5-4210U CPU @ 1.70GHz
Stepping:                           1
CPU MHz:                            2394.294
CPU max MHz:                        2700.0000
CPU min MHz:                        800.0000
BogoMIPS:                           4788.58
Virtualization:                     VT-x
L1d cache:                          64 KiB
L1i cache:                          64 KiB
L2 cache:                           512 KiB
L3 cache:                           3 MiB
NUMA node0 CPU(s):                  0-3
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit:        KVM: Mitigation: VMX disabled
Vulnerability L1tf:                 Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds:                  Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Meltdown:             Mitigation; PTI
Vulnerability Mmio stale data:      Unknown: No mitigations
Vulnerability Retbleed:             Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds:                Mitigation; Microcode
Vulnerability Tsx async abort:      Not affected
Flags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts md_clear flush_l1d

I built a new Docker image for you. It works well in my test environment. Please test:

docker run -dit --name whisper-server -p 8080:8080 litongjava/whisper-cpp-server:1.0.0-tiny.en-q5_1
root@ping-Inspiron-3458:~/code/whisper-cpp-server# docker logs -f 4df89f6238da
whisper_init_from_file_with_params_no_state: loading model from '/app/models/ggml-tiny.en-q5_1.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head  = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 384
whisper_model_load: n_text_head   = 6
whisper_model_load: n_text_layer  = 4
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 9
whisper_model_load: qntvr         = 1
whisper_model_load: type          = 1 (tiny)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs       = 99
whisper_model_load:      CPU total size =    31.57 MB
whisper_model_load: model size    =   31.57 MB
whisper_init_state: kv self size  =    8.26 MB
whisper_init_state: kv cross size =    9.22 MB
whisper_init_state: compute buffer (conv)   =   13.32 MB
whisper_init_state: compute buffer (encode) =   85.66 MB
whisper_init_state: compute buffer (cross)  =    4.01 MB
whisper_init_state: compute buffer (decode) =   96.02 MB

whisper service listening at http://0.0.0.0:8080

24-04-25 12:14:10.787: Received filename: jfk.wav 
24-04-25 12:14:10.787: audio_format:wav 
Successfully loaded jfk.wav

system_info: n_threads = 4 / 4 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0 | 

run: WARNING: model is not multilingual, ignoring language and translation options
run: processing 'jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...

Running whisper.cpp inference on jfk.wav

[00:00:00.000 --> 00:00:07.740]   And so my fellow Americans ask not what your country can do for you
[00:00:07.740 --> 00:00:10.580]   ask what you can do for your country.
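For anyone reproducing this test: the log above came from a multipart file upload against the running server, something like the following (the /inference path and form field name are assumptions based on whisper.cpp's HTTP server conventions; check this repo's README for the exact route):

# Hypothetical request shape; verify the endpoint path in the README.
curl http://127.0.0.1:8080/inference -F file=@jfk.wav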