hanserasmus closed this issue 9 months ago
Hi @hanserasmus, are you using a musl-based system?
Logging will be fixed in the next release of the app. For now, can you turn on debug logging in NC (set the log level to debug), try to transcribe the file, and post the log here?
Thanks.
@kyteinsky Thank you for the response. I am too stupid to know what a musl-based system is, and am confident enough to admit I am too stupid to know :-) So the answer is, I don't know.
It's an LXC container based on Ubuntu 22.04, running on Proxmox 7.4. The same file can be transcribed successfully using the whisper command line; I have tested that part at least. Not sure if I am perhaps missing some PHP plugins? This is a very minimal NC setup for testing, so I have only done the bare minimum to get it up and running. Will add debug logs very soon.
@kyteinsky Here are the debug logs:
{"reqId":"77JjNulYpBZw0cGsI1R2","level":0,"time":"2024-01-04T13:55:02+00:00","remoteAddr":"","user":"--","app":"cron","method":"","url":"--","message":"Run OC\\SpeechToText\\TranscriptionJob job with ID 53","userAgent":"--","version":"28.0.1.1","data":{"app":"cron"}}
{"reqId":"77JjNulYpBZw0cGsI1R2","level":0,"time":"2024-01-04T13:55:02+00:00","remoteAddr":"","user":"--","app":"stt_whisper","method":"","url":"--","message":"Running array (\n 0 => '/var/www/nextcloud/apps/stt_whisper/lib/Service/../../node_modules/ffmpeg-static/ffmpeg',\n 1 => '-i',\n 2 => '/var/www/nextcloud/data/admin/files/File1.ogg',\n 3 => '-ar',\n 4 => 16000,\n 5 => '-ac',\n 6 => 1,\n 7 => '-af',\n 8 => 'silenceremove=window=1:detection=peak:stop_periods=-1:stop_silence=7:start_threshold=-70dB:stop_threshold=-70dB',\n 9 => '-c:a',\n 10 => 'pcm_s16le',\n 11 => '-threads',\n 12 => 4,\n 13 => '-y',\n 14 => '/tmp/oc_tmp_3uDeAB-.wav',\n)","userAgent":"--","version":"28.0.1.1","data":{"app":"stt_whisper"}}
{"reqId":"77JjNulYpBZw0cGsI1R2","level":0,"time":"2024-01-04T13:55:03+00:00","remoteAddr":"","user":"--","app":"stt_whisper","method":"","url":"--","message":"Running array (\n 0 => '/var/www/nextcloud/apps/stt_whisper/lib/Service/../../bin/main',\n 1 => '-m',\n 2 => '../../models/medium',\n 3 => '-t',\n 4 => 4,\n 5 => '-l',\n 6 => 'auto',\n 7 => '--no-timestamps',\n 8 => '/tmp/oc_tmp_3uDeAB-.wav',\n)","userAgent":"--","version":"28.0.1.1","data":{"app":"stt_whisper"}}
{"reqId":"77JjNulYpBZw0cGsI1R2","level":2,"time":"2024-01-04T13:55:04+00:00","remoteAddr":"","user":"--","app":"stt_whisper","method":"","url":"--","message":"whisper_init_from_file_no_state: loading model from '../../models/medium'\nwhisper_model_load: loading model\nwhisper_model_load: n_vocab = 51865\nwhisper_model_load: n_audio_ctx = 1500\nwhisper_model_load: n_audio_state = 1024\nwhisper_model_load: n_audio_head = 16\nwhisper_model_load: n_audio_layer = 24\nwhisper_model_load: n_text_ctx = 448\nwhisper_model_load: n_text_state = 1024\nwhisper_model_load: n_text_head = 16\nwhisper_model_load: n_text_layer = 24\nwhisper_model_load: n_mels = 80\nwhisper_model_load: f16 = 1\nwhisper_model_load: type = 4\nwhisper_model_load: mem required = 1725.00 MB (+ 43.00 MB per decoder)\nwhisper_model_load: adding 1608 extra tokens\nwhisper_model_load: model ctx = 1462.35 MB\n","userAgent":"--","version":"28.0.1.1","data":{"app":"stt_whisper"}}
{"reqId":"77JjNulYpBZw0cGsI1R2","level":2,"time":"2024-01-04T13:55:04+00:00","remoteAddr":"","user":"--","app":"stt_whisper","method":"","url":"--","message":"Transcription failed with: Whisper process failed","userAgent":"--","version":"28.0.1.1","exception":{"Exception":"RuntimeException","Message":"Whisper process failed","Code":0,"Trace":[{"file":"/var/www/nextcloud/apps/stt_whisper/lib/Provider/SpeechToText.php","line":30,"function":"transcribe","class":"OCA\\SttWhisper\\Service\\SpeechToTextService","type":"->"},{"file":"/var/www/nextcloud/lib/private/SpeechToText/SpeechToTextManager.php","line":129,"function":"transcribeFile","class":"OCA\\SttWhisper\\Provider\\SpeechToText","type":"->"},{"file":"/var/www/nextcloud/lib/private/SpeechToText/TranscriptionJob.php","line":82,"function":"transcribeFile","class":"OC\\SpeechToText\\SpeechToTextManager","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/Job.php","line":81,"function":"run","class":"OC\\SpeechToText\\TranscriptionJob","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/QueuedJob.php","line":57,"function":"start","class":"OCP\\BackgroundJob\\Job","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/QueuedJob.php","line":47,"function":"start","class":"OCP\\BackgroundJob\\QueuedJob","type":"->"},{"file":"/var/www/nextcloud/cron.php","line":152,"function":"execute","class":"OCP\\BackgroundJob\\QueuedJob","type":"->"}],"File":"/var/www/nextcloud/apps/stt_whisper/lib/Service/SpeechToTextService.php","Line":88,"message":"Transcription failed with: Whisper process failed","exception":{},"CustomMessage":"Transcription failed with: Whisper process failed"}}
{"reqId":"77JjNulYpBZw0cGsI1R2","level":1,"time":"2024-01-04T13:55:04+00:00","remoteAddr":"","user":"--","app":"no app in context","method":"","url":"--","message":"SpeechToText transcription using provider Whisper Speech-To-Text failed","userAgent":"--","version":"28.0.1.1","exception":{"Exception":"RuntimeException","Message":"Transcription failed with: Whisper process failed","Code":0,"Trace":[{"file":"/var/www/nextcloud/lib/private/SpeechToText/SpeechToTextManager.php","line":129,"function":"transcribeFile","class":"OCA\\SttWhisper\\Provider\\SpeechToText","type":"->"},{"file":"/var/www/nextcloud/lib/private/SpeechToText/TranscriptionJob.php","line":82,"function":"transcribeFile","class":"OC\\SpeechToText\\SpeechToTextManager","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/Job.php","line":81,"function":"run","class":"OC\\SpeechToText\\TranscriptionJob","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/QueuedJob.php","line":57,"function":"start","class":"OCP\\BackgroundJob\\Job","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/QueuedJob.php","line":47,"function":"start","class":"OCP\\BackgroundJob\\QueuedJob","type":"->"},{"file":"/var/www/nextcloud/cron.php","line":152,"function":"execute","class":"OCP\\BackgroundJob\\QueuedJob","type":"->"}],"File":"/var/www/nextcloud/apps/stt_whisper/lib/Provider/SpeechToText.php","Line":33,"message":"SpeechToText transcription using provider Whisper Speech-To-Text failed","exception":{},"CustomMessage":"SpeechToText transcription using provider Whisper Speech-To-Text failed"}}
{"reqId":"77JjNulYpBZw0cGsI1R2","level":2,"time":"2024-01-04T13:55:04+00:00","remoteAddr":"","user":"--","app":"no app in context","method":"","url":"--","message":"Transcription of file 175 failed","userAgent":"--","version":"28.0.1.1","exception":{"Exception":"RuntimeException","Message":"Could not transcribe file","Code":0,"Trace":[{"file":"/var/www/nextcloud/lib/private/SpeechToText/TranscriptionJob.php","line":82,"function":"transcribeFile","class":"OC\\SpeechToText\\SpeechToTextManager","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/Job.php","line":81,"function":"run","class":"OC\\SpeechToText\\TranscriptionJob","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/QueuedJob.php","line":57,"function":"start","class":"OCP\\BackgroundJob\\Job","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/QueuedJob.php","line":47,"function":"start","class":"OCP\\BackgroundJob\\QueuedJob","type":"->"},{"file":"/var/www/nextcloud/cron.php","line":152,"function":"execute","class":"OCP\\BackgroundJob\\QueuedJob","type":"->"}],"File":"/var/www/nextcloud/lib/private/SpeechToText/SpeechToTextManager.php","Line":135,"message":"Transcription of file 175 failed","exception":{},"CustomMessage":"Transcription of file 175 failed"}}
{"reqId":"77JjNulYpBZw0cGsI1R2","level":3,"time":"2024-01-04T13:55:04+00:00","remoteAddr":"","user":"--","app":"stt_helper","method":"","url":"--","message":"Transcript generation failed: Could not transcribe file","userAgent":"--","version":"28.0.1.1","data":{"app":"stt_helper"}}
{"reqId":"77JjNulYpBZw0cGsI1R2","level":0,"time":"2024-01-04T13:55:04+00:00","remoteAddr":"","user":"--","app":"cron","method":"","url":"--","message":"Finished OC\\SpeechToText\\TranscriptionJob job with ID 53 in 2 seconds","userAgent":"--","version":"28.0.1.1","data":{"app":"cron"}}
I should add that my whisper was installed via pip3, but it is available on the CLI as simply whisper (if that helps?).
Since you're using an off-the-shelf Ubuntu system, it is not musl-based. You can check with ldd /bin/sh: if it prints musl somewhere, it is musl. Now you know :)
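The check above can be sketched as a tiny script. This is an illustration only: the sample string stands in for the real output of `ldd /bin/sh` (it mirrors a typical glibc line; on a musl system the interpreter path mentions musl instead).

```shell
# Hedged sketch of the libc check. On a live system, replace the sample
# string with the actual output of `ldd /bin/sh`.
sample='libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6'   # typical glibc line
if printf '%s\n' "$sample" | grep -qi musl; then
    echo "musl-based"
else
    echo "glibc-based"
fi
```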
This app comes bundled with a whisper binary so your local install would not be used.
The logs have been truncated at the end by the NC logger, but most probably the model file is missing. The desired model can be downloaded with this occ command: occ stt_whisper:download-models [model-name], where model-name can be small, medium or large.
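For example (a sketch, not verified on this setup; the www-data user and running from the Nextcloud root are assumptions that depend on your install):

```shell
# Assumed invocation: run from the Nextcloud root where occ lives,
# as the web server user (often www-data on Ubuntu). The model name
# "medium" is just an example.
sudo -u www-data php occ stt_whisper:download-models medium
```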
If this does not help, we can try running the binary manually to get the complete output of the whisper command.
Hi, will check the musl thing later. I did load the medium model as per the installation instructions during the install. Is musl an issue?
@kyteinsky I checked that the medium model is installed (it was, but I downloaded it again anyway). Then I scheduled the transcription again, and it failed again, with these log entries:
{"reqId":"2jJHK5T4X8kFWkp8BHxQ","level":0,"time":"2024-01-04T21:15:01+00:00","remoteAddr":"","user":"--","app":"stt_whisper","method":"","url":"--","message":"Running array (\n 0 => '/var/www/nextcloud/apps/stt_whisper/lib/Service/../../node_modules/ffmpeg-static/ffmpeg',\n 1 => '-i',\n 2 => '/var/www/nextcloud/data/admin/files/File1.ogg',\n 3 => '-ar',\n 4 => 16000,\n 5 => '-ac',\n 6 => 1,\n 7 => '-af',\n 8 => 'silenceremove=window=1:detection=peak:stop_periods=-1:stop_silence=7:start_threshold=-70dB:stop_threshold=-70dB',\n 9 => '-c:a',\n 10 => 'pcm_s16le',\n 11 => '-threads',\n 12 => 4,\n 13 => '-y',\n 14 => '/tmp/oc_tmp_MZWOP0-.wav',\n)","userAgent":"--","version":"28.0.1.1","data":{"app":"stt_whisper"}}
{"reqId":"2jJHK5T4X8kFWkp8BHxQ","level":0,"time":"2024-01-04T21:15:02+00:00","remoteAddr":"","user":"--","app":"stt_whisper","method":"","url":"--","message":"Running array (\n 0 => '/var/www/nextcloud/apps/stt_whisper/lib/Service/../../bin/main',\n 1 => '-m',\n 2 => '../../models/medium',\n 3 => '-t',\n 4 => 4,\n 5 => '-l',\n 6 => 'auto',\n 7 => '--no-timestamps',\n 8 => '/tmp/oc_tmp_MZWOP0-.wav',\n)","userAgent":"--","version":"28.0.1.1","data":{"app":"stt_whisper"}}
{"reqId":"2jJHK5T4X8kFWkp8BHxQ","level":2,"time":"2024-01-04T21:15:03+00:00","remoteAddr":"","user":"--","app":"stt_whisper","method":"","url":"--","message":"whisper_init_from_file_no_state: loading model from '../../models/medium'\nwhisper_model_load: loading model\nwhisper_model_load: n_vocab = 51865\nwhisper_model_load: n_audio_ctx = 1500\nwhisper_model_load: n_audio_state = 1024\nwhisper_model_load: n_audio_head = 16\nwhisper_model_load: n_audio_layer = 24\nwhisper_model_load: n_text_ctx = 448\nwhisper_model_load: n_text_state = 1024\nwhisper_model_load: n_text_head = 16\nwhisper_model_load: n_text_layer = 24\nwhisper_model_load: n_mels = 80\nwhisper_model_load: f16 = 1\nwhisper_model_load: type = 4\nwhisper_model_load: mem required = 1725.00 MB (+ 43.00 MB per decoder)\nwhisper_model_load: adding 1608 extra tokens\nwhisper_model_load: model ctx = 1462.35 MB\n","userAgent":"--","version":"28.0.1.1","data":{"app":"stt_whisper"}}
{"reqId":"2jJHK5T4X8kFWkp8BHxQ","level":2,"time":"2024-01-04T21:15:03+00:00","remoteAddr":"","user":"--","app":"stt_whisper","method":"","url":"--","message":"Transcription failed with: Whisper process failed","userAgent":"--","version":"28.0.1.1","exception":{"Exception":"RuntimeException","Message":"Whisper process failed","Code":0,"Trace":[{"file":"/var/www/nextcloud/apps/stt_whisper/lib/Provider/SpeechToText.php","line":30,"function":"transcribe","class":"OCA\\SttWhisper\\Service\\SpeechToTextService","type":"->"},{"file":"/var/www/nextcloud/lib/private/SpeechToText/SpeechToTextManager.php","line":129,"function":"transcribeFile","class":"OCA\\SttWhisper\\Provider\\SpeechToText","type":"->"},{"file":"/var/www/nextcloud/lib/private/SpeechToText/TranscriptionJob.php","line":82,"function":"transcribeFile","class":"OC\\SpeechToText\\SpeechToTextManager","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/Job.php","line":81,"function":"run","class":"OC\\SpeechToText\\TranscriptionJob","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/QueuedJob.php","line":57,"function":"start","class":"OCP\\BackgroundJob\\Job","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/QueuedJob.php","line":47,"function":"start","class":"OCP\\BackgroundJob\\QueuedJob","type":"->"},{"file":"/var/www/nextcloud/cron.php","line":152,"function":"execute","class":"OCP\\BackgroundJob\\QueuedJob","type":"->"}],"File":"/var/www/nextcloud/apps/stt_whisper/lib/Service/SpeechToTextService.php","Line":88,"message":"Transcription failed with: Whisper process failed","exception":{},"CustomMessage":"Transcription failed with: Whisper process failed"}}
{"reqId":"2jJHK5T4X8kFWkp8BHxQ","level":1,"time":"2024-01-04T21:15:03+00:00","remoteAddr":"","user":"--","app":"no app in context","method":"","url":"--","message":"SpeechToText transcription using provider Whisper Speech-To-Text failed","userAgent":"--","version":"28.0.1.1","exception":{"Exception":"RuntimeException","Message":"Transcription failed with: Whisper process failed","Code":0,"Trace":[{"file":"/var/www/nextcloud/lib/private/SpeechToText/SpeechToTextManager.php","line":129,"function":"transcribeFile","class":"OCA\\SttWhisper\\Provider\\SpeechToText","type":"->"},{"file":"/var/www/nextcloud/lib/private/SpeechToText/TranscriptionJob.php","line":82,"function":"transcribeFile","class":"OC\\SpeechToText\\SpeechToTextManager","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/Job.php","line":81,"function":"run","class":"OC\\SpeechToText\\TranscriptionJob","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/QueuedJob.php","line":57,"function":"start","class":"OCP\\BackgroundJob\\Job","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/QueuedJob.php","line":47,"function":"start","class":"OCP\\BackgroundJob\\QueuedJob","type":"->"},{"file":"/var/www/nextcloud/cron.php","line":152,"function":"execute","class":"OCP\\BackgroundJob\\QueuedJob","type":"->"}],"File":"/var/www/nextcloud/apps/stt_whisper/lib/Provider/SpeechToText.php","Line":33,"message":"SpeechToText transcription using provider Whisper Speech-To-Text failed","exception":{},"CustomMessage":"SpeechToText transcription using provider Whisper Speech-To-Text failed"}}
{"reqId":"2jJHK5T4X8kFWkp8BHxQ","level":2,"time":"2024-01-04T21:15:03+00:00","remoteAddr":"","user":"--","app":"no app in context","method":"","url":"--","message":"Transcription of file 175 failed","userAgent":"--","version":"28.0.1.1","exception":{"Exception":"RuntimeException","Message":"Could not transcribe file","Code":0,"Trace":[{"file":"/var/www/nextcloud/lib/private/SpeechToText/TranscriptionJob.php","line":82,"function":"transcribeFile","class":"OC\\SpeechToText\\SpeechToTextManager","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/Job.php","line":81,"function":"run","class":"OC\\SpeechToText\\TranscriptionJob","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/QueuedJob.php","line":57,"function":"start","class":"OCP\\BackgroundJob\\Job","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/QueuedJob.php","line":47,"function":"start","class":"OCP\\BackgroundJob\\QueuedJob","type":"->"},{"file":"/var/www/nextcloud/cron.php","line":152,"function":"execute","class":"OCP\\BackgroundJob\\QueuedJob","type":"->"}],"File":"/var/www/nextcloud/lib/private/SpeechToText/SpeechToTextManager.php","Line":135,"message":"Transcription of file 175 failed","exception":{},"CustomMessage":"Transcription of file 175 failed"}}
Could it be that you don't have enough RAM and the process is killed by the OOM killer?
There are no log entries in /var/log/syslog about the OOM killer. I kept a real-time view open via the top command and it did not run out of memory. It just keeps failing with those errors.
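Besides watching top, the kernel log can also be checked for OOM kills. A minimal sketch follows; the sample line is hypothetical, and on a live system you would grep the output of dmesg or journalctl -k instead.

```shell
# Hypothetical kernel log line standing in for real dmesg output:
line='[12345.678901] Out of memory: Killed process 4242 (main)'
# Count OOM-kill entries; a non-zero count would implicate the OOM killer.
printf '%s\n' "$line" | grep -c 'Out of memory'
```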
Does your CPU support AVX instructions?
@marcelklehr it would seem so:
root@whisper:~# grep avx /proc/cpuinfo
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm pti tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts
[the same flags line is repeated for the remaining three cores]
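That grep can be wrapped in a quick yes/no check. This is a sketch: the shortened flags string below stands in for a real /proc/cpuinfo line like the one above.

```shell
# Shortened stand-in for one /proc/cpuinfo flags line:
flags='flags : fpu vme de pse sse4_1 sse4_2 aes xsave avx lahf_lm'
# -w matches "avx" only as a whole word, so a token like "avx2" on its
# own would not count as plain AVX.
if printf '%s\n' "$flags" | grep -qw avx; then
    echo "AVX advertised"
else
    echo "no AVX"
fi
```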
Hi @hanserasmus, can you try running the whisper binary shipped with the app directly?
Go into the nextcloud directory where index.php is located and then do the following:
cd apps/stt_whisper
./bin/main -m ./models/small -t $(nproc) -l auto --no-timestamps <path to a 16kHz wav file>
(replace small with whatever model size you have downloaded, e.g. medium)
You can convert audio to a 16 kHz wav file using this command:
ffmpeg -i <input_audio_file> -ar 16000 -ac 1 -c:a pcm_s16le -threads $(nproc) <output_file.wav>
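Putting the two steps together as one sketch (not verified on this setup): the input path and model size are placeholders to adjust, and the bundled ffmpeg path is taken from the logs earlier in this thread.

```shell
# Run from the Nextcloud root; the paths follow the logs above.
cd apps/stt_whisper
# 1. Convert the source audio to 16 kHz mono PCM wav
#    (input file is a placeholder):
./node_modules/ffmpeg-static/ffmpeg -i /path/to/input.ogg \
    -ar 16000 -ac 1 -c:a pcm_s16le -y /tmp/test-16k.wav
# 2. Feed it to the bundled whisper.cpp binary:
./bin/main -m ./models/small -t "$(nproc)" -l auto --no-timestamps /tmp/test-16k.wav
```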
Hi @kyteinsky thank you for your reply.
root@whisper:/var/www/nextcloud/apps/stt_whisper# ./bin/main -m ./models/medium -t $(nproc) -l auto --no-timestamps /root/File1-16kHz.wav
whisper_init_from_file_no_state: loading model from './models/medium'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1024
whisper_model_load: n_text_head = 16
whisper_model_load: n_text_layer = 24
whisper_model_load: n_mels = 80
whisper_model_load: f16 = 1
whisper_model_load: type = 4
whisper_model_load: mem required = 1725.00 MB (+ 43.00 MB per decoder)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: model ctx = 1462.35 MB
Illegal instruction
@hanserasmus It looks like you're running an architecture other than x86_64.
Can you post the output of uname -a, please?
@kyteinsky
root@whisper:/var/www/nextcloud/apps/stt_whisper# uname -a
Linux whisper 5.15.131-2-pve #1 SMP PVE 5.15.131-3 (2023-12-01T13:42Z) x86_64 x86_64 x86_64 GNU/Linux
Also the outputs of ldd /bin/sh and file bin/*, please.
@kyteinsky A thought: would it help rather than running an LXC, to rather run a VM? Maybe that way we can isolate the resources a bit more and lock it down?
root@whisper:/var/www/nextcloud/apps/stt_whisper# ldd /bin/sh
linux-vdso.so.1 (0x00007ffcd77d8000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007efd1e998000)
/lib64/ld-linux-x86-64.so.2 (0x00007efd1ebec000)
root@whisper:/var/www/nextcloud/apps/stt_whisper# file bin/*
bin/main: ELF 64-bit LSB pie executable, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=83b0bb7a24202a943656facaa0eaed0779ed7192, for GNU/Linux 3.2.0, not stripped
bin/main-musl: ELF 64-bit LSB pie executable, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=83b0bb7a24202a943656facaa0eaed0779ed7192, for GNU/Linux 3.2.0, not stripped
A thought: would it help rather than running an LXC, to rather run a VM? Maybe that way we can isolate the resources a bit more and lock it down?
Not sure that would be any different in this case.
@marcelklehr Any clues here?
I think it's still the AVX thing. I've heard reports from people that Proxmox doesn't pass it through. Not sure how to test that, though; you can definitely try running a VM.
Also, searching the web for "proxmox avx" yields some results that look related to this.
@marcelklehr I am currently running an LXC; I was thinking of running a VM instead.
So a question, not trolling: what makes this bin file different from the whisper binary I installed via pip3? That one runs on this machine. Would the pip3 whisper not also error out without AVX support?
Can you give a link to the whisper binary that runs for you? It could be that we are using an older version of whisper.cpp.
root@whisper:/var/www/nextcloud/apps/stt_whisper# pip3 install whisper
Collecting whisper
Downloading whisper-1.1.10.tar.gz (42 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 42.8/42.8 KB 819.0 kB/s eta 0:00:00
Preparing metadata (setup.py) ... done
Requirement already satisfied: six in /usr/lib/python3/dist-packages (from whisper) (1.16.0)
Building wheels for collected packages: whisper
Building wheel for whisper (setup.py) ... done
Created wheel for whisper: filename=whisper-1.1.10-py3-none-any.whl size=41138 sha256=a8f36353f2fe8fa989a1578c1fb5fad479fd7724ff66162e343002fe95311282
Stored in directory: /root/.cache/pip/wheels/aa/7c/1d/015619716e2facae6631312503baf3c3220e6a9a3508cb14b6
Successfully built whisper
Installing collected packages: whisper
Successfully installed whisper-1.1.10
This is how I installed it, does that give you enough info?
This appears to be the Whisper time-series database library, which is unrelated to the whisper.cpp used in this project.
@marcelklehr I am sorry, I am an idiot.
That is not the command I used. I checked my history, and here you go (from https://github.com/openai/whisper#setup):
root@whisper:/var/www/nextcloud/apps/stt_whisper# pip3 install openai-whisper
Requirement already satisfied: openai-whisper in /usr/local/lib/python3.10/dist-packages (20231117)
Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from openai-whisper) (1.26.3)
Requirement already satisfied: torch in /usr/local/lib/python3.10/dist-packages (from openai-whisper) (2.1.2)
Requirement already satisfied: more-itertools in /usr/local/lib/python3.10/dist-packages (from openai-whisper) (10.1.0)
Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from openai-whisper) (4.66.1)
Requirement already satisfied: triton<3,>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from openai-whisper) (2.1.0)
Requirement already satisfied: numba in /usr/local/lib/python3.10/dist-packages (from openai-whisper) (0.58.1)
Requirement already satisfied: tiktoken in /usr/local/lib/python3.10/dist-packages (from openai-whisper) (0.5.2)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from triton<3,>=2.0.0->openai-whisper) (3.13.1)
Requirement already satisfied: llvmlite<0.42,>=0.41.0dev0 in /usr/local/lib/python3.10/dist-packages (from numba->openai-whisper) (0.41.1)
Requirement already satisfied: regex>=2022.1.18 in /usr/local/lib/python3.10/dist-packages (from tiktoken->openai-whisper) (2023.12.25)
Requirement already satisfied: requests>=2.26.0 in /usr/local/lib/python3.10/dist-packages (from tiktoken->openai-whisper) (2.31.0)
Requirement already satisfied: nvidia-cudnn-cu12==8.9.2.26 in /usr/local/lib/python3.10/dist-packages (from torch->openai-whisper) (8.9.2.26)
Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch->openai-whisper) (12.1.105)
Requirement already satisfied: nvidia-cuda-cupti-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch->openai-whisper) (12.1.105)
Requirement already satisfied: nvidia-cuda-runtime-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch->openai-whisper) (12.1.105)
Requirement already satisfied: nvidia-cusolver-cu12==11.4.5.107 in /usr/local/lib/python3.10/dist-packages (from torch->openai-whisper) (11.4.5.107)
Requirement already satisfied: nvidia-nvtx-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch->openai-whisper) (12.1.105)
Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch->openai-whisper) (1.12)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch->openai-whisper) (3.1.2)
Requirement already satisfied: nvidia-cufft-cu12==11.0.2.54 in /usr/local/lib/python3.10/dist-packages (from torch->openai-whisper) (11.0.2.54)
Requirement already satisfied: nvidia-curand-cu12==10.3.2.106 in /usr/local/lib/python3.10/dist-packages (from torch->openai-whisper) (10.3.2.106)
Requirement already satisfied: nvidia-nccl-cu12==2.18.1 in /usr/local/lib/python3.10/dist-packages (from torch->openai-whisper) (2.18.1)
Requirement already satisfied: nvidia-cusparse-cu12==12.1.0.106 in /usr/local/lib/python3.10/dist-packages (from torch->openai-whisper) (12.1.0.106)
Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from torch->openai-whisper) (2023.12.2)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch->openai-whisper) (3.2.1)
Requirement already satisfied: nvidia-cublas-cu12==12.1.3.1 in /usr/local/lib/python3.10/dist-packages (from torch->openai-whisper) (12.1.3.1)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from torch->openai-whisper) (4.9.0)
Requirement already satisfied: nvidia-nvjitlink-cu12 in /usr/local/lib/python3.10/dist-packages (from nvidia-cusolver-cu12==11.4.5.107->torch->openai-whisper) (12.3.101)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests>=2.26.0->tiktoken->openai-whisper) (2023.11.17)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests>=2.26.0->tiktoken->openai-whisper) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests>=2.26.0->tiktoken->openai-whisper) (3.6)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests>=2.26.0->tiktoken->openai-whisper) (2.1.0)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch->openai-whisper) (2.1.3)
Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch->openai-whisper) (1.3.0)
@hanserasmus OpenAI's whisper is not the same as whisper.cpp.
To rule out any suspicions about the shipped binary, can you try compiling the whisper binary locally?
Inside the apps/stt_whisper/ directory, download this Makefile and run make bin/main. Also ensure that the web server user can access the binary.
Does the transcription work after this?
@kyteinsky this seems to be the money shot! Here is the output of my make command; maybe you can see why a compile from source worked when the app store install did not?
root@whisper:/var/www/nextcloud/apps/stt_whisper# make bin/main
git clone https://github.com/ggerganov/whisper.cpp.git
Cloning into 'whisper.cpp'...
remote: Enumerating objects: 6259, done.
remote: Counting objects: 100% (6259/6259), done.
remote: Compressing objects: 100% (2020/2020), done.
remote: Total 6259 (delta 3995), reused 6190 (delta 3972), pack-reused 0
Receiving objects: 100% (6259/6259), 9.72 MiB | 16.29 MiB/s, done.
Resolving deltas: 100% (3995/3995), done.
cd whisper.cpp && make clean && make
make[1]: Entering directory '/var/www/nextcloud/apps/stt_whisper/whisper.cpp'
I whisper.cpp build info:
I UNAME_S: Linux
I UNAME_P: x86_64
I UNAME_M: x86_64
I CFLAGS: -I. -O3 -DNDEBUG -std=c11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -msse3 -mssse3
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -msse3 -mssse3
I LDFLAGS:
I CC: cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
I CXX: g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
rm -f *.o main stream command talk talk-llama bench quantize server lsp libwhisper.a libwhisper.so
make[1]: Leaving directory '/var/www/nextcloud/apps/stt_whisper/whisper.cpp'
make[1]: Entering directory '/var/www/nextcloud/apps/stt_whisper/whisper.cpp'
I whisper.cpp build info:
I UNAME_S: Linux
I UNAME_P: x86_64
I UNAME_M: x86_64
I CFLAGS: -I. -O3 -DNDEBUG -std=c11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -msse3 -mssse3
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -msse3 -mssse3
I LDFLAGS:
I CC: cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
I CXX: g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
cc -I. -O3 -DNDEBUG -std=c11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -msse3 -mssse3 -c ggml.c -o ggml.o
cc -I. -O3 -DNDEBUG -std=c11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -msse3 -mssse3 -c ggml-alloc.c -o ggml-alloc.o
cc -I. -O3 -DNDEBUG -std=c11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -msse3 -mssse3 -c ggml-backend.c -o ggml-backend.o
cc -I. -O3 -DNDEBUG -std=c11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -msse3 -mssse3 -c ggml-quants.c -o ggml-quants.o
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -msse3 -mssse3 -c whisper.cpp -o whisper.o
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -msse3 -mssse3 examples/main/main.cpp examples/common.cpp examples/common-ggml.cpp ggml.o ggml-alloc.o ggml-backend.o ggml-quants.o whisper.o -o main
./main -h
usage: ./main [options] file0.wav file1.wav ...
options:
-h, --help [default] show this help message and exit
-t N, --threads N [4 ] number of threads to use during computation
-p N, --processors N [1 ] number of processors to use during computation
-ot N, --offset-t N [0 ] time offset in milliseconds
-on N, --offset-n N [0 ] segment index offset
-d N, --duration N [0 ] duration of audio to process in milliseconds
-mc N, --max-context N [-1 ] maximum number of text context tokens to store
-ml N, --max-len N [0 ] maximum segment length in characters
-sow, --split-on-word [false ] split on word rather than on token
-bo N, --best-of N [5 ] number of best candidates to keep
-bs N, --beam-size N [5 ] beam size for beam search
-wt N, --word-thold N [0.01 ] word timestamp probability threshold
-et N, --entropy-thold N [2.40 ] entropy threshold for decoder fail
-lpt N, --logprob-thold N [-1.00 ] log probability threshold for decoder fail
-debug, --debug-mode [false ] enable debug mode (eg. dump log_mel)
-tr, --translate [false ] translate from source language to english
-di, --diarize [false ] stereo audio diarization
-tdrz, --tinydiarize [false ] enable tinydiarize (requires a tdrz model)
-nf, --no-fallback [false ] do not use temperature fallback while decoding
-otxt, --output-txt [false ] output result in a text file
-ovtt, --output-vtt [false ] output result in a vtt file
-osrt, --output-srt [false ] output result in a srt file
-olrc, --output-lrc [false ] output result in a lrc file
-owts, --output-words [false ] output script for generating karaoke video
-fp, --font-path [/System/Library/Fonts/Supplemental/Courier New Bold.ttf] path to a monospace font for karaoke video
-ocsv, --output-csv [false ] output result in a CSV file
-oj, --output-json [false ] output result in a JSON file
-ojf, --output-json-full [false ] include more information in the JSON file
-of FNAME, --output-file FNAME [ ] output file path (without file extension)
-np, --no-prints [false ] do not print anything other than the results
-ps, --print-special [false ] print special tokens
-pc, --print-colors [false ] print colors
-pp, --print-progress [false ] print progress
-nt, --no-timestamps [false ] do not print timestamps
-l LANG, --language LANG [en ] spoken language ('auto' for auto-detect)
-dl, --detect-language [false ] exit after automatically detecting language
--prompt PROMPT [ ] initial prompt
-m FNAME, --model FNAME [models/ggml-base.en.bin] model path
-f FNAME, --file FNAME [ ] input WAV file path
-oved D, --ov-e-device DNAME [CPU ] the OpenVINO device used for encode inference
-ls, --log-score [false ] log best decoder scores of tokens
-ng, --no-gpu [false ] disable GPU
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -msse3 -mssse3 examples/bench/bench.cpp ggml.o ggml-alloc.o ggml-backend.o ggml-quants.o whisper.o -o bench
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -msse3 -mssse3 examples/quantize/quantize.cpp examples/common.cpp examples/common-ggml.cpp ggml.o ggml-alloc.o ggml-backend.o ggml-quants.o whisper.o -o quantize
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -msse3 -mssse3 examples/server/server.cpp examples/common.cpp examples/common-ggml.cpp ggml.o ggml-alloc.o ggml-backend.o ggml-quants.o whisper.o -o server
make[1]: Leaving directory '/var/www/nextcloud/apps/stt_whisper/whisper.cpp'
cp whisper.cpp/main bin/main
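With the binary rebuilt and copied into place, a quick way to distinguish a broken build from a working one is to run it with `-h`: a binary compiled with instructions the CPU lacks usually dies (SIGILL) before it can print the help text. This is a hypothetical smoke test, assuming the paths from the build log above (adjust `BIN` for other installs):

```shell
# Hypothetical smoke test: a build using unsupported CPU instructions
# typically crashes before printing help, so `-h` is a cheap probe.
BIN=/var/www/nextcloud/apps/stt_whisper/bin/main   # path from the log above
if "$BIN" -h >/dev/null 2>&1; then
  echo "binary runs on this CPU"
else
  echo "binary did not run (missing file or unsupported instructions)"
fi
```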
After that I ran the command from before again, and so far I see this, which is good:
root@whisper:/var/www/nextcloud/apps/stt_whisper# ./bin/main -m ./models/medium -t $(nproc) -l auto --no-timestamps /root/File1-16kHz.wav
whisper_init_from_file_with_params_no_state: loading model from './models/medium'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1024
whisper_model_load: n_text_head = 16
whisper_model_load: n_text_layer = 24
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 4 (medium)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs = 99
whisper_model_load: CPU buffer size = 1533.52 MB
whisper_model_load: model size = 1533.14 MB
whisper_init_state: kv self size = 132.12 MB
whisper_init_state: kv cross size = 147.46 MB
whisper_init_state: compute buffer (conv) = 25.61 MB
whisper_init_state: compute buffer (encode) = 170.28 MB
whisper_init_state: compute buffer (cross) = 7.85 MB
whisper_init_state: compute buffer (decode) = 98.32 MB
system_info: n_threads = 4 / 4 | AVX = 1 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0 |
main: processing '/root/File1-16kHz.wav' (1409245 samples, 88.1 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = auto, task = transcribe, timestamps = 0 ...
whisper_full_with_state: auto-detected language: en (p = 0.989272)
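The `system_info` line in the run above is the key diagnostic: it shows which instruction sets this locally compiled binary actually uses (note AVX2, AVX512 and FMA are all 0). A small sketch that extracts the enabled features from that line (the string below is abridged from the output above):

```shell
# system_info flags copied (abridged) from the run above
info='AVX = 1 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | F16C = 0 | SSE3 = 1 | SSSE3 = 1'

# keep only the features reported as enabled (= 1)
enabled=$(printf '%s\n' "$info" | tr '|' '\n' | grep '= 1' | tr -d ' ' | cut -d= -f1 | paste -s -d, -)
echo "$enabled"   # → AVX,SSE3,SSSE3
```

A prebuilt binary that assumes any feature missing from this list (AVX2, for instance) would crash on this CPU even though the locally built one runs fine.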
For interest's sake, I will report back on the accuracy of the two transcriptions (this versus OpenAI's whisper), if you are interested?
Yup, that is good, which means something is off with the binary. Maybe static linking is the answer to every question; we'll see. Thanks for sticking around and helping narrow down the problem :)
We still aren't sure exactly what caused it, but we have a direction now.
> For interest's sake, I will report back on the accuracy of the two transcriptions (this versus OpenAI's whisper), if you are interested?
They just have different driver code (Python-based vs. C++-based) but use the same weights, so it wouldn't surprise me if both have almost the same accuracy.
No worries, thanks for sticking with me on this. I am running a couple of tests; they are taking WAY longer than expected. Will report back once I have the results.
@kyteinsky @marcelklehr Tests concluded now.
Using the whisper.cpp binary I compiled, and using the same 16kHz audio file created earlier in this thread, the results are as follows:
whisper.cpp:
Command: root@whisper:/var/www/nextcloud/apps/stt_whisper# time ./bin/main -m ./models/medium -t $(nproc) -l auto --no-timestamps /root/File1-16kHz.wav
FFMPEG output:
ffmpeg -i /root/File1-16kHz.wav
ffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg developers
built with gcc 11 (Ubuntu 11.2.0-19ubuntu1)
configuration: --prefix=/usr --extra-version=0ubuntu0.22.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
libavutil 56. 70.100 / 56. 70.100
libavcodec 58.134.100 / 58.134.100
libavformat 58. 76.100 / 58. 76.100
libavdevice 58. 13.100 / 58. 13.100
libavfilter 7.110.100 / 7.110.100
libswscale 5. 9.100 / 5. 9.100
libswresample 3. 9.100 / 3. 9.100
libpostproc 55. 9.100 / 55. 9.100
Guessed Channel Layout for Input Stream #0.0 : mono
Input #0, wav, from '/root/File1-16kHz.wav':
Metadata:
encoder : Lavf58.76.100
Duration: 00:01:28.08, bitrate: 256 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
Time taken:
real 42m8.239s
user 165m27.897s
sys 1m4.406s
openai whisper binary:
Command: root@whisper:/var/www/nextcloud/apps/stt_whisper# time whisper --model medium --model_dir /opt/whisper/models/ --threads $(nproc) /root/File1-16kHz.wav
FFMPEG output: Same as above
Time taken:
real 12m45.020s
user 33m22.922s
sys 6m41.977s
This time includes roughly 1 min 8 s spent downloading the medium model from OpenAI's repo.
The resulting texts were identical, so in terms of accuracy the two binaries are equal. In terms of speed, however, I am afraid whisper.cpp is awful compared to OpenAI's whisper.
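For scale, the timings above work out to roughly 29x and 9x real time respectively for the 88.1 s clip. A small awk sketch of the arithmetic:

```shell
# Real-time factor (RTF) = wall-clock time / audio length,
# using the `real` times reported above for the 88.1 s clip
awk 'BEGIN {
  audio = 88.1
  cpp   = 42*60 + 8.239     # whisper.cpp:    real 42m8.239s
  oai   = 12*60 + 45.020    # openai whisper: real 12m45.020s
  printf "whisper.cpp RTF: %.1fx\n", cpp / audio
  printf "openai RTF:      %.1fx\n", oai / audio
}'
# whisper.cpp RTF: 28.7x
# openai RTF:      8.7x
```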
I am not bashing your app or your efforts here, not at all. I am merely stating some facts I have gathered while comparing the two packages on the same VM/container.
42 min of runtime for 1.5 min of audio is indeed abysmal. The rub usually lies in the hardware. You may be able to tweak your whisper.cpp build for an optimal runtime. We cannot ship binaries that are optimal for all hardware, so we ship something that works OK on most hardware; if you want optimal speed, you can compile whisper.cpp yourself and tweak the compilation step. There are also projects that are even faster than whisper.cpp, which we currently don't support. As for why our shipped binary didn't work on your machine: I suspect it was compiled to use an instruction that your CPU is missing.
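Before compiling whisper.cpp yourself, it can help to check which SIMD extensions the host CPU actually supports, so the build only uses instructions that exist. A Linux-only sketch (reads /proc/cpuinfo; the rebuild commands in the trailing comments mirror the build log earlier in this thread):

```shell
# Check which SIMD extensions this CPU supports before building
# (Linux-only: reads /proc/cpuinfo; prints one yes/no line per flag)
for flag in avx avx2 fma f16c; do
  if grep -qw "$flag" /proc/cpuinfo 2>/dev/null; then
    echo "$flag: yes"
  else
    echo "$flag: no"
  fi
done
# then rebuild from source so only supported instructions are used, e.g.:
#   cd /var/www/nextcloud/apps/stt_whisper/whisper.cpp
#   make clean && make -j"$(nproc)" main && cp main ../bin/main
```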
Thank you for the response.
Hi all
Trying to get whisper and STT going for a POC. I have managed to set up whisper successfully on the POC and added this app via the app store, but when I try to schedule a file for transcription from the smart picker, I get the following error in the log:
and then the following two lines as well:
I have no idea what this is about. Any thoughts?
TIA for any help!