nextcloud / stt_whisper

Speech-To-Text provider running OpenAI Whisper locally

Transcription failed with: Whisper process failed, exception: {} #28

Closed: hanserasmus closed this 9 months ago

hanserasmus commented 10 months ago

Hi all

Trying to get Whisper and STT going for a POC. I have managed to set up Whisper successfully on the POC machine and added this app via the app store, but when I try to schedule a file for transcription from the smart picker, I get the following error in the log:

{"reqId":"AXizQk0ZAScmiuYDexKt","level":2,"time":"2024-01-04T09:05:04+00:00","remoteAddr":"","user":"--","app":"stt_whisper","method":"","url":"--","message":"Transcription failed with: Whisper process failed","userAgent":"--","version":"28.0.1.1","exception":{"Exception":"RuntimeException","Message":"Whisper process failed","Code":0,"Trace":[{"file":"/var/www/nextcloud/apps/stt_whisper/lib/Provider/SpeechToText.php","line":30,"function":"transcribe","class":"OCA\\SttWhisper\\Service\\SpeechToTextService","type":"->"},{"file":"/var/www/nextcloud/lib/private/SpeechToText/SpeechToTextManager.php","line":129,"function":"transcribeFile","class":"OCA\\SttWhisper\\Provider\\SpeechToText","type":"->"},{"file":"/var/www/nextcloud/lib/private/SpeechToText/TranscriptionJob.php","line":82,"function":"transcribeFile","class":"OC\\SpeechToText\\SpeechToTextManager","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/Job.php","line":81,"function":"run","class":"OC\\SpeechToText\\TranscriptionJob","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/QueuedJob.php","line":57,"function":"start","class":"OCP\\BackgroundJob\\Job","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/QueuedJob.php","line":47,"function":"start","class":"OCP\\BackgroundJob\\QueuedJob","type":"->"},{"file":"/var/www/nextcloud/cron.php","line":152,"function":"execute","class":"OCP\\BackgroundJob\\QueuedJob","type":"->"}],"File":"/var/www/nextcloud/apps/stt_whisper/lib/Service/SpeechToTextService.php","Line":88,"message":"Transcription failed with: Whisper process failed","exception":{},"CustomMessage":"Transcription failed with: Whisper process failed"}}

and then the following two lines as well:

{"reqId":"AXizQk0ZAScmiuYDexKt","level":2,"time":"2024-01-04T09:05:04+00:00","remoteAddr":"","user":"--","app":"no app in context","method":"","url":"--","message":"Transcription of file 175 failed","userAgent":"--","version":"28.0.1.1","exception":{"Exception":"RuntimeException","Message":"Could not transcribe file","Code":0,"Trace":[{"file":"/var/www/nextcloud/lib/private/SpeechToText/TranscriptionJob.php","line":82,"function":"transcribeFile","class":"OC\\SpeechToText\\SpeechToTextManager","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/Job.php","line":81,"function":"run","class":"OC\\SpeechToText\\TranscriptionJob","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/QueuedJob.php","line":57,"function":"start","class":"OCP\\BackgroundJob\\Job","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/QueuedJob.php","line":47,"function":"start","class":"OCP\\BackgroundJob\\QueuedJob","type":"->"},{"file":"/var/www/nextcloud/cron.php","line":152,"function":"execute","class":"OCP\\BackgroundJob\\QueuedJob","type":"->"}],"File":"/var/www/nextcloud/lib/private/SpeechToText/SpeechToTextManager.php","Line":135,"message":"Transcription of file 175 failed","exception":{},"CustomMessage":"Transcription of file 175 failed"}}
{"reqId":"AXizQk0ZAScmiuYDexKt","level":3,"time":"2024-01-04T09:05:04+00:00","remoteAddr":"","user":"--","app":"stt_helper","method":"","url":"--","message":"Transcript generation failed: Could not transcribe file","userAgent":"--","version":"28.0.1.1","data":{"app":"stt_helper"}}

I have no idea what this is about. Any thoughts?

TIA for any help!

kyteinsky commented 10 months ago

Hi @hanserasmus, are you using a musl-based system?
The logging will be fixed in the next release of the app. For now, can you turn on debug logging in NC (log level), try to transcribe the file again, and post the logs here?
Thanks.

hanserasmus commented 10 months ago

@kyteinsky Thank you for the response. I am too stupid to know what a musl-based system is, and confident enough to admit it :-) So the answer is: I don't know.

It's an LXC container based on Ubuntu 22.04, built on Proxmox 7.4. The same file can be transcribed successfully using the whisper command line; I have tested that part at least. Not sure if I am perhaps missing some PHP extensions? This is a very minimal setup of NC to test with, so I have only done the bare minimum to get it up and running. Will add debug logs very soon.

hanserasmus commented 10 months ago

@kyteinsky Here are the debug logs:

{"reqId":"77JjNulYpBZw0cGsI1R2","level":0,"time":"2024-01-04T13:55:02+00:00","remoteAddr":"","user":"--","app":"cron","method":"","url":"--","message":"Run OC\\SpeechToText\\TranscriptionJob job with ID 53","userAgent":"--","version":"28.0.1.1","data":{"app":"cron"}}

{"reqId":"77JjNulYpBZw0cGsI1R2","level":0,"time":"2024-01-04T13:55:02+00:00","remoteAddr":"","user":"--","app":"stt_whisper","method":"","url":"--","message":"Running array (\n  0 => '/var/www/nextcloud/apps/stt_whisper/lib/Service/../../node_modules/ffmpeg-static/ffmpeg',\n  1 => '-i',\n  2 => '/var/www/nextcloud/data/admin/files/File1.ogg',\n  3 => '-ar',\n  4 => 16000,\n  5 => '-ac',\n  6 => 1,\n  7 => '-af',\n  8 => 'silenceremove=window=1:detection=peak:stop_periods=-1:stop_silence=7:start_threshold=-70dB:stop_threshold=-70dB',\n  9 => '-c:a',\n  10 => 'pcm_s16le',\n  11 => '-threads',\n  12 => 4,\n  13 => '-y',\n  14 => '/tmp/oc_tmp_3uDeAB-.wav',\n)","userAgent":"--","version":"28.0.1.1","data":{"app":"stt_whisper"}}
{"reqId":"77JjNulYpBZw0cGsI1R2","level":0,"time":"2024-01-04T13:55:03+00:00","remoteAddr":"","user":"--","app":"stt_whisper","method":"","url":"--","message":"Running array (\n  0 => '/var/www/nextcloud/apps/stt_whisper/lib/Service/../../bin/main',\n  1 => '-m',\n  2 => '../../models/medium',\n  3 => '-t',\n  4 => 4,\n  5 => '-l',\n  6 => 'auto',\n  7 => '--no-timestamps',\n  8 => '/tmp/oc_tmp_3uDeAB-.wav',\n)","userAgent":"--","version":"28.0.1.1","data":{"app":"stt_whisper"}}

{"reqId":"77JjNulYpBZw0cGsI1R2","level":2,"time":"2024-01-04T13:55:04+00:00","remoteAddr":"","user":"--","app":"stt_whisper","method":"","url":"--","message":"whisper_init_from_file_no_state: loading model from '../../models/medium'\nwhisper_model_load: loading model\nwhisper_model_load: n_vocab       = 51865\nwhisper_model_load: n_audio_ctx   = 1500\nwhisper_model_load: n_audio_state = 1024\nwhisper_model_load: n_audio_head  = 16\nwhisper_model_load: n_audio_layer = 24\nwhisper_model_load: n_text_ctx    = 448\nwhisper_model_load: n_text_state  = 1024\nwhisper_model_load: n_text_head   = 16\nwhisper_model_load: n_text_layer  = 24\nwhisper_model_load: n_mels        = 80\nwhisper_model_load: f16           = 1\nwhisper_model_load: type          = 4\nwhisper_model_load: mem required  = 1725.00 MB (+   43.00 MB per decoder)\nwhisper_model_load: adding 1608 extra tokens\nwhisper_model_load: model ctx     = 1462.35 MB\n","userAgent":"--","version":"28.0.1.1","data":{"app":"stt_whisper"}}

{"reqId":"77JjNulYpBZw0cGsI1R2","level":2,"time":"2024-01-04T13:55:04+00:00","remoteAddr":"","user":"--","app":"stt_whisper","method":"","url":"--","message":"Transcription failed with: Whisper process failed","userAgent":"--","version":"28.0.1.1","exception":{"Exception":"RuntimeException","Message":"Whisper process failed","Code":0,"Trace":[{"file":"/var/www/nextcloud/apps/stt_whisper/lib/Provider/SpeechToText.php","line":30,"function":"transcribe","class":"OCA\\SttWhisper\\Service\\SpeechToTextService","type":"->"},{"file":"/var/www/nextcloud/lib/private/SpeechToText/SpeechToTextManager.php","line":129,"function":"transcribeFile","class":"OCA\\SttWhisper\\Provider\\SpeechToText","type":"->"},{"file":"/var/www/nextcloud/lib/private/SpeechToText/TranscriptionJob.php","line":82,"function":"transcribeFile","class":"OC\\SpeechToText\\SpeechToTextManager","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/Job.php","line":81,"function":"run","class":"OC\\SpeechToText\\TranscriptionJob","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/QueuedJob.php","line":57,"function":"start","class":"OCP\\BackgroundJob\\Job","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/QueuedJob.php","line":47,"function":"start","class":"OCP\\BackgroundJob\\QueuedJob","type":"->"},{"file":"/var/www/nextcloud/cron.php","line":152,"function":"execute","class":"OCP\\BackgroundJob\\QueuedJob","type":"->"}],"File":"/var/www/nextcloud/apps/stt_whisper/lib/Service/SpeechToTextService.php","Line":88,"message":"Transcription failed with: Whisper process failed","exception":{},"CustomMessage":"Transcription failed with: Whisper process failed"}}

{"reqId":"77JjNulYpBZw0cGsI1R2","level":1,"time":"2024-01-04T13:55:04+00:00","remoteAddr":"","user":"--","app":"no app in context","method":"","url":"--","message":"SpeechToText transcription using provider Whisper Speech-To-Text failed","userAgent":"--","version":"28.0.1.1","exception":{"Exception":"RuntimeException","Message":"Transcription failed with: Whisper process failed","Code":0,"Trace":[{"file":"/var/www/nextcloud/lib/private/SpeechToText/SpeechToTextManager.php","line":129,"function":"transcribeFile","class":"OCA\\SttWhisper\\Provider\\SpeechToText","type":"->"},{"file":"/var/www/nextcloud/lib/private/SpeechToText/TranscriptionJob.php","line":82,"function":"transcribeFile","class":"OC\\SpeechToText\\SpeechToTextManager","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/Job.php","line":81,"function":"run","class":"OC\\SpeechToText\\TranscriptionJob","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/QueuedJob.php","line":57,"function":"start","class":"OCP\\BackgroundJob\\Job","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/QueuedJob.php","line":47,"function":"start","class":"OCP\\BackgroundJob\\QueuedJob","type":"->"},{"file":"/var/www/nextcloud/cron.php","line":152,"function":"execute","class":"OCP\\BackgroundJob\\QueuedJob","type":"->"}],"File":"/var/www/nextcloud/apps/stt_whisper/lib/Provider/SpeechToText.php","Line":33,"message":"SpeechToText transcription using provider Whisper Speech-To-Text failed","exception":{},"CustomMessage":"SpeechToText transcription using provider Whisper Speech-To-Text failed"}}

{"reqId":"77JjNulYpBZw0cGsI1R2","level":2,"time":"2024-01-04T13:55:04+00:00","remoteAddr":"","user":"--","app":"no app in context","method":"","url":"--","message":"Transcription of file 175 failed","userAgent":"--","version":"28.0.1.1","exception":{"Exception":"RuntimeException","Message":"Could not transcribe file","Code":0,"Trace":[{"file":"/var/www/nextcloud/lib/private/SpeechToText/TranscriptionJob.php","line":82,"function":"transcribeFile","class":"OC\\SpeechToText\\SpeechToTextManager","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/Job.php","line":81,"function":"run","class":"OC\\SpeechToText\\TranscriptionJob","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/QueuedJob.php","line":57,"function":"start","class":"OCP\\BackgroundJob\\Job","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/QueuedJob.php","line":47,"function":"start","class":"OCP\\BackgroundJob\\QueuedJob","type":"->"},{"file":"/var/www/nextcloud/cron.php","line":152,"function":"execute","class":"OCP\\BackgroundJob\\QueuedJob","type":"->"}],"File":"/var/www/nextcloud/lib/private/SpeechToText/SpeechToTextManager.php","Line":135,"message":"Transcription of file 175 failed","exception":{},"CustomMessage":"Transcription of file 175 failed"}}

{"reqId":"77JjNulYpBZw0cGsI1R2","level":3,"time":"2024-01-04T13:55:04+00:00","remoteAddr":"","user":"--","app":"stt_helper","method":"","url":"--","message":"Transcript generation failed: Could not transcribe file","userAgent":"--","version":"28.0.1.1","data":{"app":"stt_helper"}}

{"reqId":"77JjNulYpBZw0cGsI1R2","level":0,"time":"2024-01-04T13:55:04+00:00","remoteAddr":"","user":"--","app":"cron","method":"","url":"--","message":"Finished OC\\SpeechToText\\TranscriptionJob job with ID 53 in 2 seconds","userAgent":"--","version":"28.0.1.1","data":{"app":"cron"}}

I should add that my whisper was installed via pip3, but it is available on the CLI simply as whisper (if that helps?).

kyteinsky commented 10 months ago

Since you're using an off-the-shelf Ubuntu VM, it is not musl-based. You can check with ldd /bin/sh; if the output mentions musl anywhere, it is musl. Now you know :)
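For reference, the check can be wrapped in a small script like this (a sketch, assuming a Linux shell with ldd available):

```shell
# Detect whether the system libc is musl or glibc.
# glibc systems list libc.so.6 in ldd output; musl systems mention "musl".
if ldd /bin/sh 2>/dev/null | grep -qi musl; then
    echo "musl-based"
else
    echo "glibc-based (or ldd unavailable)"
fi
```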

This app comes bundled with a whisper binary, so your local install is not used.

The logs have been truncated at the end by the NC logger, but most probably the model file is missing. The desired model file can be downloaded with this occ command: occ stt_whisper:download-models [model-name], where model-name can be small, medium or large.
If this does not help, we can try running the binary manually to get the complete output of the whisper command.

hanserasmus commented 10 months ago

Hi. Will check the musl thing later. I did load the medium model as per the installation instructions during the install. Is musl an issue?


hanserasmus commented 10 months ago

@kyteinsky I checked that the medium model is installed; it was, but I re-downloaded it anyway. Then I scheduled the transcription again, and it failed again with these log entries:

{"reqId":"2jJHK5T4X8kFWkp8BHxQ","level":0,"time":"2024-01-04T21:15:01+00:00","remoteAddr":"","user":"--","app":"stt_whisper","method":"","url":"--","message":"Running array (\n  0 => '/var/www/nextcloud/apps/stt_whisper/lib/Service/../../node_modules/ffmpeg-static/ffmpeg',\n  1 => '-i',\n  2 => '/var/www/nextcloud/data/admin/files/File1.ogg',\n  3 => '-ar',\n  4 => 16000,\n  5 => '-ac',\n  6 => 1,\n  7 => '-af',\n  8 => 'silenceremove=window=1:detection=peak:stop_periods=-1:stop_silence=7:start_threshold=-70dB:stop_threshold=-70dB',\n  9 => '-c:a',\n  10 => 'pcm_s16le',\n  11 => '-threads',\n  12 => 4,\n  13 => '-y',\n  14 => '/tmp/oc_tmp_MZWOP0-.wav',\n)","userAgent":"--","version":"28.0.1.1","data":{"app":"stt_whisper"}}
{"reqId":"2jJHK5T4X8kFWkp8BHxQ","level":0,"time":"2024-01-04T21:15:02+00:00","remoteAddr":"","user":"--","app":"stt_whisper","method":"","url":"--","message":"Running array (\n  0 => '/var/www/nextcloud/apps/stt_whisper/lib/Service/../../bin/main',\n  1 => '-m',\n  2 => '../../models/medium',\n  3 => '-t',\n  4 => 4,\n  5 => '-l',\n  6 => 'auto',\n  7 => '--no-timestamps',\n  8 => '/tmp/oc_tmp_MZWOP0-.wav',\n)","userAgent":"--","version":"28.0.1.1","data":{"app":"stt_whisper"}}
{"reqId":"2jJHK5T4X8kFWkp8BHxQ","level":2,"time":"2024-01-04T21:15:03+00:00","remoteAddr":"","user":"--","app":"stt_whisper","method":"","url":"--","message":"whisper_init_from_file_no_state: loading model from '../../models/medium'\nwhisper_model_load: loading model\nwhisper_model_load: n_vocab       = 51865\nwhisper_model_load: n_audio_ctx   = 1500\nwhisper_model_load: n_audio_state = 1024\nwhisper_model_load: n_audio_head  = 16\nwhisper_model_load: n_audio_layer = 24\nwhisper_model_load: n_text_ctx    = 448\nwhisper_model_load: n_text_state  = 1024\nwhisper_model_load: n_text_head   = 16\nwhisper_model_load: n_text_layer  = 24\nwhisper_model_load: n_mels        = 80\nwhisper_model_load: f16           = 1\nwhisper_model_load: type          = 4\nwhisper_model_load: mem required  = 1725.00 MB (+   43.00 MB per decoder)\nwhisper_model_load: adding 1608 extra tokens\nwhisper_model_load: model ctx     = 1462.35 MB\n","userAgent":"--","version":"28.0.1.1","data":{"app":"stt_whisper"}}
{"reqId":"2jJHK5T4X8kFWkp8BHxQ","level":2,"time":"2024-01-04T21:15:03+00:00","remoteAddr":"","user":"--","app":"stt_whisper","method":"","url":"--","message":"Transcription failed with: Whisper process failed","userAgent":"--","version":"28.0.1.1","exception":{"Exception":"RuntimeException","Message":"Whisper process failed","Code":0,"Trace":[{"file":"/var/www/nextcloud/apps/stt_whisper/lib/Provider/SpeechToText.php","line":30,"function":"transcribe","class":"OCA\\SttWhisper\\Service\\SpeechToTextService","type":"->"},{"file":"/var/www/nextcloud/lib/private/SpeechToText/SpeechToTextManager.php","line":129,"function":"transcribeFile","class":"OCA\\SttWhisper\\Provider\\SpeechToText","type":"->"},{"file":"/var/www/nextcloud/lib/private/SpeechToText/TranscriptionJob.php","line":82,"function":"transcribeFile","class":"OC\\SpeechToText\\SpeechToTextManager","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/Job.php","line":81,"function":"run","class":"OC\\SpeechToText\\TranscriptionJob","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/QueuedJob.php","line":57,"function":"start","class":"OCP\\BackgroundJob\\Job","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/QueuedJob.php","line":47,"function":"start","class":"OCP\\BackgroundJob\\QueuedJob","type":"->"},{"file":"/var/www/nextcloud/cron.php","line":152,"function":"execute","class":"OCP\\BackgroundJob\\QueuedJob","type":"->"}],"File":"/var/www/nextcloud/apps/stt_whisper/lib/Service/SpeechToTextService.php","Line":88,"message":"Transcription failed with: Whisper process failed","exception":{},"CustomMessage":"Transcription failed with: Whisper process failed"}}
{"reqId":"2jJHK5T4X8kFWkp8BHxQ","level":1,"time":"2024-01-04T21:15:03+00:00","remoteAddr":"","user":"--","app":"no app in context","method":"","url":"--","message":"SpeechToText transcription using provider Whisper Speech-To-Text failed","userAgent":"--","version":"28.0.1.1","exception":{"Exception":"RuntimeException","Message":"Transcription failed with: Whisper process failed","Code":0,"Trace":[{"file":"/var/www/nextcloud/lib/private/SpeechToText/SpeechToTextManager.php","line":129,"function":"transcribeFile","class":"OCA\\SttWhisper\\Provider\\SpeechToText","type":"->"},{"file":"/var/www/nextcloud/lib/private/SpeechToText/TranscriptionJob.php","line":82,"function":"transcribeFile","class":"OC\\SpeechToText\\SpeechToTextManager","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/Job.php","line":81,"function":"run","class":"OC\\SpeechToText\\TranscriptionJob","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/QueuedJob.php","line":57,"function":"start","class":"OCP\\BackgroundJob\\Job","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/QueuedJob.php","line":47,"function":"start","class":"OCP\\BackgroundJob\\QueuedJob","type":"->"},{"file":"/var/www/nextcloud/cron.php","line":152,"function":"execute","class":"OCP\\BackgroundJob\\QueuedJob","type":"->"}],"File":"/var/www/nextcloud/apps/stt_whisper/lib/Provider/SpeechToText.php","Line":33,"message":"SpeechToText transcription using provider Whisper Speech-To-Text failed","exception":{},"CustomMessage":"SpeechToText transcription using provider Whisper Speech-To-Text failed"}}
{"reqId":"2jJHK5T4X8kFWkp8BHxQ","level":2,"time":"2024-01-04T21:15:03+00:00","remoteAddr":"","user":"--","app":"no app in context","method":"","url":"--","message":"Transcription of file 175 failed","userAgent":"--","version":"28.0.1.1","exception":{"Exception":"RuntimeException","Message":"Could not transcribe file","Code":0,"Trace":[{"file":"/var/www/nextcloud/lib/private/SpeechToText/TranscriptionJob.php","line":82,"function":"transcribeFile","class":"OC\\SpeechToText\\SpeechToTextManager","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/Job.php","line":81,"function":"run","class":"OC\\SpeechToText\\TranscriptionJob","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/QueuedJob.php","line":57,"function":"start","class":"OCP\\BackgroundJob\\Job","type":"->"},{"file":"/var/www/nextcloud/lib/public/BackgroundJob/QueuedJob.php","line":47,"function":"start","class":"OCP\\BackgroundJob\\QueuedJob","type":"->"},{"file":"/var/www/nextcloud/cron.php","line":152,"function":"execute","class":"OCP\\BackgroundJob\\QueuedJob","type":"->"}],"File":"/var/www/nextcloud/lib/private/SpeechToText/SpeechToTextManager.php","Line":135,"message":"Transcription of file 175 failed","exception":{},"CustomMessage":"Transcription of file 175 failed"}}
marcelklehr commented 10 months ago

Could it be that you don't have enough RAM and the process is killed by the OOM killer?

hanserasmus commented 10 months ago

There are no log entries in /var/log/syslog about the OOM killer. I kept a real-time view via the top command and it did not run out of memory. It just keeps failing with those errors.
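For completeness, the kernel ring buffer can also be checked directly (a sketch; dmesg may require root, and an LXC guest may not see host kernel messages):

```shell
# Search the kernel log for OOM-killer activity; print a fallback
# message if nothing is found or dmesg is not readable here.
dmesg 2>/dev/null | grep -iE 'out of memory|oom-kill|killed process' \
    || echo "no OOM events found (or dmesg not readable)"
```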

marcelklehr commented 10 months ago

Does your CPU support AVX instructions?

hanserasmus commented 10 months ago

@marcelklehr it would seem so:

root@whisper:~# grep avx /proc/cpuinfo 
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm pti tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm pti tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm pti tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm pti tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts
kyteinsky commented 10 months ago

Hi @hanserasmus, can you try running the whisper binary shipped with the app directly?

Go into the nextcloud directory where index.php is located and then do the following:

  1. cd apps/stt_whisper
  2. ./bin/main -m ./models/small -t $(nproc) -l auto --no-timestamps <path to a 16kHz wav file> (replace small with whatever model size you have downloaded, e.g. medium)

You can convert audio to a 16kHz wav file using this command:

ffmpeg -i <input_audio_file> -ar 16000 -ac 1 -c:a pcm_s16le -threads $(nproc) <output_file.wav>
hanserasmus commented 10 months ago

Hi @kyteinsky thank you for your reply.

root@whisper:/var/www/nextcloud/apps/stt_whisper# ./bin/main -m ./models/medium -t $(nproc) -l auto --no-timestamps /root/File1-16kHz.wav 
whisper_init_from_file_no_state: loading model from './models/medium'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head  = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1024
whisper_model_load: n_text_head   = 16
whisper_model_load: n_text_layer  = 24
whisper_model_load: n_mels        = 80
whisper_model_load: f16           = 1
whisper_model_load: type          = 4
whisper_model_load: mem required  = 1725.00 MB (+   43.00 MB per decoder)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: model ctx     = 1462.35 MB
Illegal instruction
kyteinsky commented 10 months ago

@hanserasmus Looks like you're running an architecture other than x86_64.
Can you post the output of uname -a please?

hanserasmus commented 10 months ago

@kyteinsky

root@whisper:/var/www/nextcloud/apps/stt_whisper# uname -a
Linux whisper 5.15.131-2-pve #1 SMP PVE 5.15.131-3 (2023-12-01T13:42Z) x86_64 x86_64 x86_64 GNU/Linux
kyteinsky commented 10 months ago

Also outputs for ldd /bin/sh and file bin/*

hanserasmus commented 10 months ago

@kyteinsky A thought: would it help to run a VM rather than an LXC? Maybe that way we can isolate the resources a bit more and lock things down.

hanserasmus commented 10 months ago

Also outputs for ldd /bin/sh and file bin/*

root@whisper:/var/www/nextcloud/apps/stt_whisper# ldd /bin/sh
    linux-vdso.so.1 (0x00007ffcd77d8000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007efd1e998000)
    /lib64/ld-linux-x86-64.so.2 (0x00007efd1ebec000)
root@whisper:/var/www/nextcloud/apps/stt_whisper# file bin/*
bin/main:      ELF 64-bit LSB pie executable, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=83b0bb7a24202a943656facaa0eaed0779ed7192, for GNU/Linux 3.2.0, not stripped
bin/main-musl: ELF 64-bit LSB pie executable, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=83b0bb7a24202a943656facaa0eaed0779ed7192, for GNU/Linux 3.2.0, not stripped
kyteinsky commented 10 months ago

A thought: would it help to run a VM rather than an LXC? Maybe that way we can isolate the resources a bit more and lock things down.

Not sure that would be any different in this case.

@marcelklehr Any clues here?

marcelklehr commented 10 months ago

I think it's still the AVX issue. I've heard reports from people that Proxmox doesn't pass it through. Not sure how to test that, though; you can definitely try running a VM.
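One way to compare, assuming shell access to both the Proxmox host and the guest, is to list the AVX-family flags each side sees; if passthrough works they should match:

```shell
# List the distinct avx/avx2/avx512f-style flags visible in /proc/cpuinfo.
# Run once on the host and once inside the guest, then compare the output.
flags=$(grep -o -w 'avx[0-9f]*' /proc/cpuinfo 2>/dev/null | sort -u)
echo "${flags:-no AVX flags visible}"
```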

marcelklehr commented 10 months ago

Also, searching the web for proxmox avx yields some results that look related to this.

hanserasmus commented 10 months ago

@marcelklehr I am currently running an LXC. I was thinking of running a VM instead.

So a question, and I'm not trolling: what makes this bin file different from the whisper bin I installed via pip3? Because that one runs on this machine. Would the pip3 whisper not also error out without AVX support?

marcelklehr commented 10 months ago

Can you give a link for the whisper bin that runs for you? It could be that we are using an older version of whisper.cpp.

hanserasmus commented 10 months ago
root@whisper:/var/www/nextcloud/apps/stt_whisper# pip3 install whisper
Collecting whisper
  Downloading whisper-1.1.10.tar.gz (42 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 42.8/42.8 KB 819.0 kB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Requirement already satisfied: six in /usr/lib/python3/dist-packages (from whisper) (1.16.0)
Building wheels for collected packages: whisper
  Building wheel for whisper (setup.py) ... done
  Created wheel for whisper: filename=whisper-1.1.10-py3-none-any.whl size=41138 sha256=a8f36353f2fe8fa989a1578c1fb5fad479fd7724ff66162e343002fe95311282
  Stored in directory: /root/.cache/pip/wheels/aa/7c/1d/015619716e2facae6631312503baf3c3220e6a9a3508cb14b6
Successfully built whisper
Installing collected packages: whisper
Successfully installed whisper-1.1.10

This is how I installed it, does that give you enough info?

marcelklehr commented 10 months ago

This appears to be the Whisper time-series database library which is unrelated to whisper.cpp, which is used in this project.

hanserasmus commented 10 months ago

@marcelklehr I am sorry, I am an idiot.

That is not the command I used. I checked my history, and here you go (from https://github.com/openai/whisper#setup):

root@whisper:/var/www/nextcloud/apps/stt_whisper# pip3 install openai-whisper
Requirement already satisfied: openai-whisper in /usr/local/lib/python3.10/dist-packages (20231117)
Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from openai-whisper) (1.26.3)
Requirement already satisfied: torch in /usr/local/lib/python3.10/dist-packages (from openai-whisper) (2.1.2)
Requirement already satisfied: more-itertools in /usr/local/lib/python3.10/dist-packages (from openai-whisper) (10.1.0)
Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from openai-whisper) (4.66.1)
Requirement already satisfied: triton<3,>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from openai-whisper) (2.1.0)
Requirement already satisfied: numba in /usr/local/lib/python3.10/dist-packages (from openai-whisper) (0.58.1)
Requirement already satisfied: tiktoken in /usr/local/lib/python3.10/dist-packages (from openai-whisper) (0.5.2)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from triton<3,>=2.0.0->openai-whisper) (3.13.1)
Requirement already satisfied: llvmlite<0.42,>=0.41.0dev0 in /usr/local/lib/python3.10/dist-packages (from numba->openai-whisper) (0.41.1)
Requirement already satisfied: regex>=2022.1.18 in /usr/local/lib/python3.10/dist-packages (from tiktoken->openai-whisper) (2023.12.25)
Requirement already satisfied: requests>=2.26.0 in /usr/local/lib/python3.10/dist-packages (from tiktoken->openai-whisper) (2.31.0)
Requirement already satisfied: nvidia-cudnn-cu12==8.9.2.26 in /usr/local/lib/python3.10/dist-packages (from torch->openai-whisper) (8.9.2.26)
Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch->openai-whisper) (12.1.105)
Requirement already satisfied: nvidia-cuda-cupti-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch->openai-whisper) (12.1.105)
Requirement already satisfied: nvidia-cuda-runtime-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch->openai-whisper) (12.1.105)
Requirement already satisfied: nvidia-cusolver-cu12==11.4.5.107 in /usr/local/lib/python3.10/dist-packages (from torch->openai-whisper) (11.4.5.107)
Requirement already satisfied: nvidia-nvtx-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch->openai-whisper) (12.1.105)
Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch->openai-whisper) (1.12)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch->openai-whisper) (3.1.2)
Requirement already satisfied: nvidia-cufft-cu12==11.0.2.54 in /usr/local/lib/python3.10/dist-packages (from torch->openai-whisper) (11.0.2.54)
Requirement already satisfied: nvidia-curand-cu12==10.3.2.106 in /usr/local/lib/python3.10/dist-packages (from torch->openai-whisper) (10.3.2.106)
Requirement already satisfied: nvidia-nccl-cu12==2.18.1 in /usr/local/lib/python3.10/dist-packages (from torch->openai-whisper) (2.18.1)
Requirement already satisfied: nvidia-cusparse-cu12==12.1.0.106 in /usr/local/lib/python3.10/dist-packages (from torch->openai-whisper) (12.1.0.106)
Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from torch->openai-whisper) (2023.12.2)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch->openai-whisper) (3.2.1)
Requirement already satisfied: nvidia-cublas-cu12==12.1.3.1 in /usr/local/lib/python3.10/dist-packages (from torch->openai-whisper) (12.1.3.1)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from torch->openai-whisper) (4.9.0)
Requirement already satisfied: nvidia-nvjitlink-cu12 in /usr/local/lib/python3.10/dist-packages (from nvidia-cusolver-cu12==11.4.5.107->torch->openai-whisper) (12.3.101)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests>=2.26.0->tiktoken->openai-whisper) (2023.11.17)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests>=2.26.0->tiktoken->openai-whisper) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests>=2.26.0->tiktoken->openai-whisper) (3.6)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests>=2.26.0->tiktoken->openai-whisper) (2.1.0)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch->openai-whisper) (2.1.3)
Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch->openai-whisper) (1.3.0)
kyteinsky commented 10 months ago

@hanserasmus OpenAI's whisper is not the same as whisper.cpp.

Just to rule out any issues with the shipped binary, can you try compiling the whisper binary locally? Inside the apps/stt_whisper/ directory, download this Makefile and run make bin/main. Also make sure the web server user can access the resulting binary.

Does the transcription work after this?
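(The "web server user can access the binary" part is easy to get wrong when compiling as root. As a rough sanity check — this is a sketch; the path bin/main comes from this thread, and the world-readable/executable heuristic is my assumption, since the real requirement is simply that the PHP process user can execute the file:)

```python
import os
import stat

def binary_usable(path: str) -> bool:
    """Rough check that a binary exists and that a non-owner user
    (e.g. www-data) could read and execute it."""
    if not os.path.isfile(path):
        return False
    mode = os.stat(path).st_mode
    # World-readable and world-executable is a crude proxy for
    # "the web server user can run it"; ownership and ACLs may differ.
    return bool(mode & stat.S_IROTH) and bool(mode & stat.S_IXOTH)

if __name__ == "__main__":
    print(binary_usable("bin/main"))
```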

hanserasmus commented 10 months ago

@kyteinsky this seems to be the money shot! Here is the output of my make command; maybe you can see why compiling from source worked when installing from the app store did not?

root@whisper:/var/www/nextcloud/apps/stt_whisper# make bin/main
git clone https://github.com/ggerganov/whisper.cpp.git
Cloning into 'whisper.cpp'...
remote: Enumerating objects: 6259, done.
remote: Counting objects: 100% (6259/6259), done.
remote: Compressing objects: 100% (2020/2020), done.
remote: Total 6259 (delta 3995), reused 6190 (delta 3972), pack-reused 0
Receiving objects: 100% (6259/6259), 9.72 MiB | 16.29 MiB/s, done.
Resolving deltas: 100% (3995/3995), done.
cd whisper.cpp && make clean && make
make[1]: Entering directory '/var/www/nextcloud/apps/stt_whisper/whisper.cpp'
I whisper.cpp build info: 
I UNAME_S:  Linux
I UNAME_P:  x86_64
I UNAME_M:  x86_64
I CFLAGS:   -I.              -O3 -DNDEBUG -std=c11   -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -msse3 -mssse3
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -msse3 -mssse3
I LDFLAGS:  
I CC:       cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
I CXX:      g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0

rm -f *.o main stream command talk talk-llama bench quantize server lsp libwhisper.a libwhisper.so
make[1]: Leaving directory '/var/www/nextcloud/apps/stt_whisper/whisper.cpp'
make[1]: Entering directory '/var/www/nextcloud/apps/stt_whisper/whisper.cpp'
I whisper.cpp build info: 
I UNAME_S:  Linux
I UNAME_P:  x86_64
I UNAME_M:  x86_64
I CFLAGS:   -I.              -O3 -DNDEBUG -std=c11   -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -msse3 -mssse3
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -msse3 -mssse3
I LDFLAGS:  
I CC:       cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
I CXX:      g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0

cc  -I.              -O3 -DNDEBUG -std=c11   -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -msse3 -mssse3   -c ggml.c -o ggml.o
cc  -I.              -O3 -DNDEBUG -std=c11   -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -msse3 -mssse3   -c ggml-alloc.c -o ggml-alloc.o
cc  -I.              -O3 -DNDEBUG -std=c11   -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -msse3 -mssse3   -c ggml-backend.c -o ggml-backend.o
cc  -I.              -O3 -DNDEBUG -std=c11   -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -msse3 -mssse3   -c ggml-quants.c -o ggml-quants.o
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -msse3 -mssse3 -c whisper.cpp -o whisper.o
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -msse3 -mssse3 examples/main/main.cpp examples/common.cpp examples/common-ggml.cpp ggml.o ggml-alloc.o ggml-backend.o ggml-quants.o whisper.o -o main 
./main -h

usage: ./main [options] file0.wav file1.wav ...

options:
  -h,        --help              [default] show this help message and exit
  -t N,      --threads N         [4      ] number of threads to use during computation
  -p N,      --processors N      [1      ] number of processors to use during computation
  -ot N,     --offset-t N        [0      ] time offset in milliseconds
  -on N,     --offset-n N        [0      ] segment index offset
  -d  N,     --duration N        [0      ] duration of audio to process in milliseconds
  -mc N,     --max-context N     [-1     ] maximum number of text context tokens to store
  -ml N,     --max-len N         [0      ] maximum segment length in characters
  -sow,      --split-on-word     [false  ] split on word rather than on token
  -bo N,     --best-of N         [5      ] number of best candidates to keep
  -bs N,     --beam-size N       [5      ] beam size for beam search
  -wt N,     --word-thold N      [0.01   ] word timestamp probability threshold
  -et N,     --entropy-thold N   [2.40   ] entropy threshold for decoder fail
  -lpt N,    --logprob-thold N   [-1.00  ] log probability threshold for decoder fail
  -debug,    --debug-mode        [false  ] enable debug mode (eg. dump log_mel)
  -tr,       --translate         [false  ] translate from source language to english
  -di,       --diarize           [false  ] stereo audio diarization
  -tdrz,     --tinydiarize       [false  ] enable tinydiarize (requires a tdrz model)
  -nf,       --no-fallback       [false  ] do not use temperature fallback while decoding
  -otxt,     --output-txt        [false  ] output result in a text file
  -ovtt,     --output-vtt        [false  ] output result in a vtt file
  -osrt,     --output-srt        [false  ] output result in a srt file
  -olrc,     --output-lrc        [false  ] output result in a lrc file
  -owts,     --output-words      [false  ] output script for generating karaoke video
  -fp,       --font-path         [/System/Library/Fonts/Supplemental/Courier New Bold.ttf] path to a monospace font for karaoke video
  -ocsv,     --output-csv        [false  ] output result in a CSV file
  -oj,       --output-json       [false  ] output result in a JSON file
  -ojf,      --output-json-full  [false  ] include more information in the JSON file
  -of FNAME, --output-file FNAME [       ] output file path (without file extension)
  -np,       --no-prints         [false  ] do not print anything other than the results
  -ps,       --print-special     [false  ] print special tokens
  -pc,       --print-colors      [false  ] print colors
  -pp,       --print-progress    [false  ] print progress
  -nt,       --no-timestamps     [false  ] do not print timestamps
  -l LANG,   --language LANG     [en     ] spoken language ('auto' for auto-detect)
  -dl,       --detect-language   [false  ] exit after automatically detecting language
             --prompt PROMPT     [       ] initial prompt
  -m FNAME,  --model FNAME       [models/ggml-base.en.bin] model path
  -f FNAME,  --file FNAME        [       ] input WAV file path
  -oved D,   --ov-e-device DNAME [CPU    ] the OpenVINO device used for encode inference
  -ls,       --log-score         [false  ] log best decoder scores of tokens
  -ng,       --no-gpu            [false  ] disable GPU

g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -msse3 -mssse3 examples/bench/bench.cpp ggml.o ggml-alloc.o ggml-backend.o ggml-quants.o whisper.o -o bench 
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -msse3 -mssse3 examples/quantize/quantize.cpp examples/common.cpp examples/common-ggml.cpp ggml.o ggml-alloc.o ggml-backend.o ggml-quants.o whisper.o -o quantize 
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -msse3 -mssse3 examples/server/server.cpp examples/common.cpp examples/common-ggml.cpp ggml.o ggml-alloc.o ggml-backend.o ggml-quants.o whisper.o -o server 
make[1]: Leaving directory '/var/www/nextcloud/apps/stt_whisper/whisper.cpp'
cp whisper.cpp/main bin/main

After that I ran the command from before again, and so far I see this, which is good:

root@whisper:/var/www/nextcloud/apps/stt_whisper# ./bin/main -m ./models/medium -t $(nproc) -l auto --no-timestamps /root/File1-16kHz.wav 
whisper_init_from_file_with_params_no_state: loading model from './models/medium'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head  = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1024
whisper_model_load: n_text_head   = 16
whisper_model_load: n_text_layer  = 24
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 4 (medium)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs       = 99
whisper_model_load:      CPU buffer size =  1533.52 MB
whisper_model_load: model size    = 1533.14 MB
whisper_init_state: kv self size  =  132.12 MB
whisper_init_state: kv cross size =  147.46 MB
whisper_init_state: compute buffer (conv)   =   25.61 MB
whisper_init_state: compute buffer (encode) =  170.28 MB
whisper_init_state: compute buffer (cross)  =    7.85 MB
whisper_init_state: compute buffer (decode) =   98.32 MB

system_info: n_threads = 4 / 4 | AVX = 1 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0 | 

main: processing '/root/File1-16kHz.wav' (1409245 samples, 88.1 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = auto, task = transcribe, timestamps = 0 ...

whisper_full_with_state: auto-detected language: en (p = 0.989272)

For interest's sake, I will report back on how the two transcriptions compare in accuracy (this vs. OpenAI's Whisper), if you are interested?

kyteinsky commented 10 months ago

Yup, that is good news; it means something is off with the shipped binary. Maybe static linking is the answer. We'll see. Thanks for sticking around and helping narrow down the problem :)
We still aren't sure exactly what caused it, but we have a direction now.

For interest sake I will report back on the two transcriptions in terms of accuracy between this and openAI's whisper if you are interested?

They just have different driver code (Python-based vs. C++-based) but use the same weights, so it wouldn't surprise me if both have almost the same accuracy.
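(For an accuracy comparison like the one proposed above, word error rate (WER) is the usual metric, and with identical weights it should come out near zero. A minimal sketch — plain word-level Levenshtein distance, not how either project scores itself:)

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Classic dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```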

hanserasmus commented 10 months ago

No worries, and thanks for sticking with me on this. I am running a couple of tests; they are taking WAY longer than expected. Will report back once I have the results.

hanserasmus commented 10 months ago

@kyteinsky @marcelklehr Tests concluded now.

Using the whisper.cpp binary I compiled, and using the same 16kHz audio file created earlier in this thread, the results are as follows:

whisper.cpp:

Command:

root@whisper:/var/www/nextcloud/apps/stt_whisper# time ./bin/main -m ./models/medium -t $(nproc) -l auto --no-timestamps /root/File1-16kHz.wav

FFMPEG output:

ffmpeg -i /root/File1-16kHz.wav 
ffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg developers
  built with gcc 11 (Ubuntu 11.2.0-19ubuntu1)
  configuration: --prefix=/usr --extra-version=0ubuntu0.22.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
  libavutil      56. 70.100 / 56. 70.100
  libavcodec     58.134.100 / 58.134.100
  libavformat    58. 76.100 / 58. 76.100
  libavdevice    58. 13.100 / 58. 13.100
  libavfilter     7.110.100 /  7.110.100
  libswscale      5.  9.100 /  5.  9.100
  libswresample   3.  9.100 /  3.  9.100
  libpostproc    55.  9.100 / 55.  9.100
Guessed Channel Layout for Input Stream #0.0 : mono
Input #0, wav, from '/root/File1-16kHz.wav':
  Metadata:
    encoder         : Lavf58.76.100
  Duration: 00:01:28.08, bitrate: 256 kb/s
  Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s

Time taken:

real    42m8.239s
user    165m27.897s
sys 1m4.406s

OpenAI whisper binary:

Command:

root@whisper:/var/www/nextcloud/apps/stt_whisper# time whisper --model medium --model_dir /opt/whisper/models/ --threads $(nproc) /root/File1-16kHz.wav

FFMPEG output: same as above.

Time taken:

real    12m45.020s
user    33m22.922s
sys 6m41.977s

Note that this time includes roughly 1 min 8 s spent downloading the medium model from OpenAI's repo.

The resulting texts were identical, so in terms of accuracy the two binaries are equal. In terms of speed, however, I am afraid whisper.cpp is awful compared to OpenAI's whisper.
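(To put those timings in perspective, the real-time factor — wall time divided by the 88.1 s audio duration reported in the log above — works out roughly as follows. Subtracting the ~68 s model download from the OpenAI run is my approximation:)

```python
AUDIO_SEC = 88.1                     # duration reported by whisper.cpp

cpp_wall = 42 * 60 + 8.239           # real time for ./bin/main
openai_wall = 12 * 60 + 45.020 - 68  # real time for whisper, minus ~68 s download

rtf_cpp = cpp_wall / AUDIO_SEC
rtf_openai = openai_wall / AUDIO_SEC

print(f"whisper.cpp:    {rtf_cpp:.1f}x real time")     # → 28.7x real time
print(f"openai whisper: {rtf_openai:.1f}x real time")  # → 7.9x real time
```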

I am not bashing your app or your efforts here, not at all. I am merely stating some facts I have gathered whilst comparing the two packages on the same vm/container.

marcelklehr commented 10 months ago

42 min of runtime for 1.5 min of audio is indeed abysmal. The rub usually lies in the hardware: you may be able to tweak your whisper.cpp build for an optimal runtime. We cannot ship binaries that are optimal for all hardware, so we ship something that works OK on most hardware; if you want optimal speed, you can compile it yourself and tweak the compilation step. There are also projects that are even faster than whisper.cpp, which we currently don't support.

As to why our shipped binary didn't work on your machine: I suspect it was compiled to use an instruction that your CPU is missing.
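(One way to check the missing-instruction theory is to read the system_info line whisper.cpp prints. A sketch parsing an abridged copy of the line from the log above — the "commonly assumed" extension set is my guess for a generically optimized x86-64 build, not a statement about what the shipped binary actually used:)

```python
# Abridged from the system_info line in the log above.
SYSTEM_INFO = ("AVX = 1 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 0 | "
               "F16C = 0 | SSE3 = 1 | SSSE3 = 1")

def parse_flags(info: str) -> dict:
    """Turn 'NAME = 0|1' pairs into a {name: bool} dict."""
    flags = {}
    for part in info.split("|"):
        if "=" in part:
            name, _, value = part.partition("=")
            flags[name.strip()] = value.strip() == "1"
    return flags

flags = parse_flags(SYSTEM_INFO)
# Extensions an optimized build might assume (an assumption, for illustration):
wanted = ["AVX2", "FMA", "F16C"]
missing = [f for f in wanted if not flags.get(f, False)]
print("missing on this CPU:", missing)  # → ['AVX2', 'FMA', 'F16C']
```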

hanserasmus commented 9 months ago

Thank you for the response.