sskorol / vosk-api-gpu

Vosk ASR Docker images with GPU for Jetson boards, PCs, M1 laptops and GCP
Apache License 2.0

OSError: Multiple exceptions #13

Closed: raghavendrajain closed this issue 2 years ago

raghavendrajain commented 2 years ago

All the instructions were executed successfully, but when I tried running the code, the following error occurred. What should I do?

Traceback (most recent call last):
  File "./test.py", line 28, in <module>
    run_test('ws://localhost:2700'))
  File "/opt/conda/lib/python3.7/asyncio/base_events.py", line 587, in run_until_complete
    return future.result()
  File "./test.py", line 10, in run_test
    async with websockets.connect(uri) as websocket:
  File "/home/raghavendra.jain/vosk-api-gpu/.venv/lib/python3.7/site-packages/websockets/legacy/client.py", line 633, in __aenter__
    return await self
  File "/home/raghavendra.jain/vosk-api-gpu/.venv/lib/python3.7/site-packages/websockets/legacy/client.py", line 650, in __await_impl_timeout__
    return await asyncio.wait_for(self.__await_impl__(), self.open_timeout)
  File "/opt/conda/lib/python3.7/asyncio/tasks.py", line 442, in wait_for
    return fut.result()
  File "/home/raghavendra.jain/vosk-api-gpu/.venv/lib/python3.7/site-packages/websockets/legacy/client.py", line 654, in __await_impl__
    transport, protocol = await self._create_connection()
  File "/opt/conda/lib/python3.7/asyncio/base_events.py", line 971, in create_connection
    ', '.join(str(exc) for exc in exceptions)))
OSError: Multiple exceptions: [Errno 111] Connect call failed ('::1', 2700, 0, 0), [Errno 111] Connect call failed ('127.0.0.1', 2700)
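For context, the frames from ./test.py suggest a client roughly like the sketch below. The URI comes from the call site in the traceback; the message protocol is an assumption based on the usual vosk-server WebSocket examples, not the actual test.py:

```python
# Hypothetical reconstruction of the relevant part of test.py.
import asyncio

URI = "ws://localhost:2700"  # same address/port as in the traceback

async def run_test(uri: str) -> None:
    # Third-party dependency; imported lazily so the module loads
    # even where `websockets` is not installed.
    import websockets

    # This is the call that raises "OSError: Multiple exceptions ..."
    # when nothing is listening on the port, i.e. the server is down.
    async with websockets.connect(uri) as websocket:
        await websocket.send('{"eof" : 1}')
        print(await websocket.recv())
```

To run it, one would execute `asyncio.get_event_loop().run_until_complete(run_test(URI))` with the server container up.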

sskorol commented 2 years ago

The exception says it can't connect to the specified WS address/port (the Vosk server). You should check whether the container has been started. If it has, check its logs for errors.

raghavendrajain commented 2 years ago

Hey man, I do not know how to do that. Apologies for putting this burden on you, but if you can give me some commands, I can run them.

sskorol commented 2 years ago

After running docker-compose, execute the following to check the container status and its logs (the container id comes from the first command's output):

    docker ps
    docker logs <container_id>

If for some reason you don't see any running containers, start docker-compose without the -d flag, and paste the logs.

raghavendrajain commented 2 years ago

docker ps gives

CONTAINER ID   IMAGE                           COMMAND                  CREATED       STATUS         PORTS                    NAMES
7d5b8b444304   sskorol/vosk-server:0.3.33-pc   "python3 ./asr_serve…"   4 hours ago   Up 8 seconds   0.0.0.0:2700->2700/tcp   vosk-api-gpu_vosk_1
f8ec14ef0b6c   gcr.io/inverting-proxy/agent    "/bin/sh -c '/opt/bi…"   4 hours ago   Up 4 hours                              proxy-agent

The logs show the following:

WARNING ([5.5.1013~1546-9b851]:SelectGpuId():cu-device.cc:243) Not in compute-exclusive mode.  Suggestion: use 'nvidia-smi -c 3' to set compute exclusive mode
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:438) Selecting from 1 GPUs
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:453) cudaSetDevice(0): Tesla K80      free:11375M, used:66M, total:11441M, free/total:0.994231
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:501) Device: 0, mem_ratio: 0.994231
LOG ([5.5.1013~1546-9b851]:SelectGpuId():cu-device.cc:382) Trying to select device: 0
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:511) Success selecting device 0 free mem ratio: 0.994231
LOG ([5.5.1013~1546-9b851]:FinalizeActiveGpu():cu-device.cc:338) The active GPU is [0]: Tesla K80       free:11239M, used:202M, total:11441M, free/total:0.982345 version 3.7

something
Options:
  --acoustic-scale            : Scaling factor for acoustic log-likelihoods (caution: is a no-op if set in the program nnet3-compute (float, default = 0.1)
  --add-pitch                 : Append pitch features to raw MFCC/PLP/filterbank features [but not for iVector extraction] (bool, default = false)
  --aux-q-capacity            : Advanced - Capacity of the auxiliary queue : Maximum number of raw tokens that can be stored *before* pruning for each frame. Lower -> less memory usage, Higher -> More accurate. During the tokens generation, if we detect that we are getting close to saturating that capacity, we will reduce the beam dynamically (adaptive beam) to keep only the best tokens in the remaining space. If the aux queue is still too small, we will print an overflow warning, but prevent the overflow. The computation can safely continue, but the quality of the output may decrease. We strongly recommend keeping aux-q-capacity large (>400k), to avoid triggering the adaptive beam and/or the overflow (-1 = set to 3*main-q-capacity). (int, default = -1)
  --beam                      : Decoding beam. Larger->slower, more accurate. If aux-q-capacity is too small, we may decrease the beam dynamically to avoid overflow (adaptive beam, see aux-q-capacity parameter) (float, default = 15)
  --cmvn-config               : Configuration file for online cmvn features (e.g. conf/online_cmvn.conf). Controls features on nnet3 input (not ivector features). If not set, the OnlineCmvn is disabled. (string, default = "")
  --computation.debug         : If true, turn on debug for the neural net computation (very verbose!) Will be turned on regardless if --verbose >= 5 (bool, default = false)
  --cuda-decoder-copy-threads : Advanced - Number of worker threads used in the decoder for the host to host copies. (int, default = 2)
  --cuda-worker-threads       : The total number of CPU threads launched to process CPU tasks. -1 = use std::hardware_concurrency(). (int, default = -1)
  --debug-computation         : If true, turn on debug for the actual computation (very verbose!) (bool, default = false)
  --delta                     : Tolerance used in determinization (float, default = 0.000976562)
  --determinize-lattice       : Determinize the lattice before output. (bool, default = true)
  --endpoint.rule1.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = inf)
  --endpoint.rule1.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 5)
  --endpoint.rule1.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule1.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = false)
  --endpoint.rule2.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = 2)
  --endpoint.rule2.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 0.5)
  --endpoint.rule2.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule2.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = true)
  --endpoint.rule3.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = 8)
  --endpoint.rule3.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 1)
  --endpoint.rule3.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule3.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = true)
  --endpoint.rule4.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = inf)
  --endpoint.rule4.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 2)
  --endpoint.rule4.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule4.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = true)
  --endpoint.rule5.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = inf)
  --endpoint.rule5.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule5.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 20)
  --endpoint.rule5.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = false)
  --endpoint.silence-phones   : List of phones that are considered to be silence phones by the endpointing code. (string, default = "")
  --extra-left-context        : Number of frames of additional left-context to add on top of the neural net's inherent left context (may be useful in recurrent setups (int, default = 0)
  --extra-left-context-initial : If >= 0, overrides the --extra-left-context value at the start of an utterance. (int, default = -1)
  --extra-right-context       : Number of frames of additional right-context to add on top of the neural net's inherent right context (may be useful in recurrent setups (int, default = 0)
  --extra-right-context-final : If >= 0, overrides the --extra-right-context value at the end of an utterance. (int, default = -1)
  --fbank-config              : Configuration file for filterbank features (e.g. conf/fbank.conf) (string, default = "")
  --feature-type              : Base feature type [mfcc, plp, fbank] (string, default = "mfcc")
  --frame-subsampling-factor  : Required if the frame-rate of the output (e.g. in 'chain' models) is less than the frame-rate of the original alignment. (int, default = 1)
  --frames-per-chunk          : Number of frames in each chunk that is separately evaluated by the neural net.  Measured before any subsampling, if the --frame-subsampling-factor options is used (i.e. counts input frames (int, default = 50)
  --global-cmvn-stats         : filename with global stats for OnlineCmvn for features on nnet3 input (not ivector features) (string, default = "")
  --gpu-feature-extract       : Use GPU feature extraction. (bool, default = true)
  --ivector-extraction-config : Configuration file for online iVector extraction, see class OnlineIvectorExtractionConfig in the code (string, default = "")
  --ivector-silence-weighting.max-state-duration : (RE weighting in iVector estimation for online decoding) Maximum allowed duration of a single transition-id; runs with durations longer than this will be weighted down to the silence-weight. (float, default = -1)
  --ivector-silence-weighting.silence-phones : (RE weighting in iVector estimation for online decoding) List of integer ids of silence phones, separated by colons (or commas).  Data that (according to the traceback of the decoder) corresponds to these phones will be downweighted by --silence-weight. (string, default = "")
  --ivector-silence-weighting.silence-weight : (RE weighting in iVector estimation for online decoding) Weighting factor for frames that the decoder trace-back identifies as silence; only relevant if the --silence-phones option is set. (float, default = 1)
  --lattice-beam              : The width of the lattice beam (float, default = 10)
  --main-q-capacity           : Advanced - Capacity of the main queue : Maximum number of tokens that can be stored *after* pruning for each frame. Lower -> less memory usage, Higher -> More accurate. Tokens stored in the main queue were already selected through a max-active pre-selection. It means that for each emitting/non-emitting iteration, we can add at most ~max-active tokens to the main queue. Typically only the emitting iteration creates a large number of tokens. Using main-q-capacity=k*max-active with k=4..10 should be safe. If main-q-capacity is too small, we will print a warning but prevent the overflow. The computation can safely continue, but the quality of the output may decrease (-1 = set to 4*max-active). (int, default = -1)
  --max-active                : At the end of each frame computation, we keep only its best max-active tokens. One token is the instantiation of a single arc. Typical values are within the 5k-10k range. (int, default = 10000)
  --max-batch-size            : The maximum execution batch size. Larger = better throughput, but slower latency. (int, default = 400)
  --max-mem                   : Maximum approximate memory usage in determinization (real usage might be many times this). (int, default = 50000000)
  --mfcc-config               : Configuration file for MFCC features (e.g. conf/mfcc.conf) (string, default = "")
  --minimize                  : If true, push and minimize after determinization. (bool, default = false)
  --ntokens-pre-allocated     : Advanced - Number of tokens pre-allocated in host buffers. If this size is exceeded the buffer will reallocate, reducing performance. (int, default = 1000000)
  --num-channels              : The number of parallel audio channels. This is the maximum number of parallel audio channels supported by the pipeline. This should be larger than max_batch_size. (int, default = 600)
  --online-pitch-config       : Configuration file for online pitch features, if --add-pitch=true (e.g. conf/online_pitch.conf) (string, default = "")
  --optimization.allocate-from-other : Instead of deleting a matrix of a given size and then allocating a matrix of the same size, allow re-use of that memory (bool, default = true)
  --optimization.allow-left-merge : Set to false to disable left-merging of variables in remove-assignments (obscure option) (bool, default = true)
  --optimization.allow-right-merge : Set to false to disable right-merging of variables in remove-assignments (obscure option) (bool, default = true)
  --optimization.backprop-in-place : Set to false to disable optimization that allows in-place backprop (bool, default = true)
  --optimization.consolidate-model-update : Set to false to disable optimization that consolidates the model-update phase of backprop (e.g. for recurrent architectures (bool, default = true)
  --optimization.convert-addition : Set to false to disable the optimization that converts Add commands into Copy commands wherever possible. (bool, default = true)
  --optimization.extend-matrices : This optimization can reduce memory requirements for TDNNs when applied together with --convert-addition=true (bool, default = true)
  --optimization.initialize-undefined : Set to false to disable optimization that avoids redundant zeroing (bool, default = true)
  --optimization.max-deriv-time : You can set this to the maximum t value that you want derivatives to be computed at when updating the model.  This is an optimization that saves time in the backprop phase for recurrent frameworks (int, default = 2147483647)
  --optimization.max-deriv-time-relative : An alternative mechanism for setting the --max-deriv-time, suitable for situations where the length of the egs is variable.  If set, it is equivalent to setting the --max-deriv-time to this value plus the largest 't' value in any 'output' node of the computation request. (int, default = 2147483647)
  --optimization.memory-compression-level : This is only relevant to training, not decoding.  Set this to 0,1,2; higher levels are more aggressive at reducing memory by compressing quantities needed for backprop, potentially at the expense of speed and the accuracy of derivatives.  0 means no compression at all; 1 means compression that shouldn't affect results at all. (int, default = 1)
  --optimization.min-deriv-time : You can set this to the minimum t value that you want derivatives to be computed at when updating the model.  This is an optimization that saves time in the backprop phase for recurrent frameworks (int, default = -2147483648)
  --optimization.move-sizing-commands : Set to false to disable optimization that moves matrix allocation and deallocation commands to conserve memory. (bool, default = true)
  --optimization.optimize     : Set this to false to turn off all optimizations (bool, default = true)
  --optimization.optimize-row-ops : Set to false to disable certain optimizations that act on operations of type *Row*. (bool, default = true)
  --optimization.propagate-in-place : Set to false to disable optimization that allows in-place propagation (bool, default = true)
  --optimization.remove-assignments : Set to false to disable optimization that removes redundant assignments (bool, default = true)
  --optimization.snip-row-ops : Set this to false to disable an optimization that reduces the size of certain per-row operations (bool, default = true)
  --optimization.split-row-ops : Set to false to disable an optimization that may replace some operations of type kCopyRowsMulti or kAddRowsMulti with up to two simpler operations. (bool, default = true)
  --phone-determinize         : If true, do an initial pass of determinization on both phones and words (see also --word-determinize) (bool, default = true)
  --plp-config                : Configuration file for PLP features (e.g. conf/plp.conf) (string, default = "")
  --reset-on-endpoint         : Reset a decoder channel when endpoint detected. Do not close stream (bool, default = false)
  --word-determinize          : If true, do a second pass of determinization on words only (see also --phone-determinize) (bool, default = true)

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)

Command line was:
ERROR ([5.5.1013~1546-9b851]:ReadConfigFile():parse-options.cc:493) Invalid option --min-active=200 in config file model/conf/model.conf

[ Stack-Trace: ]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::MessageLogger::LogMessage() const+0x7fe) [0x7fadb0882d0e]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x2a) [0x7fadb0382a9a]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::ParseOptions::ReadConfigFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x3eb) [0x7fadb087780b]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(BatchRecognizer::BatchRecognizer()+0x468) [0x7fadb04201f8]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(vosk_batch_recognizer_new+0x20) [0x7fadb041f0f0]
/usr/lib/x86_64-linux-gnu/libffi.so.7(+0x6ff5) [0x7fadb37dcff5]
/usr/lib/x86_64-linux-gnu/libffi.so.7(+0x640a) [0x7fadb37dc40a]
/usr/lib/python3/dist-packages/_cffi_backend.cpython-38-x86_64-linux-gnu.so(+0x1afd7) [0x7fadb3802fd7]
python3(_PyObject_MakeTpCall+0x296) [0x5f6a46]
python3(_PyEval_EvalFrameDefault+0x5d3f) [0x570a1f]
python3(_PyEval_EvalCodeWithName+0x26a) [0x5696da]
python3() [0x59c176]
python3(_PyObject_MakeTpCall+0x1ff) [0x5f69af]
python3(_PyEval_EvalFrameDefault+0x5932) [0x570612]
python3(_PyFunction_Vectorcall+0x1b6) [0x5f6226]
python3(_PyEval_EvalFrameDefault+0x71e) [0x56b3fe]
python3(_PyEval_EvalCodeWithName+0x26a) [0x5696da]
python3(PyEval_EvalCode+0x27) [0x68db17]
python3() [0x67eeb1]
python3() [0x67ef2f]
python3() [0x67efd1]
python3(PyRun_SimpleFileExFlags+0x197) [0x67f377]
python3(Py_RunMain+0x212) [0x6b7902]
python3(Py_BytesMain+0x2d) [0x6b7c8d]
/usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7fadb449e0b3]
python3(_start+0x2e) [0x5fb12e]

terminate called after throwing an instance of 'kaldi::KaldiFatalError'
  what():  kaldi::KaldiFatalError
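Notably, --min-active does not appear anywhere in the option list the server printed above, so the GPU batch recognizer is rejecting an option that model/conf/model.conf (written with the CPU decoder in mind) still sets. One plausible workaround, assuming the path shown in the ERROR line, is to drop that line from the config before starting the container; a minimal sketch of that filtering:

```python
# Sketch: remove config lines for options the GPU batch decoder rejects.
# "--min-active" is taken from the ERROR message above; the config path
# (model/conf/model.conf) is the one the error reports.
def strip_unsupported(config_text: str, banned=("--min-active",)) -> str:
    """Return config_text without lines that set any banned option."""
    kept = []
    for line in config_text.splitlines():
        # Kaldi config lines look like "--option=value".
        name = line.strip().split("=", 1)[0]
        if name not in banned:
            kept.append(line)
    return "\n".join(kept) + ("\n" if kept else "")

sample = "--min-active=200\n--max-active=7000\n--beam=13.0\n"
print(strip_unsupported(sample))  # the --min-active line is gone
```

The same idea as a one-liner before docker-compose up would be something like `sed -i '/--min-active/d' model/conf/model.conf` (path assumed from the error message).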
(The container then restarts and the same startup log, option dump, and error repeat.)
  --optimization.move-sizing-commands : Set to false to disable optimization that moves matrix allocation and deallocation commands to conserve memory. (bool, default = true)
  --optimization.optimize     : Set this to false to turn off all optimizations (bool, default = true)
  --optimization.optimize-row-ops : Set to false to disable certain optimizations that act on operations of type *Row*. (bool, default = true)
  --optimization.propagate-in-place : Set to false to disable optimization that allows in-place propagation (bool, default = true)
  --optimization.remove-assignments : Set to false to disable optimization that removes redundant assignments (bool, default = true)
  --optimization.snip-row-ops : Set this to false to disable an optimization that reduces the size of certain per-row operations (bool, default = true)
  --optimization.split-row-ops : Set to false to disable an optimization that may replace some operations of type kCopyRowsMulti or kAddRowsMulti with up to two simpler operations. (bool, default = true)
  --phone-determinize         : If true, do an initial pass of determinization on both phones and words (see also --word-determinize) (bool, default = true)
  --plp-config                : Configuration file for PLP features (e.g. conf/plp.conf) (string, default = "")
  --reset-on-endpoint         : Reset a decoder channel when endpoint detected. Do not close stream (bool, default = false)
  --word-determinize          : If true, do a second pass of determinization on words only (see also --phone-determinize) (bool, default = true)

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)

Command line was:
ERROR ([5.5.1013~1546-9b851]:ReadConfigFile():parse-options.cc:493) Invalid option --min-active=200 in config file model/conf/model.conf

[ Stack-Trace: ]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::MessageLogger::LogMessage() const+0x7fe) [0x7fb595ebed0e]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x2a) [0x7fb5959bea9a]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::ParseOptions::ReadConfigFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x3eb) [0x7fb595eb380b]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(BatchRecognizer::BatchRecognizer()+0x468) [0x7fb595a5c1f8]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(vosk_batch_recognizer_new+0x20) [0x7fb595a5b0f0]
/usr/lib/x86_64-linux-gnu/libffi.so.7(+0x6ff5) [0x7fb598e18ff5]
/usr/lib/x86_64-linux-gnu/libffi.so.7(+0x640a) [0x7fb598e1840a]
/usr/lib/python3/dist-packages/_cffi_backend.cpython-38-x86_64-linux-gnu.so(+0x1afd7) [0x7fb598e3efd7]
python3(_PyObject_MakeTpCall+0x296) [0x5f6a46]
python3(_PyEval_EvalFrameDefault+0x5d3f) [0x570a1f]
python3(_PyEval_EvalCodeWithName+0x26a) [0x5696da]
python3() [0x59c176]
python3(_PyObject_MakeTpCall+0x1ff) [0x5f69af]
python3(_PyEval_EvalFrameDefault+0x5932) [0x570612]
python3(_PyFunction_Vectorcall+0x1b6) [0x5f6226]
python3(_PyEval_EvalFrameDefault+0x71e) [0x56b3fe]
python3(_PyEval_EvalCodeWithName+0x26a) [0x5696da]
python3(PyEval_EvalCode+0x27) [0x68db17]
python3() [0x67eeb1]
python3() [0x67ef2f]
python3() [0x67efd1]
python3(PyRun_SimpleFileExFlags+0x197) [0x67f377]
python3(Py_RunMain+0x212) [0x6b7902]
python3(Py_BytesMain+0x2d) [0x6b7c8d]
/usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7fb599ada0b3]
python3(_start+0x2e) [0x5fb12e]

terminate called after throwing an instance of 'kaldi::KaldiFatalError'
  what():  kaldi::KaldiFatalError
WARNING ([5.5.1013~1546-9b851]:SelectGpuId():cu-device.cc:243) Not in compute-exclusive mode.  Suggestion: use 'nvidia-smi -c 3' to set compute exclusive mode
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:438) Selecting from 1 GPUs
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:453) cudaSetDevice(0): Tesla K80      free:11375M, used:66M, total:11441M, free/total:0.994231
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:501) Device: 0, mem_ratio: 0.994231
LOG ([5.5.1013~1546-9b851]:SelectGpuId():cu-device.cc:382) Trying to select device: 0
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:511) Success selecting device 0 free mem ratio: 0.994231
LOG ([5.5.1013~1546-9b851]:FinalizeActiveGpu():cu-device.cc:338) The active GPU is [0]: Tesla K80       free:11239M, used:202M, total:11441M, free/total:0.982345 version 3.7

something
Options:
  --acoustic-scale            : Scaling factor for acoustic log-likelihoods (caution: is a no-op if set in the program nnet3-compute (float, default = 0.1)
  --add-pitch                 : Append pitch features to raw MFCC/PLP/filterbank features [but not for iVector extraction] (bool, default = false)
  --aux-q-capacity            : Advanced - Capacity of the auxiliary queue : Maximum number of raw tokens that can be stored *before* pruning for each frame. Lower -> less memory usage, Higher -> More accurate. During the tokens generation, if we detect that we are getting close to saturating that capacity, we will reduce the beam dynamically (adaptive beam) to keep only the best tokens in the remaining space. If the aux queue is still too small, we will print an overflow warning, but prevent the overflow. The computation can safely continue, but the quality of the output may decrease. We strongly recommend keeping aux-q-capacity large (>400k), to avoid triggering the adaptive beam and/or the overflow (-1 = set to 3*main-q-capacity). (int, default = -1)
  --beam                      : Decoding beam. Larger->slower, more accurate. If aux-q-capacity is too small, we may decrease the beam dynamically to avoid overflow (adaptive beam, see aux-q-capacity parameter) (float, default = 15)
  --cmvn-config               : Configuration file for online cmvn features (e.g. conf/online_cmvn.conf). Controls features on nnet3 input (not ivector features). If not set, the OnlineCmvn is disabled. (string, default = "")
  --computation.debug         : If true, turn on debug for the neural net computation (very verbose!) Will be turned on regardless if --verbose >= 5 (bool, default = false)
  --cuda-decoder-copy-threads : Advanced - Number of worker threads used in the decoder for the host to host copies. (int, default = 2)
  --cuda-worker-threads       : The total number of CPU threads launched to process CPU tasks. -1 = use std::hardware_concurrency(). (int, default = -1)
  --debug-computation         : If true, turn on debug for the actual computation (very verbose!) (bool, default = false)
  --delta                     : Tolerance used in determinization (float, default = 0.000976562)
  --determinize-lattice       : Determinize the lattice before output. (bool, default = true)
  --endpoint.rule1.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = inf)
  --endpoint.rule1.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 5)
  --endpoint.rule1.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule1.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = false)
  --endpoint.rule2.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = 2)
  --endpoint.rule2.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 0.5)
  --endpoint.rule2.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule2.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = true)
  --endpoint.rule3.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = 8)
  --endpoint.rule3.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 1)
  --endpoint.rule3.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule3.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = true)
  --endpoint.rule4.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = inf)
  --endpoint.rule4.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 2)
  --endpoint.rule4.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule4.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = true)
  --endpoint.rule5.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = inf)
  --endpoint.rule5.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule5.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 20)
  --endpoint.rule5.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = false)
  --endpoint.silence-phones   : List of phones that are considered to be silence phones by the endpointing code. (string, default = "")
  --extra-left-context        : Number of frames of additional left-context to add on top of the neural net's inherent left context (may be useful in recurrent setups (int, default = 0)
  --extra-left-context-initial : If >= 0, overrides the --extra-left-context value at the start of an utterance. (int, default = -1)
  --extra-right-context       : Number of frames of additional right-context to add on top of the neural net's inherent right context (may be useful in recurrent setups (int, default = 0)
  --extra-right-context-final : If >= 0, overrides the --extra-right-context value at the end of an utterance. (int, default = -1)
  --fbank-config              : Configuration file for filterbank features (e.g. conf/fbank.conf) (string, default = "")
  --feature-type              : Base feature type [mfcc, plp, fbank] (string, default = "mfcc")
  --frame-subsampling-factor  : Required if the frame-rate of the output (e.g. in 'chain' models) is less than the frame-rate of the original alignment. (int, default = 1)
  --frames-per-chunk          : Number of frames in each chunk that is separately evaluated by the neural net.  Measured before any subsampling, if the --frame-subsampling-factor options is used (i.e. counts input frames (int, default = 50)
  --global-cmvn-stats         : filename with global stats for OnlineCmvn for features on nnet3 input (not ivector features) (string, default = "")
  --gpu-feature-extract       : Use GPU feature extraction. (bool, default = true)
  --ivector-extraction-config : Configuration file for online iVector extraction, see class OnlineIvectorExtractionConfig in the code (string, default = "")
  --ivector-silence-weighting.max-state-duration : (RE weighting in iVector estimation for online decoding) Maximum allowed duration of a single transition-id; runs with durations longer than this will be weighted down to the silence-weight. (float, default = -1)
  --ivector-silence-weighting.silence-phones : (RE weighting in iVector estimation for online decoding) List of integer ids of silence phones, separated by colons (or commas).  Data that (according to the traceback of the decoder) corresponds to these phones will be downweighted by --silence-weight. (string, default = "")
  --ivector-silence-weighting.silence-weight : (RE weighting in iVector estimation for online decoding) Weighting factor for frames that the decoder trace-back identifies as silence; only relevant if the --silence-phones option is set. (float, default = 1)
  --lattice-beam              : The width of the lattice beam (float, default = 10)
  --main-q-capacity           : Advanced - Capacity of the main queue : Maximum number of tokens that can be stored *after* pruning for each frame. Lower -> less memory usage, Higher -> More accurate. Tokens stored in the main queue were already selected through a max-active pre-selection. It means that for each emitting/non-emitting iteration, we can add at most ~max-active tokens to the main queue. Typically only the emitting iteration creates a large number of tokens. Using main-q-capacity=k*max-active with k=4..10 should be safe. If main-q-capacity is too small, we will print a warning but prevent the overflow. The computation can safely continue, but the quality of the output may decrease (-1 = set to 4*max-active). (int, default = -1)
  --max-active                : At the end of each frame computation, we keep only its best max-active tokens. One token is the instantiation of a single arc. Typical values are within the 5k-10k range. (int, default = 10000)
  --max-batch-size            : The maximum execution batch size. Larger = better throughput, but slower latency. (int, default = 400)
  --max-mem                   : Maximum approximate memory usage in determinization (real usage might be many times this). (int, default = 50000000)
  --mfcc-config               : Configuration file for MFCC features (e.g. conf/mfcc.conf) (string, default = "")
  --minimize                  : If true, push and minimize after determinization. (bool, default = false)
  --ntokens-pre-allocated     : Advanced - Number of tokens pre-allocated in host buffers. If this size is exceeded the buffer will reallocate, reducing performance. (int, default = 1000000)
  --num-channels              : The number of parallel audio channels. This is the maximum number of parallel audio channels supported by the pipeline. This should be larger than max_batch_size. (int, default = 600)
  --online-pitch-config       : Configuration file for online pitch features, if --add-pitch=true (e.g. conf/online_pitch.conf) (string, default = "")
  --optimization.allocate-from-other : Instead of deleting a matrix of a given size and then allocating a matrix of the same size, allow re-use of that memory (bool, default = true)
  --optimization.allow-left-merge : Set to false to disable left-merging of variables in remove-assignments (obscure option) (bool, default = true)
  --optimization.allow-right-merge : Set to false to disable right-merging of variables in remove-assignments (obscure option) (bool, default = true)
  --optimization.backprop-in-place : Set to false to disable optimization that allows in-place backprop (bool, default = true)
  --optimization.consolidate-model-update : Set to false to disable optimization that consolidates the model-update phase of backprop (e.g. for recurrent architectures (bool, default = true)
  --optimization.convert-addition : Set to false to disable the optimization that converts Add commands into Copy commands wherever possible. (bool, default = true)
  --optimization.extend-matrices : This optimization can reduce memory requirements for TDNNs when applied together with --convert-addition=true (bool, default = true)
  --optimization.initialize-undefined : Set to false to disable optimization that avoids redundant zeroing (bool, default = true)
  --optimization.max-deriv-time : You can set this to the maximum t value that you want derivatives to be computed at when updating the model.  This is an optimization that saves time in the backprop phase for recurrent frameworks (int, default = 2147483647)
  --optimization.max-deriv-time-relative : An alternative mechanism for setting the --max-deriv-time, suitable for situations where the length of the egs is variable.  If set, it is equivalent to setting the --max-deriv-time to this value plus the largest 't' value in any 'output' node of the computation request. (int, default = 2147483647)
  --optimization.memory-compression-level : This is only relevant to training, not decoding.  Set this to 0,1,2; higher levels are more aggressive at reducing memory by compressing quantities needed for backprop, potentially at the expense of speed and the accuracy of derivatives.  0 means no compression at all; 1 means compression that shouldn't affect results at all. (int, default = 1)
  --optimization.min-deriv-time : You can set this to the minimum t value that you want derivatives to be computed at when updating the model.  This is an optimization that saves time in the backprop phase for recurrent frameworks (int, default = -2147483648)
  --optimization.move-sizing-commands : Set to false to disable optimization that moves matrix allocation and deallocation commands to conserve memory. (bool, default = true)
  --optimization.optimize     : Set this to false to turn off all optimizations (bool, default = true)
  --optimization.optimize-row-ops : Set to false to disable certain optimizations that act on operations of type *Row*. (bool, default = true)
  --optimization.propagate-in-place : Set to false to disable optimization that allows in-place propagation (bool, default = true)
  --optimization.remove-assignments : Set to false to disable optimization that removes redundant assignments (bool, default = true)
  --optimization.snip-row-ops : Set this to false to disable an optimization that reduces the size of certain per-row operations (bool, default = true)
  --optimization.split-row-ops : Set to false to disable an optimization that may replace some operations of type kCopyRowsMulti or kAddRowsMulti with up to two simpler operations. (bool, default = true)
  --phone-determinize         : If true, do an initial pass of determinization on both phones and words (see also --word-determinize) (bool, default = true)
  --plp-config                : Configuration file for PLP features (e.g. conf/plp.conf) (string, default = "")
  --reset-on-endpoint         : Reset a decoder channel when endpoint detected. Do not close stream (bool, default = false)
  --word-determinize          : If true, do a second pass of determinization on words only (see also --phone-determinize) (bool, default = true)

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)

Command line was:
ERROR ([5.5.1013~1546-9b851]:ReadConfigFile():parse-options.cc:493) Invalid option --min-active=200 in config file model/conf/model.conf

[ Stack-Trace: ]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::MessageLogger::LogMessage() const+0x7fe) [0x7fce6ad9fd0e]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x2a) [0x7fce6a89fa9a]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::ParseOptions::ReadConfigFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x3eb) [0x7fce6ad9480b]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(BatchRecognizer::BatchRecognizer()+0x468) [0x7fce6a93d1f8]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(vosk_batch_recognizer_new+0x20) [0x7fce6a93c0f0]
/usr/lib/x86_64-linux-gnu/libffi.so.7(+0x6ff5) [0x7fce6dcf9ff5]
/usr/lib/x86_64-linux-gnu/libffi.so.7(+0x640a) [0x7fce6dcf940a]
/usr/lib/python3/dist-packages/_cffi_backend.cpython-38-x86_64-linux-gnu.so(+0x1afd7) [0x7fce6dd1ffd7]
python3(_PyObject_MakeTpCall+0x296) [0x5f6a46]
python3(_PyEval_EvalFrameDefault+0x5d3f) [0x570a1f]
python3(_PyEval_EvalCodeWithName+0x26a) [0x5696da]
python3() [0x59c176]
python3(_PyObject_MakeTpCall+0x1ff) [0x5f69af]
python3(_PyEval_EvalFrameDefault+0x5932) [0x570612]
python3(_PyFunction_Vectorcall+0x1b6) [0x5f6226]
python3(_PyEval_EvalFrameDefault+0x71e) [0x56b3fe]
python3(_PyEval_EvalCodeWithName+0x26a) [0x5696da]
python3(PyEval_EvalCode+0x27) [0x68db17]
python3() [0x67eeb1]
python3() [0x67ef2f]
python3() [0x67efd1]
python3(PyRun_SimpleFileExFlags+0x197) [0x67f377]
python3(Py_RunMain+0x212) [0x6b7902]
python3(Py_BytesMain+0x2d) [0x6b7c8d]
/usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7fce6e9bb0b3]
python3(_start+0x2e) [0x5fb12e]

terminate called after throwing an instance of 'kaldi::KaldiFatalError'
  what():  kaldi::KaldiFatalError
sskorol commented 2 years ago

Please wrap logs in a code block, as they're hard to read otherwise. Anyway, I can see the error with your model:

Invalid option --min-active=200 in config file model/conf/model.conf

Which model do you use? As far as I know, not all models have been adapted for the recent GPU updates.

sskorol commented 2 years ago

Anyway, you can try the following hotfix for your model:

  • remove the min-active parameter from model/conf/model.conf
  • replace ivector.conf with the same file from an already adapted model, e.g. EN.

However, I'm not really sure which options would work best for your concrete model, so you can experiment with the ivector.conf params in case of any issues.
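The min-active removal can be sketched as a shell one-liner. This is illustrated on a sample file with made-up option values; the real path is model/conf/model.conf, as shown in the Kaldi error:

```shell
# Sketch: strip the unsupported --min-active option from the model config.
# The sample contents below are placeholders; in practice edit
# model/conf/model.conf (back it up first with cp).
conf=model.conf
printf -- '--min-active=200\n--max-active=7000\n--beam=13.0\n' > "$conf"  # sample contents
sed -i '/--min-active/d' "$conf"                                          # drop the bad option
cat "$conf"                                                               # remaining options only
```

After the edit, restart the container so the server re-reads the config.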

raghavendrajain commented 2 years ago

Please wrap logs in a code block, as they're hard to read otherwise. Anyway, I can see the error with your model:

Invalid option --min-active=200 in config file model/conf/model.conf

Which model do you use? As far as I know, not all models have been adapted for the recent GPU updates.

I used this model https://alphacephei.com/kaldi/models/vosk-model-small-en-us-0.15.zip

sskorol commented 2 years ago

Try a big model. I don't believe the small models have been adapted for GPU use.

raghavendrajain commented 2 years ago

Anyway, you can try the following hotfix for your model:

  • remove the min-active parameter from model/conf/model.conf
  • replace ivector.conf with the same file from an already adapted model, e.g. EN.

However, I'm not really sure which options would work best for your concrete model, so you can experiment with the ivector.conf params in case of any issues.

There is no ivector.conf.

raghavendrajain commented 2 years ago

Try a big model. I don't believe the small models have been adapted for GPU use.

OK.

sskorol commented 2 years ago

And if you use the big EN model, I believe no changes are required. Just restart the container with that model.

raghavendrajain commented 2 years ago

Great, I used the bigger model from https://alphacephei.com/vosk/models/vosk-model-en-us-0.22.zip, unzipped it in my home directory, and renamed it to ./model.

Then I ran the following command: ./test.py weather.wav

But got this error:

{ "partial" : "" }

Traceback (most recent call last):
  File "/home/raghavendra.jain/vosk-api-gpu/.venv/lib/python3.7/site-packages/websockets/legacy/protocol.py", line 945, in transfer_data
    message = await self.read_message()
  File "/home/raghavendra.jain/vosk-api-gpu/.venv/lib/python3.7/site-packages/websockets/legacy/protocol.py", line 1015, in read_message
    frame = await self.read_data_frame(max_size=self.max_size)
  File "/home/raghavendra.jain/vosk-api-gpu/.venv/lib/python3.7/site-packages/websockets/legacy/protocol.py", line 1090, in read_data_frame
    frame = await self.read_frame(max_size)
  File "/home/raghavendra.jain/vosk-api-gpu/.venv/lib/python3.7/site-packages/websockets/legacy/protocol.py", line 1149, in read_frame
    extensions=self.extensions,
  File "/home/raghavendra.jain/vosk-api-gpu/.venv/lib/python3.7/site-packages/websockets/legacy/framing.py", line 70, in read
    data = await reader(2)
  File "/opt/conda/lib/python3.7/asyncio/streams.py", line 677, in readexactly
    raise IncompleteReadError(incomplete, n)
asyncio.streams.IncompleteReadError: 0 bytes read on a total of 2 expected bytes

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./test.py", line 28, in <module>
    run_test('ws://localhost:2700'))
  File "/opt/conda/lib/python3.7/asyncio/base_events.py", line 587, in run_until_complete
    return future.result()
  File "./test.py", line 21, in run_test
    print(await websocket.recv())
  File "/home/raghavendra.jain/vosk-api-gpu/.venv/lib/python3.7/site-packages/websockets/legacy/protocol.py", line 553, in recv
    await self.ensure_open()
  File "/home/raghavendra.jain/vosk-api-gpu/.venv/lib/python3.7/site-packages/websockets/legacy/protocol.py", line 921, in ensure_open
    raise self.connection_closed_exc()
websockets.exceptions.ConnectionClosedError: no close frame received or sent
sskorol commented 2 years ago

weather.wav is for a different language. Provide your own recording in EN to test. But note that it should be 16 kHz, mono.
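A quick way to verify a recording meets that requirement is to inspect it with Python's standard-library `wave` module. This is a minimal sketch; `demo.wav` and the helper name are placeholders, and the demo file is generated on the spot just so the check has something to inspect:

```python
import wave

def is_vosk_ready(path: str) -> bool:
    """Check that a WAV file is 16 kHz, mono, 16-bit PCM."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == 16000   # sample rate
            and wav.getnchannels() == 1   # mono
            and wav.getsampwidth() == 2   # 16-bit samples
        )

# Demo: write a tiny 16 kHz mono file, then check it.
with wave.open("demo.wav", "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 160)  # 10 ms of silence

print(is_vosk_ready("demo.wav"))  # → True
```

If a file fails the check, re-encoding it with something like `ffmpeg -i in.wav -ar 16000 -ac 1 out.wav` (assuming ffmpeg is installed) is a common fix.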

raghavendrajain commented 2 years ago

weather.wav is for a different language. Provide your own recording in EN to test. But note that it should be 16 kHz, mono.

I did make a recording in EN. It gives the same error:

Traceback (most recent call last):
  File "/home/raghavendra.jain/vosk-api-gpu/.venv/lib/python3.7/site-packages/websockets/legacy/client.py", line 135, in read_http_response
    status_code, reason, headers = await read_response(self.reader)
  File "/home/raghavendra.jain/vosk-api-gpu/.venv/lib/python3.7/site-packages/websockets/legacy/http.py", line 120, in read_response
    status_line = await read_line(stream)
  File "/home/raghavendra.jain/vosk-api-gpu/.venv/lib/python3.7/site-packages/websockets/legacy/http.py", line 194, in read_line
    line = await stream.readline()
  File "/opt/conda/lib/python3.7/asyncio/streams.py", line 496, in readline
    line = await self.readuntil(sep)
  File "/opt/conda/lib/python3.7/asyncio/streams.py", line 588, in readuntil
    await self._wait_for_data('readuntil')
  File "/opt/conda/lib/python3.7/asyncio/streams.py", line 473, in _wait_for_data
    await self._waiter
  File "/opt/conda/lib/python3.7/asyncio/selector_events.py", line 814, in _read_ready__data_received
    data = self._sock.recv(self.max_size)
ConnectionResetError: [Errno 104] Connection reset by peer

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./test.py", line 28, in <module>
    run_test('ws://localhost:2700'))
  File "/opt/conda/lib/python3.7/asyncio/base_events.py", line 587, in run_until_complete
    return future.result()
  File "./test.py", line 10, in run_test
    async with websockets.connect(uri) as websocket:
  File "/home/raghavendra.jain/vosk-api-gpu/.venv/lib/python3.7/site-packages/websockets/legacy/client.py", line 633, in __aenter__
    return await self
  File "/home/raghavendra.jain/vosk-api-gpu/.venv/lib/python3.7/site-packages/websockets/legacy/client.py", line 650, in __await_impl_timeout__
    return await asyncio.wait_for(self.__await_impl__(), self.open_timeout)
  File "/opt/conda/lib/python3.7/asyncio/tasks.py", line 442, in wait_for
    return fut.result()
  File "/home/raghavendra.jain/vosk-api-gpu/.venv/lib/python3.7/site-packages/websockets/legacy/client.py", line 663, in __await_impl__
    extra_headers=protocol.extra_headers,
  File "/home/raghavendra.jain/vosk-api-gpu/.venv/lib/python3.7/site-packages/websockets/legacy/client.py", line 322, in handshake
    status_code, response_headers = await self.read_http_response()
  File "/home/raghavendra.jain/vosk-api-gpu/.venv/lib/python3.7/site-packages/websockets/legacy/client.py", line 141, in read_http_response
    raise InvalidMessage("did not receive a valid HTTP response") from exc
websockets.exceptions.InvalidMessage: did not receive a valid HTTP response
sskorol commented 2 years ago

Docker logs?
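For reference, they can be pulled like this (the container name `vosk-server` is an assumption; check the real one first):

```shell
# List running containers to find the Vosk server's name or ID.
docker ps

# Stream its logs; replace vosk-server with the name shown by docker ps.
docker logs -f vosk-server

# If the container crashed on startup, it won't show in docker ps;
# include stopped containers in the listing instead.
docker ps -a
```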

raghavendrajain commented 2 years ago
WARNING ([5.5.1013~1546-9b851]:SelectGpuId():cu-device.cc:243) Not in compute-exclusive mode.  Suggestion: use 'nvidia-smi -c 3' to set compute exclusive mode
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:438) Selecting from 1 GPUs
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:453) cudaSetDevice(0): Tesla K80      free:11375M, used:66M, total:11441M, free/total:0.994231
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:501) Device: 0, mem_ratio: 0.994231
LOG ([5.5.1013~1546-9b851]:SelectGpuId():cu-device.cc:382) Trying to select device: 0
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:511) Success selecting device 0 free mem ratio: 0.994231
LOG ([5.5.1013~1546-9b851]:FinalizeActiveGpu():cu-device.cc:338) The active GPU is [0]: Tesla K80       free:11239M, used:202M, total:11441M, free/total:0.982345 version 3.7

something
Options:
  --acoustic-scale            : Scaling factor for acoustic log-likelihoods (caution: is a no-op if set in the program nnet3-compute (float, default = 0.1)
  --add-pitch                 : Append pitch features to raw MFCC/PLP/filterbank features [but not for iVector extraction] (bool, default = false)
  --aux-q-capacity            : Advanced - Capacity of the auxiliary queue : Maximum number of raw tokens that can be stored *before* pruning for each frame. Lower -> less memory usage, Higher -> More accurate. During the tokens generation, if we detect that we are getting close to saturating that capacity, we will reduce the beam dynamically (adaptive beam) to keep only the best tokens in the remaining space. If the aux queue is still too small, we will print an overflow warning, but prevent the overflow. The computation can safely continue, but the quality of the output may decrease. We strongly recommend keeping aux-q-capacity large (>400k), to avoid triggering the adaptive beam and/or the overflow (-1 = set to 3*main-q-capacity). (int, default = -1)
  --beam                      : Decoding beam. Larger->slower, more accurate. If aux-q-capacity is too small, we may decrease the beam dynamically to avoid overflow (adaptive beam, see aux-q-capacity parameter) (float, default = 15)
  --cmvn-config               : Configuration file for online cmvn features (e.g. conf/online_cmvn.conf). Controls features on nnet3 input (not ivector features). If not set, the OnlineCmvn is disabled. (string, default = "")
  --computation.debug         : If true, turn on debug for the neural net computation (very verbose!) Will be turned on regardless if --verbose >= 5 (bool, default = false)
  --cuda-decoder-copy-threads : Advanced - Number of worker threads used in the decoder for the host to host copies. (int, default = 2)
  --cuda-worker-threads       : The total number of CPU threads launched to process CPU tasks. -1 = use std::hardware_concurrency(). (int, default = -1)
  --debug-computation         : If true, turn on debug for the actual computation (very verbose!) (bool, default = false)
  --delta                     : Tolerance used in determinization (float, default = 0.000976562)
  --determinize-lattice       : Determinize the lattice before output. (bool, default = true)
  --endpoint.rule1.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = inf)
  --endpoint.rule1.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 5)
  --endpoint.rule1.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule1.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = false)
  --endpoint.rule2.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = 2)
  --endpoint.rule2.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 0.5)
  --endpoint.rule2.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule2.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = true)
  --endpoint.rule3.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = 8)
  --endpoint.rule3.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 1)
  --endpoint.rule3.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule3.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = true)
  --endpoint.rule4.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = inf)
  --endpoint.rule4.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 2)
  --endpoint.rule4.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule4.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = true)
  --endpoint.rule5.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = inf)
  --endpoint.rule5.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule5.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 20)
  --endpoint.rule5.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = false)
  --endpoint.silence-phones   : List of phones that are considered to be silence phones by the endpointing code. (string, default = "")
  --extra-left-context        : Number of frames of additional left-context to add on top of the neural net's inherent left context (may be useful in recurrent setups (int, default = 0)
  --extra-left-context-initial : If >= 0, overrides the --extra-left-context value at the start of an utterance. (int, default = -1)
  --extra-right-context       : Number of frames of additional right-context to add on top of the neural net's inherent right context (may be useful in recurrent setups (int, default = 0)
  --extra-right-context-final : If >= 0, overrides the --extra-right-context value at the end of an utterance. (int, default = -1)
  --fbank-config              : Configuration file for filterbank features (e.g. conf/fbank.conf) (string, default = "")
  --feature-type              : Base feature type [mfcc, plp, fbank] (string, default = "mfcc")
  --frame-subsampling-factor  : Required if the frame-rate of the output (e.g. in 'chain' models) is less than the frame-rate of the original alignment. (int, default = 1)
  --frames-per-chunk          : Number of frames in each chunk that is separately evaluated by the neural net.  Measured before any subsampling, if the --frame-subsampling-factor options is used (i.e. counts input frames (int, default = 50)
  --global-cmvn-stats         : filename with global stats for OnlineCmvn for features on nnet3 input (not ivector features) (string, default = "")
  --gpu-feature-extract       : Use GPU feature extraction. (bool, default = true)
  --ivector-extraction-config : Configuration file for online iVector extraction, see class OnlineIvectorExtractionConfig in the code (string, default = "")
  --ivector-silence-weighting.max-state-duration : (RE weighting in iVector estimation for online decoding) Maximum allowed duration of a single transition-id; runs with durations longer than this will be weighted down to the silence-weight. (float, default = -1)
  --ivector-silence-weighting.silence-phones : (RE weighting in iVector estimation for online decoding) List of integer ids of silence phones, separated by colons (or commas).  Data that (according to the traceback of the decoder) corresponds to these phones will be downweighted by --silence-weight. (string, default = "")
  --ivector-silence-weighting.silence-weight : (RE weighting in iVector estimation for online decoding) Weighting factor for frames that the decoder trace-back identifies as silence; only relevant if the --silence-phones option is set. (float, default = 1)
  --lattice-beam              : The width of the lattice beam (float, default = 10)
  --main-q-capacity           : Advanced - Capacity of the main queue : Maximum number of tokens that can be stored *after* pruning for each frame. Lower -> less memory usage, Higher -> More accurate. Tokens stored in the main queue were already selected through a max-active pre-selection. It means that for each emitting/non-emitting iteration, we can add at most ~max-active tokens to the main queue. Typically only the emitting iteration creates a large number of tokens. Using main-q-capacity=k*max-active with k=4..10 should be safe. If main-q-capacity is too small, we will print a warning but prevent the overflow. The computation can safely continue, but the quality of the output may decrease (-1 = set to 4*max-active). (int, default = -1)
  --max-active                : At the end of each frame computation, we keep only its best max-active tokens. One token is the instantiation of a single arc. Typical values are within the 5k-10k range. (int, default = 10000)
  --max-batch-size            : The maximum execution batch size. Larger = better throughput, but slower latency. (int, default = 400)
  --max-mem                   : Maximum approximate memory usage in determinization (real usage might be many times this). (int, default = 50000000)
  --mfcc-config               : Configuration file for MFCC features (e.g. conf/mfcc.conf) (string, default = "")
  --minimize                  : If true, push and minimize after determinization. (bool, default = false)
  --ntokens-pre-allocated     : Advanced - Number of tokens pre-allocated in host buffers. If this size is exceeded the buffer will reallocate, reducing performance. (int, default = 1000000)
  --num-channels              : The number of parallel audio channels. This is the maximum number of parallel audio channels supported by the pipeline. This should be larger than max_batch_size. (int, default = 600)
  --online-pitch-config       : Configuration file for online pitch features, if --add-pitch=true (e.g. conf/online_pitch.conf) (string, default = "")
  --optimization.allocate-from-other : Instead of deleting a matrix of a given size and then allocating a matrix of the same size, allow re-use of that memory (bool, default = true)
  --optimization.allow-left-merge : Set to false to disable left-merging of variables in remove-assignments (obscure option) (bool, default = true)
  --optimization.allow-right-merge : Set to false to disable right-merging of variables in remove-assignments (obscure option) (bool, default = true)
  --optimization.backprop-in-place : Set to false to disable optimization that allows in-place backprop (bool, default = true)
  --optimization.consolidate-model-update : Set to false to disable optimization that consolidates the model-update phase of backprop (e.g. for recurrent architectures (bool, default = true)
  --optimization.convert-addition : Set to false to disable the optimization that converts Add commands into Copy commands wherever possible. (bool, default = true)
  --optimization.extend-matrices : This optimization can reduce memory requirements for TDNNs when applied together with --convert-addition=true (bool, default = true)
  --optimization.initialize-undefined : Set to false to disable optimization that avoids redundant zeroing (bool, default = true)
  --optimization.max-deriv-time : You can set this to the maximum t value that you want derivatives to be computed at when updating the model.  This is an optimization that saves time in the backprop phase for recurrent frameworks (int, default = 2147483647)
  --optimization.max-deriv-time-relative : An alternative mechanism for setting the --max-deriv-time, suitable for situations where the length of the egs is variable.  If set, it is equivalent to setting the --max-deriv-time to this value plus the largest 't' value in any 'output' node of the computation request. (int, default = 2147483647)
  --optimization.memory-compression-level : This is only relevant to training, not decoding.  Set this to 0,1,2; higher levels are more aggressive at reducing memory by compressing quantities needed for backprop, potentially at the expense of speed and the accuracy of derivatives.  0 means no compression at all; 1 means compression that shouldn't affect results at all. (int, default = 1)
  --optimization.min-deriv-time : You can set this to the minimum t value that you want derivatives to be computed at when updating the model.  This is an optimization that saves time in the backprop phase for recurrent frameworks (int, default = -2147483648)
  --optimization.move-sizing-commands : Set to false to disable optimization that moves matrix allocation and deallocation commands to conserve memory. (bool, default = true)
  --optimization.optimize     : Set this to false to turn off all optimizations (bool, default = true)
  --optimization.optimize-row-ops : Set to false to disable certain optimizations that act on operations of type *Row*. (bool, default = true)
  --optimization.propagate-in-place : Set to false to disable optimization that allows in-place propagation (bool, default = true)
  --optimization.remove-assignments : Set to false to disable optimization that removes redundant assignments (bool, default = true)
  --optimization.snip-row-ops : Set this to false to disable an optimization that reduces the size of certain per-row operations (bool, default = true)
  --optimization.split-row-ops : Set to false to disable an optimization that may replace some operations of type kCopyRowsMulti or kAddRowsMulti with up to two simpler operations. (bool, default = true)
  --phone-determinize         : If true, do an initial pass of determinization on both phones and words (see also --word-determinize) (bool, default = true)
  --plp-config                : Configuration file for PLP features (e.g. conf/plp.conf) (string, default = "")
  --reset-on-endpoint         : Reset a decoder channel when endpoint detected. Do not close stream (bool, default = false)
  --word-determinize          : If true, do a second pass of determinization on words only (see also --phone-determinize) (bool, default = true)

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)

Command line was:
ERROR ([5.5.1013~1546-9b851]:ReadConfigFile():parse-options.cc:493) Invalid option --min-active=200 in config file model/conf/model.conf

[ Stack-Trace: ]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::MessageLogger::LogMessage() const+0x7fe) [0x7fadb0882d0e]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x2a) [0x7fadb0382a9a]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::ParseOptions::ReadConfigFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x3eb) [0x7fadb087780b]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(BatchRecognizer::BatchRecognizer()+0x468) [0x7fadb04201f8]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(vosk_batch_recognizer_new+0x20) [0x7fadb041f0f0]
/usr/lib/x86_64-linux-gnu/libffi.so.7(+0x6ff5) [0x7fadb37dcff5]
/usr/lib/x86_64-linux-gnu/libffi.so.7(+0x640a) [0x7fadb37dc40a]
/usr/lib/python3/dist-packages/_cffi_backend.cpython-38-x86_64-linux-gnu.so(+0x1afd7) [0x7fadb3802fd7]
python3(_PyObject_MakeTpCall+0x296) [0x5f6a46]
python3(_PyEval_EvalFrameDefault+0x5d3f) [0x570a1f]
python3(_PyEval_EvalCodeWithName+0x26a) [0x5696da]
python3() [0x59c176]
python3(_PyObject_MakeTpCall+0x1ff) [0x5f69af]
python3(_PyEval_EvalFrameDefault+0x5932) [0x570612]
python3(_PyFunction_Vectorcall+0x1b6) [0x5f6226]
python3(_PyEval_EvalFrameDefault+0x71e) [0x56b3fe]
python3(_PyEval_EvalCodeWithName+0x26a) [0x5696da]
python3(PyEval_EvalCode+0x27) [0x68db17]
python3() [0x67eeb1]
python3() [0x67ef2f]
python3() [0x67efd1]
python3(PyRun_SimpleFileExFlags+0x197) [0x67f377]
python3(Py_RunMain+0x212) [0x6b7902]
python3(Py_BytesMain+0x2d) [0x6b7c8d]
/usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7fadb449e0b3]
python3(_start+0x2e) [0x5fb12e]

terminate called after throwing an instance of 'kaldi::KaldiFatalError'
  what():  kaldi::KaldiFatalError
LOG ([5.5.1013~1546-9b851]:FinalizeActiveGpu():cu-device.cc:338) The active GPU is [0]: Tesla K80       free:11239M, used:202M, total:11441M, free/total:0.982345 version 3.7
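The actionable line in the log above is the `ReadConfigFile()` error: the batch (GPU) recognizer rejects `--min-active=200` in `model/conf/model.conf`, an option the batch pipeline apparently does not support (it is absent from the options dump), so the server crash-loops on startup. A minimal workaround sketch, with the config path taken from the error message — the `mkdir`/`printf` lines only create a stand-in file so the snippet is self-contained; on a real setup, edit the mounted model's config directly:

```shell
# Workaround sketch: comment out the unsupported option so
# ReadConfigFile() stops aborting. Paths are assumptions taken from the
# error message; the sample config exists only to make this runnable.
mkdir -p model/conf
printf -- '--min-active=200\n--max-active=7000\n' > model/conf/model.conf

# Prefix the offending line with '#' (Kaldi config parsing treats '#'
# as a comment), leaving the rest of the file untouched:
sed -i 's/^--min-active=.*/#&/' model/conf/model.conf

grep -- 'min-active' model/conf/model.conf   # now prints: #--min-active=200
```

After editing the model's `conf/model.conf` (or rebuilding the image with the patched config) and restarting the container, the `KaldiFatalError` at startup should no longer occur.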

  --word-determinize          : If true, do a second pass of determinization on words only (see also --phone-determinize) (bool, default = true)

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)

Command line was:
ERROR ([5.5.1013~1546-9b851]:ReadConfigFile():parse-options.cc:493) Invalid option --min-active=200 in config file model/conf/model.conf

[ Stack-Trace: ]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::MessageLogger::LogMessage() const+0x7fe) [0x7f2634e07d0e]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x2a) [0x7f2634907a9a]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::ParseOptions::ReadConfigFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x3eb) [0x7f2634dfc80b]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(BatchRecognizer::BatchRecognizer()+0x468) [0x7f26349a51f8]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(vosk_batch_recognizer_new+0x20) [0x7f26349a40f0]
/usr/lib/x86_64-linux-gnu/libffi.so.7(+0x6ff5) [0x7f2637d61ff5]
/usr/lib/x86_64-linux-gnu/libffi.so.7(+0x640a) [0x7f2637d6140a]
/usr/lib/python3/dist-packages/_cffi_backend.cpython-38-x86_64-linux-gnu.so(+0x1afd7) [0x7f2637d87fd7]
python3(_PyObject_MakeTpCall+0x296) [0x5f6a46]
python3(_PyEval_EvalFrameDefault+0x5d3f) [0x570a1f]
python3(_PyEval_EvalCodeWithName+0x26a) [0x5696da]
python3() [0x59c176]
python3(_PyObject_MakeTpCall+0x1ff) [0x5f69af]
python3(_PyEval_EvalFrameDefault+0x5932) [0x570612]
python3(_PyFunction_Vectorcall+0x1b6) [0x5f6226]
python3(_PyEval_EvalFrameDefault+0x71e) [0x56b3fe]
python3(_PyEval_EvalCodeWithName+0x26a) [0x5696da]
python3(PyEval_EvalCode+0x27) [0x68db17]
python3() [0x67eeb1]
python3() [0x67ef2f]
python3() [0x67efd1]
python3(PyRun_SimpleFileExFlags+0x197) [0x67f377]
python3(Py_RunMain+0x212) [0x6b7902]
python3(Py_BytesMain+0x2d) [0x6b7c8d]
/usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7f2638a230b3]
python3(_start+0x2e) [0x5fb12e]

terminate called after throwing an instance of 'kaldi::KaldiFatalError'
  what():  kaldi::KaldiFatalError
WARNING ([5.5.1013~1546-9b851]:SelectGpuId():cu-device.cc:243) Not in compute-exclusive mode.  Suggestion: use 'nvidia-smi -c 3' to set compute exclusive mode
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:438) Selecting from 1 GPUs
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:453) cudaSetDevice(0): Tesla K80      free:11375M, used:66M, total:11441M, free/total:0.994231
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:501) Device: 0, mem_ratio: 0.994231
LOG ([5.5.1013~1546-9b851]:SelectGpuId():cu-device.cc:382) Trying to select device: 0
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:511) Success selecting device 0 free mem ratio: 0.994231
LOG ([5.5.1013~1546-9b851]:FinalizeActiveGpu():cu-device.cc:338) The active GPU is [0]: Tesla K80       free:11239M, used:202M, total:11441M, free/total:0.982345 version 3.7
LOG ([5.5.1013~1546-9b851]:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 0 orphan nodes.
LOG ([5.5.1013~1546-9b851]:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 0 orphan components.
LOG ([5.5.1013~1546-9b851]:BatchRecognizer():batch_recognizer.cc:56) Loading HCLG from model/graph/HCLG.fst
LOG ([5.5.1013~1546-9b851]:BatchRecognizer():batch_recognizer.cc:60) Loading words from model/graph/words.txt
LOG ([5.5.1013~1546-9b851]:BatchRecognizer():batch_recognizer.cc:68) Loading winfo model/graph/phones/word_boundary.int
LOG ([5.5.1013~1546-9b851]:BatchRecognizer():batch_recognizer.cc:74) Loading subtract G.fst model from model/rescore/G.fst
LOG ([5.5.1013~1546-9b851]:BatchRecognizer():batch_recognizer.cc:76) Loading CARPA model from model/rescore/G.carpa
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
INFO:root:Listening on 0.0.0.0:2700
WARNING ([5.5.1013~1546-9b851]:SelectGpuId():cu-device.cc:243) Not in compute-exclusive mode.  Suggestion: use 'nvidia-smi -c 3' to set compute exclusive mode
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:438) Selecting from 1 GPUs
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:453) cudaSetDevice(0): Tesla K80      free:11375M, used:66M, total:11441M, free/total:0.994231
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:501) Device: 0, mem_ratio: 0.994231
LOG ([5.5.1013~1546-9b851]:SelectGpuId():cu-device.cc:382) Trying to select device: 0
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:511) Success selecting device 0 free mem ratio: 0.994231
LOG ([5.5.1013~1546-9b851]:FinalizeActiveGpu():cu-device.cc:338) The active GPU is [0]: Tesla K80       free:11239M, used:202M, total:11441M, free/total:0.982345 version 3.7
LOG ([5.5.1013~1546-9b851]:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 0 orphan nodes.
LOG ([5.5.1013~1546-9b851]:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 0 orphan components.
LOG ([5.5.1013~1546-9b851]:BatchRecognizer():batch_recognizer.cc:56) Loading HCLG from model/graph/HCLG.fst
LOG ([5.5.1013~1546-9b851]:BatchRecognizer():batch_recognizer.cc:60) Loading words from model/graph/words.txt
LOG ([5.5.1013~1546-9b851]:BatchRecognizer():batch_recognizer.cc:68) Loading winfo model/graph/phones/word_boundary.int
LOG ([5.5.1013~1546-9b851]:BatchRecognizer():batch_recognizer.cc:74) Loading subtract G.fst model from model/rescore/G.fst
LOG ([5.5.1013~1546-9b851]:BatchRecognizer():batch_recognizer.cc:76) Loading CARPA model from model/rescore/G.carpa
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
INFO:root:Listening on 0.0.0.0:2700
INFO:root:Connection 0 from ('172.18.0.1', 40140)
ERROR ([5.5.1013~1546-9b851]:AddMatMat():cu-matrix.cc:1325) cublasStatus_t 8 : "CUBLAS_STATUS_ARCH_MISMATCH" returned from 'cublas_gemm(GetCublasHandle(), (transB==kTrans? CUBLAS_OP_T:CUBLAS_OP_N), (transA==kTrans? CUBLAS_OP_T:CUBLAS_OP_N), m, n, k, alpha, B.data_, B.Stride(), A.data_, A.Stride(), beta, data_, Stride())'

[ Stack-Trace: ]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::MessageLogger::LogMessage() const+0x7fe) [0x7f1a2b946d0e]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(+0x696673) [0x7f1a2b7ff673]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::CuMatrixBase<float>::AddMatMat(float, kaldi::CuMatrixBase<float> const&, kaldi::MatrixTransposeType, kaldi::CuMatrixBase<float> const&, kaldi::MatrixTransposeType, float)+0x47c) [0x7f1a2b81361c]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::CudaOnlineBatchedSpectralFeatures::ComputeFinalFeaturesBatched(kaldi::LaneDesc const*, int, float, kaldi::CuMatrix<float>*, kaldi::CuMatrix<float>*)+0x37a) [0x7f1a2b521f3a]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::CudaOnlineBatchedSpectralFeatures::ComputeFeaturesBatched(kaldi::LaneDesc const*, int, kaldi::CuMatrixBase<float> const&, float, float, kaldi::CuMatrix<float>*)+0xa4) [0x7f1a2b522754]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::OnlineBatchedFeaturePipelineCuda::ComputeFeaturesBatched(int, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, std::vector<bool, std::allocator<bool> > const&, std::vector<bool, std::allocator<bool> > const&, float, kaldi::CuMatrixBase<float> const&, kaldi::CuMatrix<float>*, kaldi::CuVector<float>*, std::vector<int, std::allocator<int> >*)+0x29e) [0x7f1a2b51d66e]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::cuda_decoder::BatchedThreadedNnet3CudaOnlinePipeline::ComputeGPUFeatureExtraction(std::vector<int, std::allocator<int> > const&, kaldi::Matrix<float> const&, std::vector<int, std::allocator<int> > const&, std::vector<bool, std::allocator<bool> > const&, std::vector<bool, std::allocator<bool> > const&)+0xe6) [0x7f1a2b4ed116]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::cuda_decoder::BatchedThreadedNnet3CudaOnlinePipeline::DecodeBatch(std::vector<unsigned long, std::allocator<unsigned long> > const&, kaldi::Matrix<float> const&, std::vector<int, std::allocator<int> > const&, std::vector<bool, std::allocator<bool> > const&, std::vector<bool, std::allocator<bool> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const*, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const*> >*, std::vector<bool, std::allocator<bool> >*)+0xf7) [0x7f1a2b4f1277]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::cuda_decoder::CudaOnlinePipelineDynamicBatcher::BatcherThreadLoop()+0x25d) [0x7f1a2b4fe59d]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6d84) [0x7f19f6b8bd84]
/usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x9609) [0x7f1a2f521609]
/usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7f1a2f65d293]

terminate called after throwing an instance of 'kaldi::KaldiFatalError'
  what():  kaldi::KaldiFatalError
WARNING ([5.5.1013~1546-9b851]:SelectGpuId():cu-device.cc:243) Not in compute-exclusive mode.  Suggestion: use 'nvidia-smi -c 3' to set compute exclusive mode
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:438) Selecting from 1 GPUs
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:453) cudaSetDevice(0): Tesla K80      free:11375M, used:66M, total:11441M, free/total:0.994231
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:501) Device: 0, mem_ratio: 0.994231
LOG ([5.5.1013~1546-9b851]:SelectGpuId():cu-device.cc:382) Trying to select device: 0
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:511) Success selecting device 0 free mem ratio: 0.994231
LOG ([5.5.1013~1546-9b851]:FinalizeActiveGpu():cu-device.cc:338) The active GPU is [0]: Tesla K80       free:11239M, used:202M, total:11441M, free/total:0.982345 version 3.7
LOG ([5.5.1013~1546-9b851]:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 0 orphan nodes.
LOG ([5.5.1013~1546-9b851]:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 0 orphan components.
LOG ([5.5.1013~1546-9b851]:BatchRecognizer():batch_recognizer.cc:56) Loading HCLG from model/graph/HCLG.fst
LOG ([5.5.1013~1546-9b851]:BatchRecognizer():batch_recognizer.cc:60) Loading words from model/graph/words.txt
LOG ([5.5.1013~1546-9b851]:BatchRecognizer():batch_recognizer.cc:68) Loading winfo model/graph/phones/word_boundary.int
LOG ([5.5.1013~1546-9b851]:BatchRecognizer():batch_recognizer.cc:74) Loading subtract G.fst model from model/rescore/G.fst
LOG ([5.5.1013~1546-9b851]:BatchRecognizer():batch_recognizer.cc:76) Loading CARPA model from model/rescore/G.carpa
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
INFO:root:Listening on 0.0.0.0:2700
INFO:root:Connection 0 from ('172.18.0.1', 40400)
ERROR ([5.5.1013~1546-9b851]:AddMatMat():cu-matrix.cc:1325) cublasStatus_t 8 : "CUBLAS_STATUS_ARCH_MISMATCH" returned from 'cublas_gemm(GetCublasHandle(), (transB==kTrans? CUBLAS_OP_T:CUBLAS_OP_N), (transA==kTrans? CUBLAS_OP_T:CUBLAS_OP_N), m, n, k, alpha, B.data_, B.Stride(), A.data_, A.Stride(), beta, data_, Stride())'

[ Stack-Trace: ]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::MessageLogger::LogMessage() const+0x7fe) [0x7f2354154d0e]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(+0x696673) [0x7f235400d673]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::CuMatrixBase<float>::AddMatMat(float, kaldi::CuMatrixBase<float> const&, kaldi::MatrixTransposeType, kaldi::CuMatrixBase<float> const&, kaldi::MatrixTransposeType, float)+0x47c) [0x7f235402161c]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::CudaOnlineBatchedSpectralFeatures::ComputeFinalFeaturesBatched(kaldi::LaneDesc const*, int, float, kaldi::CuMatrix<float>*, kaldi::CuMatrix<float>*)+0x37a) [0x7f2353d2ff3a]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::CudaOnlineBatchedSpectralFeatures::ComputeFeaturesBatched(kaldi::LaneDesc const*, int, kaldi::CuMatrixBase<float> const&, float, float, kaldi::CuMatrix<float>*)+0xa4) [0x7f2353d30754]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::OnlineBatchedFeaturePipelineCuda::ComputeFeaturesBatched(int, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, std::vector<bool, std::allocator<bool> > const&, std::vector<bool, std::allocator<bool> > const&, float, kaldi::CuMatrixBase<float> const&, kaldi::CuMatrix<float>*, kaldi::CuVector<float>*, std::vector<int, std::allocator<int> >*)+0x29e) [0x7f2353d2b66e]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::cuda_decoder::BatchedThreadedNnet3CudaOnlinePipeline::ComputeGPUFeatureExtraction(std::vector<int, std::allocator<int> > const&, kaldi::Matrix<float> const&, std::vector<int, std::allocator<int> > const&, std::vector<bool, std::allocator<bool> > const&, std::vector<bool, std::allocator<bool> > const&)+0xe6) [0x7f2353cfb116]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::cuda_decoder::BatchedThreadedNnet3CudaOnlinePipeline::DecodeBatch(std::vector<unsigned long, std::allocator<unsigned long> > const&, kaldi::Matrix<float> const&, std::vector<int, std::allocator<int> > const&, std::vector<bool, std::allocator<bool> > const&, std::vector<bool, std::allocator<bool> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const*, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const*> >*, std::vector<bool, std::allocator<bool> >*)+0xf7) [0x7f2353cff277]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::cuda_decoder::CudaOnlinePipelineDynamicBatcher::BatcherThreadLoop()+0x25d) [0x7f2353d0c59d]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6d84) [0x7f231f399d84]
/usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x9609) [0x7f2357d2f609]
/usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7f2357e6b293]

terminate called after throwing an instance of 'kaldi::KaldiFatalError'
  what():  kaldi::KaldiFatalError
WARNING ([5.5.1013~1546-9b851]:SelectGpuId():cu-device.cc:243) Not in compute-exclusive mode.  Suggestion: use 'nvidia-smi -c 3' to set compute exclusive mode
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:438) Selecting from 1 GPUs
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:453) cudaSetDevice(0): Tesla K80      free:11375M, used:66M, total:11441M, free/total:0.994231
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:501) Device: 0, mem_ratio: 0.994231
LOG ([5.5.1013~1546-9b851]:SelectGpuId():cu-device.cc:382) Trying to select device: 0
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:511) Success selecting device 0 free mem ratio: 0.994231
LOG ([5.5.1013~1546-9b851]:FinalizeActiveGpu():cu-device.cc:338) The active GPU is [0]: Tesla K80       free:11239M, used:202M, total:11441M, free/total:0.982345 version 3.7
LOG ([5.5.1013~1546-9b851]:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 0 orphan nodes.
LOG ([5.5.1013~1546-9b851]:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 0 orphan components.
LOG ([5.5.1013~1546-9b851]:BatchRecognizer():batch_recognizer.cc:56) Loading HCLG from model/graph/HCLG.fst
LOG ([5.5.1013~1546-9b851]:BatchRecognizer():batch_recognizer.cc:60) Loading words from model/graph/words.txt
LOG ([5.5.1013~1546-9b851]:BatchRecognizer():batch_recognizer.cc:68) Loading winfo model/graph/phones/word_boundary.int
LOG ([5.5.1013~1546-9b851]:BatchRecognizer():batch_recognizer.cc:74) Loading subtract G.fst model from model/rescore/G.fst
LOG ([5.5.1013~1546-9b851]:BatchRecognizer():batch_recognizer.cc:76) Loading CARPA model from model/rescore/G.carpa
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
INFO:root:Listening on 0.0.0.0:2700
raghavendrajain commented 2 years ago

Hey man, I used the code tags, but they didn't wrap the large log. I'm sorry for the mess.

sskorol commented 2 years ago

Is this a fresh log or a composite of multiple runs? I still see errors related to the min-active param; are those from a new run or an old one? I can also see CUBLAS_STATUS_ARCH_MISMATCH. Which CUDA version is on your machine, and which one did you use to build the container? I'm also wondering about your GPU architecture.

sskorol commented 2 years ago

@raghavendrajain I tried a K80 instance, and it seems it has an outdated architecture. At least I can see the same issue reported in the vosk-api repo. I switched to a P4 and it works as expected. So you may want to try another NVIDIA instance type.
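For context on the arch mismatch: Kaldi's CUDA kernels are compiled for a minimum compute capability, and CUBLAS_STATUS_ARCH_MISMATCH means the GPU is older than that floor. The K80 in the log above reports compute capability 3.7; the P4 is 6.1. A minimal pre-flight check could be sketched like this (the `arch_supported` helper and the 5.0 threshold are illustrative assumptions, not values taken from the Vosk build; the compute capability string itself can be obtained with `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` on recent drivers):

```python
# Hypothetical pre-flight check: compare a GPU's compute capability string
# (e.g. from nvidia-smi) against an assumed minimum before starting the server.
def arch_supported(compute_cap: str, minimum: tuple = (5, 0)) -> bool:
    # "3.7" -> (3, 7); tuple comparison handles e.g. "10.0" vs "5.0" correctly
    major, minor = (int(p) for p in compute_cap.strip().split("."))
    return (major, minor) >= minimum

print(arch_supported("3.7"))  # Tesla K80 -> False (too old, arch mismatch)
print(arch_supported("6.1"))  # Tesla P4  -> True
```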

raghavendrajain commented 2 years ago

> @raghavendrajain I tried K80 instance and it seems it has an outdated arch. At least I can see the same issue in a vosk-api repo. I switched to P4 and it works as expected. So you may want to try another nvidia instance.

Oh great, thank you! I will try that right away, with Docker and without it. Let me update here in an hour or so.

raghavendrajain commented 2 years ago

> @raghavendrajain I tried K80 instance and it seems it has an outdated arch. At least I can see the same issue in a vosk-api repo. I switched to P4 and it works as expected. So you may want to try another nvidia instance.

Hey man, it works! Thank you so much. I could run both the large and the small EN models. However, the Japanese model gives transcriptions in EN (of course meaningless). Is it possible to use the Japanese model via GPU?

sskorol commented 2 years ago

I don't know. It's better to ask the Vosk owner in their repo.

raghavendrajain commented 2 years ago

> Don't know. It's better to ask Vosk owner in their repo.

The owner has just prepared the large Japanese model for GPU and kindly sent it to me via PM.