OSError: Multiple exceptions

docker ps gives

CONTAINER ID   IMAGE                           COMMAND                  CREATED       STATUS         PORTS                    NAMES
7d5b8b444304   sskorol/vosk-server:0.3.33-pc   "python3 ./asr_serve…"   4 hours ago   Up 8 seconds   0.0.0.0:2700->2700/tcp   vosk-api-gpu_vosk_1
f8ec14ef0b6c   gcr.io/inverting-proxy/agent    "/bin/sh -c '/opt/bi…"   4 hours ago   Up 4 hours                              proxy-agent

The logs give

WARNING ([5.5.1013~1546-9b851]:SelectGpuId():cu-device.cc:243) Not in compute-exclusive mode.  Suggestion: use 'nvidia-smi -c 3' to set compute exclusive mode
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:438) Selecting from 1 GPUs
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:453) cudaSetDevice(0): Tesla K80      free:11375M, used:66M, total:11441M, free/total:0.994231
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:501) Device: 0, mem_ratio: 0.994231
LOG ([5.5.1013~1546-9b851]:SelectGpuId():cu-device.cc:382) Trying to select device: 0
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:511) Success selecting device 0 free mem ratio: 0.994231
LOG ([5.5.1013~1546-9b851]:FinalizeActiveGpu():cu-device.cc:338) The active GPU is [0]: Tesla K80       free:11239M, used:202M, total:11441M, free/total:0.982345 version 3.7

something
Options:
  --acoustic-scale            : Scaling factor for acoustic log-likelihoods (caution: is a no-op if set in the program nnet3-compute (float, default = 0.1)
  --add-pitch                 : Append pitch features to raw MFCC/PLP/filterbank features [but not for iVector extraction] (bool, default = false)
  --aux-q-capacity            : Advanced - Capacity of the auxiliary queue : Maximum number of raw tokens that can be stored *before* pruning for each frame. Lower -> less memory usage, Higher -> More accurate. During the tokens generation, if we detect that we are getting close to saturating that capacity, we will reduce the beam dynamically (adaptive beam) to keep only the best tokens in the remaining space. If the aux queue is still too small, we will print an overflow warning, but prevent the overflow. The computation can safely continue, but the quality of the output may decrease. We strongly recommend keeping aux-q-capacity large (>400k), to avoid triggering the adaptive beam and/or the overflow (-1 = set to 3*main-q-capacity). (int, default = -1)
  --beam                      : Decoding beam. Larger->slower, more accurate. If aux-q-capacity is too small, we may decrease the beam dynamically to avoid overflow (adaptive beam, see aux-q-capacity parameter) (float, default = 15)
  --cmvn-config               : Configuration file for online cmvn features (e.g. conf/online_cmvn.conf). Controls features on nnet3 input (not ivector features). If not set, the OnlineCmvn is disabled. (string, default = "")
  --computation.debug         : If true, turn on debug for the neural net computation (very verbose!) Will be turned on regardless if --verbose >= 5 (bool, default = false)
  --cuda-decoder-copy-threads : Advanced - Number of worker threads used in the decoder for the host to host copies. (int, default = 2)
  --cuda-worker-threads       : The total number of CPU threads launched to process CPU tasks. -1 = use std::hardware_concurrency(). (int, default = -1)
  --debug-computation         : If true, turn on debug for the actual computation (very verbose!) (bool, default = false)
  --delta                     : Tolerance used in determinization (float, default = 0.000976562)
  --determinize-lattice       : Determinize the lattice before output. (bool, default = true)
  --endpoint.rule1.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = inf)
  --endpoint.rule1.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 5)
  --endpoint.rule1.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule1.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = false)
  --endpoint.rule2.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = 2)
  --endpoint.rule2.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 0.5)
  --endpoint.rule2.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule2.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = true)
  --endpoint.rule3.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = 8)
  --endpoint.rule3.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 1)
  --endpoint.rule3.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule3.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = true)
  --endpoint.rule4.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = inf)
  --endpoint.rule4.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 2)
  --endpoint.rule4.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule4.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = true)
  --endpoint.rule5.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = inf)
  --endpoint.rule5.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule5.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 20)
  --endpoint.rule5.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = false)
  --endpoint.silence-phones   : List of phones that are considered to be silence phones by the endpointing code. (string, default = "")
  --extra-left-context        : Number of frames of additional left-context to add on top of the neural net's inherent left context (may be useful in recurrent setups (int, default = 0)
  --extra-left-context-initial : If >= 0, overrides the --extra-left-context value at the start of an utterance. (int, default = -1)
  --extra-right-context       : Number of frames of additional right-context to add on top of the neural net's inherent right context (may be useful in recurrent setups (int, default = 0)
  --extra-right-context-final : If >= 0, overrides the --extra-right-context value at the end of an utterance. (int, default = -1)
  --fbank-config              : Configuration file for filterbank features (e.g. conf/fbank.conf) (string, default = "")
  --feature-type              : Base feature type [mfcc, plp, fbank] (string, default = "mfcc")
  --frame-subsampling-factor  : Required if the frame-rate of the output (e.g. in 'chain' models) is less than the frame-rate of the original alignment. (int, default = 1)
  --frames-per-chunk          : Number of frames in each chunk that is separately evaluated by the neural net.  Measured before any subsampling, if the --frame-subsampling-factor options is used (i.e. counts input frames (int, default = 50)
  --global-cmvn-stats         : filename with global stats for OnlineCmvn for features on nnet3 input (not ivector features) (string, default = "")
  --gpu-feature-extract       : Use GPU feature extraction. (bool, default = true)
  --ivector-extraction-config : Configuration file for online iVector extraction, see class OnlineIvectorExtractionConfig in the code (string, default = "")
  --ivector-silence-weighting.max-state-duration : (RE weighting in iVector estimation for online decoding) Maximum allowed duration of a single transition-id; runs with durations longer than this will be weighted down to the silence-weight. (float, default = -1)
  --ivector-silence-weighting.silence-phones : (RE weighting in iVector estimation for online decoding) List of integer ids of silence phones, separated by colons (or commas).  Data that (according to the traceback of the decoder) corresponds to these phones will be downweighted by --silence-weight. (string, default = "")
  --ivector-silence-weighting.silence-weight : (RE weighting in iVector estimation for online decoding) Weighting factor for frames that the decoder trace-back identifies as silence; only relevant if the --silence-phones option is set. (float, default = 1)
  --lattice-beam              : The width of the lattice beam (float, default = 10)
  --main-q-capacity           : Advanced - Capacity of the main queue : Maximum number of tokens that can be stored *after* pruning for each frame. Lower -> less memory usage, Higher -> More accurate. Tokens stored in the main queue were already selected through a max-active pre-selection. It means that for each emitting/non-emitting iteration, we can add at most ~max-active tokens to the main queue. Typically only the emitting iteration creates a large number of tokens. Using main-q-capacity=k*max-active with k=4..10 should be safe. If main-q-capacity is too small, we will print a warning but prevent the overflow. The computation can safely continue, but the quality of the output may decrease (-1 = set to 4*max-active). (int, default = -1)
  --max-active                : At the end of each frame computation, we keep only its best max-active tokens. One token is the instantiation of a single arc. Typical values are within the 5k-10k range. (int, default = 10000)
  --max-batch-size            : The maximum execution batch size. Larger = better throughput, but slower latency. (int, default = 400)
  --max-mem                   : Maximum approximate memory usage in determinization (real usage might be many times this). (int, default = 50000000)
  --mfcc-config               : Configuration file for MFCC features (e.g. conf/mfcc.conf) (string, default = "")
  --minimize                  : If true, push and minimize after determinization. (bool, default = false)
  --ntokens-pre-allocated     : Advanced - Number of tokens pre-allocated in host buffers. If this size is exceeded the buffer will reallocate, reducing performance. (int, default = 1000000)
  --num-channels              : The number of parallel audio channels. This is the maximum number of parallel audio channels supported by the pipeline. This should be larger than max_batch_size. (int, default = 600)
  --online-pitch-config       : Configuration file for online pitch features, if --add-pitch=true (e.g. conf/online_pitch.conf) (string, default = "")
  --optimization.allocate-from-other : Instead of deleting a matrix of a given size and then allocating a matrix of the same size, allow re-use of that memory (bool, default = true)
  --optimization.allow-left-merge : Set to false to disable left-merging of variables in remove-assignments (obscure option) (bool, default = true)
  --optimization.allow-right-merge : Set to false to disable right-merging of variables in remove-assignments (obscure option) (bool, default = true)
  --optimization.backprop-in-place : Set to false to disable optimization that allows in-place backprop (bool, default = true)
  --optimization.consolidate-model-update : Set to false to disable optimization that consolidates the model-update phase of backprop (e.g. for recurrent architectures (bool, default = true)
  --optimization.convert-addition : Set to false to disable the optimization that converts Add commands into Copy commands wherever possible. (bool, default = true)
  --optimization.extend-matrices : This optimization can reduce memory requirements for TDNNs when applied together with --convert-addition=true (bool, default = true)
  --optimization.initialize-undefined : Set to false to disable optimization that avoids redundant zeroing (bool, default = true)
  --optimization.max-deriv-time : You can set this to the maximum t value that you want derivatives to be computed at when updating the model.  This is an optimization that saves time in the backprop phase for recurrent frameworks (int, default = 2147483647)
  --optimization.max-deriv-time-relative : An alternative mechanism for setting the --max-deriv-time, suitable for situations where the length of the egs is variable.  If set, it is equivalent to setting the --max-deriv-time to this value plus the largest 't' value in any 'output' node of the computation request. (int, default = 2147483647)
  --optimization.memory-compression-level : This is only relevant to training, not decoding.  Set this to 0,1,2; higher levels are more aggressive at reducing memory by compressing quantities needed for backprop, potentially at the expense of speed and the accuracy of derivatives.  0 means no compression at all; 1 means compression that shouldn't affect results at all. (int, default = 1)
  --optimization.min-deriv-time : You can set this to the minimum t value that you want derivatives to be computed at when updating the model.  This is an optimization that saves time in the backprop phase for recurrent frameworks (int, default = -2147483648)
  --optimization.move-sizing-commands : Set to false to disable optimization that moves matrix allocation and deallocation commands to conserve memory. (bool, default = true)
  --optimization.optimize     : Set this to false to turn off all optimizations (bool, default = true)
  --optimization.optimize-row-ops : Set to false to disable certain optimizations that act on operations of type *Row*. (bool, default = true)
  --optimization.propagate-in-place : Set to false to disable optimization that allows in-place propagation (bool, default = true)
  --optimization.remove-assignments : Set to false to disable optimization that removes redundant assignments (bool, default = true)
  --optimization.snip-row-ops : Set this to false to disable an optimization that reduces the size of certain per-row operations (bool, default = true)
  --optimization.split-row-ops : Set to false to disable an optimization that may replace some operations of type kCopyRowsMulti or kAddRowsMulti with up to two simpler operations. (bool, default = true)
  --phone-determinize         : If true, do an initial pass of determinization on both phones and words (see also --word-determinize) (bool, default = true)
  --plp-config                : Configuration file for PLP features (e.g. conf/plp.conf) (string, default = "")
  --reset-on-endpoint         : Reset a decoder channel when endpoint detected. Do not close stream (bool, default = false)
  --word-determinize          : If true, do a second pass of determinization on words only (see also --phone-determinize) (bool, default = true)

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)

Command line was:
ERROR ([5.5.1013~1546-9b851]:ReadConfigFile():parse-options.cc:493) Invalid option --min-active=200 in config file model/conf/model.conf

[ Stack-Trace: ]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::MessageLogger::LogMessage() const+0x7fe) [0x7fadb0882d0e]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x2a) [0x7fadb0382a9a]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::ParseOptions::ReadConfigFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x3eb) [0x7fadb087780b]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(BatchRecognizer::BatchRecognizer()+0x468) [0x7fadb04201f8]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(vosk_batch_recognizer_new+0x20) [0x7fadb041f0f0]
/usr/lib/x86_64-linux-gnu/libffi.so.7(+0x6ff5) [0x7fadb37dcff5]
/usr/lib/x86_64-linux-gnu/libffi.so.7(+0x640a) [0x7fadb37dc40a]
/usr/lib/python3/dist-packages/_cffi_backend.cpython-38-x86_64-linux-gnu.so(+0x1afd7) [0x7fadb3802fd7]
python3(_PyObject_MakeTpCall+0x296) [0x5f6a46]
python3(_PyEval_EvalFrameDefault+0x5d3f) [0x570a1f]
python3(_PyEval_EvalCodeWithName+0x26a) [0x5696da]
python3() [0x59c176]
python3(_PyObject_MakeTpCall+0x1ff) [0x5f69af]
python3(_PyEval_EvalFrameDefault+0x5932) [0x570612]
python3(_PyFunction_Vectorcall+0x1b6) [0x5f6226]
python3(_PyEval_EvalFrameDefault+0x71e) [0x56b3fe]
python3(_PyEval_EvalCodeWithName+0x26a) [0x5696da]
python3(PyEval_EvalCode+0x27) [0x68db17]
python3() [0x67eeb1]
python3() [0x67ef2f]
python3() [0x67efd1]
python3(PyRun_SimpleFileExFlags+0x197) [0x67f377]
python3(Py_RunMain+0x212) [0x6b7902]
python3(Py_BytesMain+0x2d) [0x6b7c8d]
/usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7fadb449e0b3]
python3(_start+0x2e) [0x5fb12e]

terminate called after throwing an instance of 'kaldi::KaldiFatalError'
  what():  kaldi::KaldiFatalError
WARNING ([5.5.1013~1546-9b851]:SelectGpuId():cu-device.cc:243) Not in compute-exclusive mode.  Suggestion: use 'nvidia-smi -c 3' to set compute exclusive mode
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:438) Selecting from 1 GPUs
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:453) cudaSetDevice(0): Tesla K80      free:11375M, used:66M, total:11441M, free/total:0.994231
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:501) Device: 0, mem_ratio: 0.994231
LOG ([5.5.1013~1546-9b851]:SelectGpuId():cu-device.cc:382) Trying to select device: 0
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:511) Success selecting device 0 free mem ratio: 0.994231
LOG ([5.5.1013~1546-9b851]:FinalizeActiveGpu():cu-device.cc:338) The active GPU is [0]: Tesla K80       free:11239M, used:202M, total:11441M, free/total:0.982345 version 3.7

something
Options:
  --acoustic-scale            : Scaling factor for acoustic log-likelihoods (caution: is a no-op if set in the program nnet3-compute (float, default = 0.1)
  --add-pitch                 : Append pitch features to raw MFCC/PLP/filterbank features [but not for iVector extraction] (bool, default = false)
  --aux-q-capacity            : Advanced - Capacity of the auxiliary queue : Maximum number of raw tokens that can be stored *before* pruning for each frame. Lower -> less memory usage, Higher -> More accurate. During the tokens generation, if we detect that we are getting close to saturating that capacity, we will reduce the beam dynamically (adaptive beam) to keep only the best tokens in the remaining space. If the aux queue is still too small, we will print an overflow warning, but prevent the overflow. The computation can safely continue, but the quality of the output may decrease. We strongly recommend keeping aux-q-capacity large (>400k), to avoid triggering the adaptive beam and/or the overflow (-1 = set to 3*main-q-capacity). (int, default = -1)
  --beam                      : Decoding beam. Larger->slower, more accurate. If aux-q-capacity is too small, we may decrease the beam dynamically to avoid overflow (adaptive beam, see aux-q-capacity parameter) (float, default = 15)
  --cmvn-config               : Configuration file for online cmvn features (e.g. conf/online_cmvn.conf). Controls features on nnet3 input (not ivector features). If not set, the OnlineCmvn is disabled. (string, default = "")
  --computation.debug         : If true, turn on debug for the neural net computation (very verbose!) Will be turned on regardless if --verbose >= 5 (bool, default = false)
  --cuda-decoder-copy-threads : Advanced - Number of worker threads used in the decoder for the host to host copies. (int, default = 2)
  --cuda-worker-threads       : The total number of CPU threads launched to process CPU tasks. -1 = use std::hardware_concurrency(). (int, default = -1)
  --debug-computation         : If true, turn on debug for the actual computation (very verbose!) (bool, default = false)
  --delta                     : Tolerance used in determinization (float, default = 0.000976562)
  --determinize-lattice       : Determinize the lattice before output. (bool, default = true)
  --endpoint.rule1.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = inf)
  --endpoint.rule1.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 5)
  --endpoint.rule1.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule1.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = false)
  --endpoint.rule2.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = 2)
  --endpoint.rule2.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 0.5)
  --endpoint.rule2.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule2.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = true)
  --endpoint.rule3.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = 8)
  --endpoint.rule3.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 1)
  --endpoint.rule3.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule3.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = true)
  --endpoint.rule4.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = inf)
  --endpoint.rule4.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 2)
  --endpoint.rule4.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule4.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = true)
  --endpoint.rule5.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = inf)
  --endpoint.rule5.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule5.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 20)
  --endpoint.rule5.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = false)
  --endpoint.silence-phones   : List of phones that are considered to be silence phones by the endpointing code. (string, default = "")
  --extra-left-context        : Number of frames of additional left-context to add on top of the neural net's inherent left context (may be useful in recurrent setups (int, default = 0)
  --extra-left-context-initial : If >= 0, overrides the --extra-left-context value at the start of an utterance. (int, default = -1)
  --extra-right-context       : Number of frames of additional right-context to add on top of the neural net's inherent right context (may be useful in recurrent setups (int, default = 0)
  --extra-right-context-final : If >= 0, overrides the --extra-right-context value at the end of an utterance. (int, default = -1)
  --fbank-config              : Configuration file for filterbank features (e.g. conf/fbank.conf) (string, default = "")
  --feature-type              : Base feature type [mfcc, plp, fbank] (string, default = "mfcc")
  --frame-subsampling-factor  : Required if the frame-rate of the output (e.g. in 'chain' models) is less than the frame-rate of the original alignment. (int, default = 1)
  --frames-per-chunk          : Number of frames in each chunk that is separately evaluated by the neural net.  Measured before any subsampling, if the --frame-subsampling-factor options is used (i.e. counts input frames (int, default = 50)
  --global-cmvn-stats         : filename with global stats for OnlineCmvn for features on nnet3 input (not ivector features) (string, default = "")
  --gpu-feature-extract       : Use GPU feature extraction. (bool, default = true)
  --ivector-extraction-config : Configuration file for online iVector extraction, see class OnlineIvectorExtractionConfig in the code (string, default = "")
  --ivector-silence-weighting.max-state-duration : (RE weighting in iVector estimation for online decoding) Maximum allowed duration of a single transition-id; runs with durations longer than this will be weighted down to the silence-weight. (float, default = -1)
  --ivector-silence-weighting.silence-phones : (RE weighting in iVector estimation for online decoding) List of integer ids of silence phones, separated by colons (or commas).  Data that (according to the traceback of the decoder) corresponds to these phones will be downweighted by --silence-weight. (string, default = "")
  --ivector-silence-weighting.silence-weight : (RE weighting in iVector estimation for online decoding) Weighting factor for frames that the decoder trace-back identifies as silence; only relevant if the --silence-phones option is set. (float, default = 1)
  --lattice-beam              : The width of the lattice beam (float, default = 10)
  --main-q-capacity           : Advanced - Capacity of the main queue : Maximum number of tokens that can be stored *after* pruning for each frame. Lower -> less memory usage, Higher -> More accurate. Tokens stored in the main queue were already selected through a max-active pre-selection. It means that for each emitting/non-emitting iteration, we can add at most ~max-active tokens to the main queue. Typically only the emitting iteration creates a large number of tokens. Using main-q-capacity=k*max-active with k=4..10 should be safe. If main-q-capacity is too small, we will print a warning but prevent the overflow. The computation can safely continue, but the quality of the output may decrease (-1 = set to 4*max-active). (int, default = -1)
  --max-active                : At the end of each frame computation, we keep only its best max-active tokens. One token is the instantiation of a single arc. Typical values are within the 5k-10k range. (int, default = 10000)
  --max-batch-size            : The maximum execution batch size. Larger = better throughput, but slower latency. (int, default = 400)
  --max-mem                   : Maximum approximate memory usage in determinization (real usage might be many times this). (int, default = 50000000)
  --mfcc-config               : Configuration file for MFCC features (e.g. conf/mfcc.conf) (string, default = "")
  --minimize                  : If true, push and minimize after determinization. (bool, default = false)
  --ntokens-pre-allocated     : Advanced - Number of tokens pre-allocated in host buffers. If this size is exceeded the buffer will reallocate, reducing performance. (int, default = 1000000)
  --num-channels              : The number of parallel audio channels. This is the maximum number of parallel audio channels supported by the pipeline. This should be larger than max_batch_size. (int, default = 600)
  --online-pitch-config       : Configuration file for online pitch features, if --add-pitch=true (e.g. conf/online_pitch.conf) (string, default = "")
  --optimization.allocate-from-other : Instead of deleting a matrix of a given size and then allocating a matrix of the same size, allow re-use of that memory (bool, default = true)
  --optimization.allow-left-merge : Set to false to disable left-merging of variables in remove-assignments (obscure option) (bool, default = true)
  --optimization.allow-right-merge : Set to false to disable right-merging of variables in remove-assignments (obscure option) (bool, default = true)
  --optimization.backprop-in-place : Set to false to disable optimization that allows in-place backprop (bool, default = true)
  --optimization.consolidate-model-update : Set to false to disable optimization that consolidates the model-update phase of backprop (e.g. for recurrent architectures (bool, default = true)
  --optimization.convert-addition : Set to false to disable the optimization that converts Add commands into Copy commands wherever possible. (bool, default = true)
  --optimization.extend-matrices : This optimization can reduce memory requirements for TDNNs when applied together with --convert-addition=true (bool, default = true)
  --optimization.initialize-undefined : Set to false to disable optimization that avoids redundant zeroing (bool, default = true)
  --optimization.max-deriv-time : You can set this to the maximum t value that you want derivatives to be computed at when updating the model.  This is an optimization that saves time in the backprop phase for recurrent frameworks (int, default = 2147483647)
  --optimization.max-deriv-time-relative : An alternative mechanism for setting the --max-deriv-time, suitable for situations where the length of the egs is variable.  If set, it is equivalent to setting the --max-deriv-time to this value plus the largest 't' value in any 'output' node of the computation request. (int, default = 2147483647)
  --optimization.memory-compression-level : This is only relevant to training, not decoding.  Set this to 0,1,2; higher levels are more aggressive at reducing memory by compressing quantities needed for backprop, potentially at the expense of speed and the accuracy of derivatives.  0 means no compression at all; 1 means compression that shouldn't affect results at all. (int, default = 1)
  --optimization.min-deriv-time : You can set this to the minimum t value that you want derivatives to be computed at when updating the model.  This is an optimization that saves time in the backprop phase for recurrent frameworks (int, default = -2147483648)
  --optimization.move-sizing-commands : Set to false to disable optimization that moves matrix allocation and deallocation commands to conserve memory. (bool, default = true)
  --optimization.optimize     : Set this to false to turn off all optimizations (bool, default = true)
  --optimization.optimize-row-ops : Set to false to disable certain optimizations that act on operations of type *Row*. (bool, default = true)
  --optimization.propagate-in-place : Set to false to disable optimization that allows in-place propagation (bool, default = true)
  --optimization.remove-assignments : Set to false to disable optimization that removes redundant assignments (bool, default = true)
  --optimization.snip-row-ops : Set this to false to disable an optimization that reduces the size of certain per-row operations (bool, default = true)
  --optimization.split-row-ops : Set to false to disable an optimization that may replace some operations of type kCopyRowsMulti or kAddRowsMulti with up to two simpler operations. (bool, default = true)
  --phone-determinize         : If true, do an initial pass of determinization on both phones and words (see also --word-determinize) (bool, default = true)
  --plp-config                : Configuration file for PLP features (e.g. conf/plp.conf) (string, default = "")
  --reset-on-endpoint         : Reset a decoder channel when endpoint detected. Do not close stream (bool, default = false)
  --word-determinize          : If true, do a second pass of determinization on words only (see also --phone-determinize) (bool, default = true)

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)

Command line was:
ERROR ([5.5.1013~1546-9b851]:ReadConfigFile():parse-options.cc:493) Invalid option --min-active=200 in config file model/conf/model.conf

[ Stack-Trace: ]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::MessageLogger::LogMessage() const+0x7fe) [0x7fb595ebed0e]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x2a) [0x7fb5959bea9a]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::ParseOptions::ReadConfigFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x3eb) [0x7fb595eb380b]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(BatchRecognizer::BatchRecognizer()+0x468) [0x7fb595a5c1f8]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(vosk_batch_recognizer_new+0x20) [0x7fb595a5b0f0]
/usr/lib/x86_64-linux-gnu/libffi.so.7(+0x6ff5) [0x7fb598e18ff5]
/usr/lib/x86_64-linux-gnu/libffi.so.7(+0x640a) [0x7fb598e1840a]
/usr/lib/python3/dist-packages/_cffi_backend.cpython-38-x86_64-linux-gnu.so(+0x1afd7) [0x7fb598e3efd7]
python3(_PyObject_MakeTpCall+0x296) [0x5f6a46]
python3(_PyEval_EvalFrameDefault+0x5d3f) [0x570a1f]
python3(_PyEval_EvalCodeWithName+0x26a) [0x5696da]
python3() [0x59c176]
python3(_PyObject_MakeTpCall+0x1ff) [0x5f69af]
python3(_PyEval_EvalFrameDefault+0x5932) [0x570612]
python3(_PyFunction_Vectorcall+0x1b6) [0x5f6226]
python3(_PyEval_EvalFrameDefault+0x71e) [0x56b3fe]
python3(_PyEval_EvalCodeWithName+0x26a) [0x5696da]
python3(PyEval_EvalCode+0x27) [0x68db17]
python3() [0x67eeb1]
python3() [0x67ef2f]
python3() [0x67efd1]
python3(PyRun_SimpleFileExFlags+0x197) [0x67f377]
python3(Py_RunMain+0x212) [0x6b7902]
python3(Py_BytesMain+0x2d) [0x6b7c8d]
/usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7fb599ada0b3]
python3(_start+0x2e) [0x5fb12e]

terminate called after throwing an instance of 'kaldi::KaldiFatalError'
  what():  kaldi::KaldiFatalError
WARNING ([5.5.1013~1546-9b851]:SelectGpuId():cu-device.cc:243) Not in compute-exclusive mode.  Suggestion: use 'nvidia-smi -c 3' to set compute exclusive mode
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:438) Selecting from 1 GPUs
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:453) cudaSetDevice(0): Tesla K80      free:11375M, used:66M, total:11441M, free/total:0.994231
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:501) Device: 0, mem_ratio: 0.994231
LOG ([5.5.1013~1546-9b851]:SelectGpuId():cu-device.cc:382) Trying to select device: 0
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:511) Success selecting device 0 free mem ratio: 0.994231
LOG ([5.5.1013~1546-9b851]:FinalizeActiveGpu():cu-device.cc:338) The active GPU is [0]: Tesla K80       free:11239M, used:202M, total:11441M, free/total:0.982345 version 3.7

something
Options:
  --acoustic-scale            : Scaling factor for acoustic log-likelihoods (caution: is a no-op if set in the program nnet3-compute (float, default = 0.1)
  --add-pitch                 : Append pitch features to raw MFCC/PLP/filterbank features [but not for iVector extraction] (bool, default = false)
  --aux-q-capacity            : Advanced - Capacity of the auxiliary queue : Maximum number of raw tokens that can be stored *before* pruning for each frame. Lower -> less memory usage, Higher -> More accurate. During the tokens generation, if we detect that we are getting close to saturating that capacity, we will reduce the beam dynamically (adaptive beam) to keep only the best tokens in the remaining space. If the aux queue is still too small, we will print an overflow warning, but prevent the overflow. The computation can safely continue, but the quality of the output may decrease. We strongly recommend keeping aux-q-capacity large (>400k), to avoid triggering the adaptive beam and/or the overflow (-1 = set to 3*main-q-capacity). (int, default = -1)
  --beam                      : Decoding beam. Larger->slower, more accurate. If aux-q-capacity is too small, we may decrease the beam dynamically to avoid overflow (adaptive beam, see aux-q-capacity parameter) (float, default = 15)
  --cmvn-config               : Configuration file for online cmvn features (e.g. conf/online_cmvn.conf). Controls features on nnet3 input (not ivector features). If not set, the OnlineCmvn is disabled. (string, default = "")
  --computation.debug         : If true, turn on debug for the neural net computation (very verbose!) Will be turned on regardless if --verbose >= 5 (bool, default = false)
  --cuda-decoder-copy-threads : Advanced - Number of worker threads used in the decoder for the host to host copies. (int, default = 2)
  --cuda-worker-threads       : The total number of CPU threads launched to process CPU tasks. -1 = use std::hardware_concurrency(). (int, default = -1)
  --debug-computation         : If true, turn on debug for the actual computation (very verbose!) (bool, default = false)
  --delta                     : Tolerance used in determinization (float, default = 0.000976562)
  --determinize-lattice       : Determinize the lattice before output. (bool, default = true)
  --endpoint.rule1.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = inf)
  --endpoint.rule1.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 5)
  --endpoint.rule1.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule1.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = false)
  --endpoint.rule2.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = 2)
  --endpoint.rule2.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 0.5)
  --endpoint.rule2.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule2.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = true)
  --endpoint.rule3.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = 8)
  --endpoint.rule3.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 1)
  --endpoint.rule3.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule3.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = true)
  --endpoint.rule4.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = inf)
  --endpoint.rule4.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 2)
  --endpoint.rule4.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule4.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = true)
  --endpoint.rule5.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = inf)
  --endpoint.rule5.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule5.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 20)
  --endpoint.rule5.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = false)
  --endpoint.silence-phones   : List of phones that are considered to be silence phones by the endpointing code. (string, default = "")
  --extra-left-context        : Number of frames of additional left-context to add on top of the neural net's inherent left context (may be useful in recurrent setups (int, default = 0)
  --extra-left-context-initial : If >= 0, overrides the --extra-left-context value at the start of an utterance. (int, default = -1)
  --extra-right-context       : Number of frames of additional right-context to add on top of the neural net's inherent right context (may be useful in recurrent setups (int, default = 0)
  --extra-right-context-final : If >= 0, overrides the --extra-right-context value at the end of an utterance. (int, default = -1)
  --fbank-config              : Configuration file for filterbank features (e.g. conf/fbank.conf) (string, default = "")
  --feature-type              : Base feature type [mfcc, plp, fbank] (string, default = "mfcc")
  --frame-subsampling-factor  : Required if the frame-rate of the output (e.g. in 'chain' models) is less than the frame-rate of the original alignment. (int, default = 1)
  --frames-per-chunk          : Number of frames in each chunk that is separately evaluated by the neural net.  Measured before any subsampling, if the --frame-subsampling-factor options is used (i.e. counts input frames (int, default = 50)
  --global-cmvn-stats         : filename with global stats for OnlineCmvn for features on nnet3 input (not ivector features) (string, default = "")
  --gpu-feature-extract       : Use GPU feature extraction. (bool, default = true)
  --ivector-extraction-config : Configuration file for online iVector extraction, see class OnlineIvectorExtractionConfig in the code (string, default = "")
  --ivector-silence-weighting.max-state-duration : (RE weighting in iVector estimation for online decoding) Maximum allowed duration of a single transition-id; runs with durations longer than this will be weighted down to the silence-weight. (float, default = -1)
  --ivector-silence-weighting.silence-phones : (RE weighting in iVector estimation for online decoding) List of integer ids of silence phones, separated by colons (or commas).  Data that (according to the traceback of the decoder) corresponds to these phones will be downweighted by --silence-weight. (string, default = "")
  --ivector-silence-weighting.silence-weight : (RE weighting in iVector estimation for online decoding) Weighting factor for frames that the decoder trace-back identifies as silence; only relevant if the --silence-phones option is set. (float, default = 1)
  --lattice-beam              : The width of the lattice beam (float, default = 10)
  --main-q-capacity           : Advanced - Capacity of the main queue : Maximum number of tokens that can be stored *after* pruning for each frame. Lower -> less memory usage, Higher -> More accurate. Tokens stored in the main queue were already selected through a max-active pre-selection. It means that for each emitting/non-emitting iteration, we can add at most ~max-active tokens to the main queue. Typically only the emitting iteration creates a large number of tokens. Using main-q-capacity=k*max-active with k=4..10 should be safe. If main-q-capacity is too small, we will print a warning but prevent the overflow. The computation can safely continue, but the quality of the output may decrease (-1 = set to 4*max-active). (int, default = -1)
  --max-active                : At the end of each frame computation, we keep only its best max-active tokens. One token is the instantiation of a single arc. Typical values are within the 5k-10k range. (int, default = 10000)
  --max-batch-size            : The maximum execution batch size. Larger = better throughput, but slower latency. (int, default = 400)
  --max-mem                   : Maximum approximate memory usage in determinization (real usage might be many times this). (int, default = 50000000)
  --mfcc-config               : Configuration file for MFCC features (e.g. conf/mfcc.conf) (string, default = "")
  --minimize                  : If true, push and minimize after determinization. (bool, default = false)
  --ntokens-pre-allocated     : Advanced - Number of tokens pre-allocated in host buffers. If this size is exceeded the buffer will reallocate, reducing performance. (int, default = 1000000)
  --num-channels              : The number of parallel audio channels. This is the maximum number of parallel audio channels supported by the pipeline. This should be larger than max_batch_size. (int, default = 600)
  --online-pitch-config       : Configuration file for online pitch features, if --add-pitch=true (e.g. conf/online_pitch.conf) (string, default = "")
  --optimization.allocate-from-other : Instead of deleting a matrix of a given size and then allocating a matrix of the same size, allow re-use of that memory (bool, default = true)
  --optimization.allow-left-merge : Set to false to disable left-merging of variables in remove-assignments (obscure option) (bool, default = true)
  --optimization.allow-right-merge : Set to false to disable right-merging of variables in remove-assignments (obscure option) (bool, default = true)
  --optimization.backprop-in-place : Set to false to disable optimization that allows in-place backprop (bool, default = true)
  --optimization.consolidate-model-update : Set to false to disable optimization that consolidates the model-update phase of backprop (e.g. for recurrent architectures (bool, default = true)
  --optimization.convert-addition : Set to false to disable the optimization that converts Add commands into Copy commands wherever possible. (bool, default = true)
  --optimization.extend-matrices : This optimization can reduce memory requirements for TDNNs when applied together with --convert-addition=true (bool, default = true)
  --optimization.initialize-undefined : Set to false to disable optimization that avoids redundant zeroing (bool, default = true)
  --optimization.max-deriv-time : You can set this to the maximum t value that you want derivatives to be computed at when updating the model.  This is an optimization that saves time in the backprop phase for recurrent frameworks (int, default = 2147483647)
  --optimization.max-deriv-time-relative : An alternative mechanism for setting the --max-deriv-time, suitable for situations where the length of the egs is variable.  If set, it is equivalent to setting the --max-deriv-time to this value plus the largest 't' value in any 'output' node of the computation request. (int, default = 2147483647)
  --optimization.memory-compression-level : This is only relevant to training, not decoding.  Set this to 0,1,2; higher levels are more aggressive at reducing memory by compressing quantities needed for backprop, potentially at the expense of speed and the accuracy of derivatives.  0 means no compression at all; 1 means compression that shouldn't affect results at all. (int, default = 1)
  --optimization.min-deriv-time : You can set this to the minimum t value that you want derivatives to be computed at when updating the model.  This is an optimization that saves time in the backprop phase for recurrent frameworks (int, default = -2147483648)
  --optimization.move-sizing-commands : Set to false to disable optimization that moves matrix allocation and deallocation commands to conserve memory. (bool, default = true)
  --optimization.optimize     : Set this to false to turn off all optimizations (bool, default = true)
  --optimization.optimize-row-ops : Set to false to disable certain optimizations that act on operations of type *Row*. (bool, default = true)
  --optimization.propagate-in-place : Set to false to disable optimization that allows in-place propagation (bool, default = true)
  --optimization.remove-assignments : Set to false to disable optimization that removes redundant assignments (bool, default = true)
  --optimization.snip-row-ops : Set this to false to disable an optimization that reduces the size of certain per-row operations (bool, default = true)
  --optimization.split-row-ops : Set to false to disable an optimization that may replace some operations of type kCopyRowsMulti or kAddRowsMulti with up to two simpler operations. (bool, default = true)
  --phone-determinize         : If true, do an initial pass of determinization on both phones and words (see also --word-determinize) (bool, default = true)
  --plp-config                : Configuration file for PLP features (e.g. conf/plp.conf) (string, default = "")
  --reset-on-endpoint         : Reset a decoder channel when endpoint detected. Do not close stream (bool, default = false)
  --word-determinize          : If true, do a second pass of determinization on words only (see also --phone-determinize) (bool, default = true)

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)

Command line was:
ERROR ([5.5.1013~1546-9b851]:ReadConfigFile():parse-options.cc:493) Invalid option --min-active=200 in config file model/conf/model.conf

[ Stack-Trace: ]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::MessageLogger::LogMessage() const+0x7fe) [0x7fce6ad9fd0e]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x2a) [0x7fce6a89fa9a]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::ParseOptions::ReadConfigFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x3eb) [0x7fce6ad9480b]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(BatchRecognizer::BatchRecognizer()+0x468) [0x7fce6a93d1f8]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(vosk_batch_recognizer_new+0x20) [0x7fce6a93c0f0]
/usr/lib/x86_64-linux-gnu/libffi.so.7(+0x6ff5) [0x7fce6dcf9ff5]
/usr/lib/x86_64-linux-gnu/libffi.so.7(+0x640a) [0x7fce6dcf940a]
/usr/lib/python3/dist-packages/_cffi_backend.cpython-38-x86_64-linux-gnu.so(+0x1afd7) [0x7fce6dd1ffd7]
python3(_PyObject_MakeTpCall+0x296) [0x5f6a46]
python3(_PyEval_EvalFrameDefault+0x5d3f) [0x570a1f]
python3(_PyEval_EvalCodeWithName+0x26a) [0x5696da]
python3() [0x59c176]
python3(_PyObject_MakeTpCall+0x1ff) [0x5f69af]
python3(_PyEval_EvalFrameDefault+0x5932) [0x570612]
python3(_PyFunction_Vectorcall+0x1b6) [0x5f6226]
python3(_PyEval_EvalFrameDefault+0x71e) [0x56b3fe]
python3(_PyEval_EvalCodeWithName+0x26a) [0x5696da]
python3(PyEval_EvalCode+0x27) [0x68db17]
python3() [0x67eeb1]
python3() [0x67ef2f]
python3() [0x67efd1]
python3(PyRun_SimpleFileExFlags+0x197) [0x67f377]
python3(Py_RunMain+0x212) [0x6b7902]
python3(Py_BytesMain+0x2d) [0x6b7c8d]
/usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7fce6e9bb0b3]
python3(_start+0x2e) [0x5fb12e]

terminate called after throwing an instance of 'kaldi::KaldiFatalError'
  what():  kaldi::KaldiFatalError

WARNING ([5.5.1013~1546-9b851]:SelectGpuId():cu-device.cc:243) Not in compute-exclusive mode.  Suggestion: use 'nvidia-smi -c 3' to set compute exclusive mode
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:438) Selecting from 1 GPUs
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:453) cudaSetDevice(0): Tesla K80      free:11375M, used:66M, total:11441M, free/total:0.994231
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:501) Device: 0, mem_ratio: 0.994231
LOG ([5.5.1013~1546-9b851]:SelectGpuId():cu-device.cc:382) Trying to select device: 0
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:511) Success selecting device 0 free mem ratio: 0.994231
LOG ([5.5.1013~1546-9b851]:FinalizeActiveGpu():cu-device.cc:338) The active GPU is [0]: Tesla K80       free:11239M, used:202M, total:11441M, free/total:0.982345 version 3.7

something
Options:
  --acoustic-scale            : Scaling factor for acoustic log-likelihoods (caution: is a no-op if set in the program nnet3-compute (float, default = 0.1)
  --add-pitch                 : Append pitch features to raw MFCC/PLP/filterbank features [but not for iVector extraction] (bool, default = false)
  --aux-q-capacity            : Advanced - Capacity of the auxiliary queue : Maximum number of raw tokens that can be stored *before* pruning for each frame. Lower -> less memory usage, Higher -> More accurate. During the tokens generation, if we detect that we are getting close to saturating that capacity, we will reduce the beam dynamically (adaptive beam) to keep only the best tokens in the remaining space. If the aux queue is still too small, we will print an overflow warning, but prevent the overflow. The computation can safely continue, but the quality of the output may decrease. We strongly recommend keeping aux-q-capacity large (>400k), to avoid triggering the adaptive beam and/or the overflow (-1 = set to 3*main-q-capacity). (int, default = -1)
  --beam                      : Decoding beam. Larger->slower, more accurate. If aux-q-capacity is too small, we may decrease the beam dynamically to avoid overflow (adaptive beam, see aux-q-capacity parameter) (float, default = 15)
  --cmvn-config               : Configuration file for online cmvn features (e.g. conf/online_cmvn.conf). Controls features on nnet3 input (not ivector features). If not set, the OnlineCmvn is disabled. (string, default = "")
  --computation.debug         : If true, turn on debug for the neural net computation (very verbose!) Will be turned on regardless if --verbose >= 5 (bool, default = false)
  --cuda-decoder-copy-threads : Advanced - Number of worker threads used in the decoder for the host to host copies. (int, default = 2)
  --cuda-worker-threads       : The total number of CPU threads launched to process CPU tasks. -1 = use std::hardware_concurrency(). (int, default = -1)
  --debug-computation         : If true, turn on debug for the actual computation (very verbose!) (bool, default = false)
  --delta                     : Tolerance used in determinization (float, default = 0.000976562)
  --determinize-lattice       : Determinize the lattice before output. (bool, default = true)
  --endpoint.rule1.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = inf)
  --endpoint.rule1.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 5)
  --endpoint.rule1.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule1.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = false)
  --endpoint.rule2.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = 2)
  --endpoint.rule2.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 0.5)
  --endpoint.rule2.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule2.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = true)
  --endpoint.rule3.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = 8)
  --endpoint.rule3.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 1)
  --endpoint.rule3.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule3.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = true)
  --endpoint.rule4.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = inf)
  --endpoint.rule4.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 2)
  --endpoint.rule4.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule4.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = true)
  --endpoint.rule5.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = inf)
  --endpoint.rule5.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule5.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 20)
  --endpoint.rule5.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = false)
  --endpoint.silence-phones   : List of phones that are considered to be silence phones by the endpointing code. (string, default = "")
  --extra-left-context        : Number of frames of additional left-context to add on top of the neural net's inherent left context (may be useful in recurrent setups (int, default = 0)
  --extra-left-context-initial : If >= 0, overrides the --extra-left-context value at the start of an utterance. (int, default = -1)
  --extra-right-context       : Number of frames of additional right-context to add on top of the neural net's inherent right context (may be useful in recurrent setups (int, default = 0)
  --extra-right-context-final : If >= 0, overrides the --extra-right-context value at the end of an utterance. (int, default = -1)
  --fbank-config              : Configuration file for filterbank features (e.g. conf/fbank.conf) (string, default = "")
  --feature-type              : Base feature type [mfcc, plp, fbank] (string, default = "mfcc")
  --frame-subsampling-factor  : Required if the frame-rate of the output (e.g. in 'chain' models) is less than the frame-rate of the original alignment. (int, default = 1)
  --frames-per-chunk          : Number of frames in each chunk that is separately evaluated by the neural net.  Measured before any subsampling, if the --frame-subsampling-factor options is used (i.e. counts input frames (int, default = 50)
  --global-cmvn-stats         : filename with global stats for OnlineCmvn for features on nnet3 input (not ivector features) (string, default = "")
  --gpu-feature-extract       : Use GPU feature extraction. (bool, default = true)
  --ivector-extraction-config : Configuration file for online iVector extraction, see class OnlineIvectorExtractionConfig in the code (string, default = "")
  --ivector-silence-weighting.max-state-duration : (RE weighting in iVector estimation for online decoding) Maximum allowed duration of a single transition-id; runs with durations longer than this will be weighted down to the silence-weight. (float, default = -1)
  --ivector-silence-weighting.silence-phones : (RE weighting in iVector estimation for online decoding) List of integer ids of silence phones, separated by colons (or commas).  Data that (according to the traceback of the decoder) corresponds to these phones will be downweighted by --silence-weight. (string, default = "")
  --ivector-silence-weighting.silence-weight : (RE weighting in iVector estimation for online decoding) Weighting factor for frames that the decoder trace-back identifies as silence; only relevant if the --silence-phones option is set. (float, default = 1)
  --lattice-beam              : The width of the lattice beam (float, default = 10)
  --main-q-capacity           : Advanced - Capacity of the main queue : Maximum number of tokens that can be stored *after* pruning for each frame. Lower -> less memory usage, Higher -> More accurate. Tokens stored in the main queue were already selected through a max-active pre-selection. It means that for each emitting/non-emitting iteration, we can add at most ~max-active tokens to the main queue. Typically only the emitting iteration creates a large number of tokens. Using main-q-capacity=k*max-active with k=4..10 should be safe. If main-q-capacity is too small, we will print a warning but prevent the overflow. The computation can safely continue, but the quality of the output may decrease (-1 = set to 4*max-active). (int, default = -1)
  --max-active                : At the end of each frame computation, we keep only its best max-active tokens. One token is the instantiation of a single arc. Typical values are within the 5k-10k range. (int, default = 10000)
  --max-batch-size            : The maximum execution batch size. Larger = better throughput, but slower latency. (int, default = 400)
  --max-mem                   : Maximum approximate memory usage in determinization (real usage might be many times this). (int, default = 50000000)
  --mfcc-config               : Configuration file for MFCC features (e.g. conf/mfcc.conf) (string, default = "")
  --minimize                  : If true, push and minimize after determinization. (bool, default = false)
  --ntokens-pre-allocated     : Advanced - Number of tokens pre-allocated in host buffers. If this size is exceeded the buffer will reallocate, reducing performance. (int, default = 1000000)
  --num-channels              : The number of parallel audio channels. This is the maximum number of parallel audio channels supported by the pipeline. This should be larger than max_batch_size. (int, default = 600)
  --online-pitch-config       : Configuration file for online pitch features, if --add-pitch=true (e.g. conf/online_pitch.conf) (string, default = "")
  --optimization.allocate-from-other : Instead of deleting a matrix of a given size and then allocating a matrix of the same size, allow re-use of that memory (bool, default = true)
  --optimization.allow-left-merge : Set to false to disable left-merging of variables in remove-assignments (obscure option) (bool, default = true)
  --optimization.allow-right-merge : Set to false to disable right-merging of variables in remove-assignments (obscure option) (bool, default = true)
  --optimization.backprop-in-place : Set to false to disable optimization that allows in-place backprop (bool, default = true)
  --optimization.consolidate-model-update : Set to false to disable optimization that consolidates the model-update phase of backprop (e.g. for recurrent architectures (bool, default = true)
  --optimization.convert-addition : Set to false to disable the optimization that converts Add commands into Copy commands wherever possible. (bool, default = true)
  --optimization.extend-matrices : This optimization can reduce memory requirements for TDNNs when applied together with --convert-addition=true (bool, default = true)
  --optimization.initialize-undefined : Set to false to disable optimization that avoids redundant zeroing (bool, default = true)
  --optimization.max-deriv-time : You can set this to the maximum t value that you want derivatives to be computed at when updating the model.  This is an optimization that saves time in the backprop phase for recurrent frameworks (int, default = 2147483647)
  --optimization.max-deriv-time-relative : An alternative mechanism for setting the --max-deriv-time, suitable for situations where the length of the egs is variable.  If set, it is equivalent to setting the --max-deriv-time to this value plus the largest 't' value in any 'output' node of the computation request. (int, default = 2147483647)
  --optimization.memory-compression-level : This is only relevant to training, not decoding.  Set this to 0,1,2; higher levels are more aggressive at reducing memory by compressing quantities needed for backprop, potentially at the expense of speed and the accuracy of derivatives.  0 means no compression at all; 1 means compression that shouldn't affect results at all. (int, default = 1)
  --optimization.min-deriv-time : You can set this to the minimum t value that you want derivatives to be computed at when updating the model.  This is an optimization that saves time in the backprop phase for recurrent frameworks (int, default = -2147483648)
  --optimization.move-sizing-commands : Set to false to disable optimization that moves matrix allocation and deallocation commands to conserve memory. (bool, default = true)
  --optimization.optimize     : Set this to false to turn off all optimizations (bool, default = true)
  --optimization.optimize-row-ops : Set to false to disable certain optimizations that act on operations of type *Row*. (bool, default = true)
  --optimization.propagate-in-place : Set to false to disable optimization that allows in-place propagation (bool, default = true)
  --optimization.remove-assignments : Set to false to disable optimization that removes redundant assignments (bool, default = true)
  --optimization.snip-row-ops : Set this to false to disable an optimization that reduces the size of certain per-row operations (bool, default = true)
  --optimization.split-row-ops : Set to false to disable an optimization that may replace some operations of type kCopyRowsMulti or kAddRowsMulti with up to two simpler operations. (bool, default = true)
  --phone-determinize         : If true, do an initial pass of determinization on both phones and words (see also --word-determinize) (bool, default = true)
  --plp-config                : Configuration file for PLP features (e.g. conf/plp.conf) (string, default = "")
  --reset-on-endpoint         : Reset a decoder channel when endpoint detected. Do not close stream (bool, default = false)
  --word-determinize          : If true, do a second pass of determinization on words only (see also --phone-determinize) (bool, default = true)

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)

Command line was:
ERROR ([5.5.1013~1546-9b851]:ReadConfigFile():parse-options.cc:493) Invalid option --min-active=200 in config file model/conf/model.conf

[ Stack-Trace: ]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::MessageLogger::LogMessage() const+0x7fe) [0x7fadb0882d0e]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x2a) [0x7fadb0382a9a]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::ParseOptions::ReadConfigFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x3eb) [0x7fadb087780b]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(BatchRecognizer::BatchRecognizer()+0x468) [0x7fadb04201f8]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(vosk_batch_recognizer_new+0x20) [0x7fadb041f0f0]
/usr/lib/x86_64-linux-gnu/libffi.so.7(+0x6ff5) [0x7fadb37dcff5]
/usr/lib/x86_64-linux-gnu/libffi.so.7(+0x640a) [0x7fadb37dc40a]
/usr/lib/python3/dist-packages/_cffi_backend.cpython-38-x86_64-linux-gnu.so(+0x1afd7) [0x7fadb3802fd7]
python3(_PyObject_MakeTpCall+0x296) [0x5f6a46]
python3(_PyEval_EvalFrameDefault+0x5d3f) [0x570a1f]
python3(_PyEval_EvalCodeWithName+0x26a) [0x5696da]
python3() [0x59c176]
python3(_PyObject_MakeTpCall+0x1ff) [0x5f69af]
python3(_PyEval_EvalFrameDefault+0x5932) [0x570612]
python3(_PyFunction_Vectorcall+0x1b6) [0x5f6226]
python3(_PyEval_EvalFrameDefault+0x71e) [0x56b3fe]
python3(_PyEval_EvalCodeWithName+0x26a) [0x5696da]
python3(PyEval_EvalCode+0x27) [0x68db17]
python3() [0x67eeb1]
python3() [0x67ef2f]
python3() [0x67efd1]
python3(PyRun_SimpleFileExFlags+0x197) [0x67f377]
python3(Py_RunMain+0x212) [0x6b7902]
python3(Py_BytesMain+0x2d) [0x6b7c8d]
/usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7fadb449e0b3]
python3(_start+0x2e) [0x5fb12e]

terminate called after throwing an instance of 'kaldi::KaldiFatalError'
  what():  kaldi::KaldiFatalError
WARNING ([5.5.1013~1546-9b851]:SelectGpuId():cu-device.cc:243) Not in compute-exclusive mode.  Suggestion: use 'nvidia-smi -c 3' to set compute exclusive mode
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:438) Selecting from 1 GPUs
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:453) cudaSetDevice(0): Tesla K80      free:11375M, used:66M, total:11441M, free/total:0.994231
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:501) Device: 0, mem_ratio: 0.994231
LOG ([5.5.1013~1546-9b851]:SelectGpuId():cu-device.cc:382) Trying to select device: 0
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:511) Success selecting device 0 free mem ratio: 0.994231
LOG ([5.5.1013~1546-9b851]:FinalizeActiveGpu():cu-device.cc:338) The active GPU is [0]: Tesla K80       free:11239M, used:202M, total:11441M, free/total:0.982345 version 3.7

something
Options:
  --acoustic-scale            : Scaling factor for acoustic log-likelihoods (caution: is a no-op if set in the program nnet3-compute (float, default = 0.1)
  --add-pitch                 : Append pitch features to raw MFCC/PLP/filterbank features [but not for iVector extraction] (bool, default = false)
  --aux-q-capacity            : Advanced - Capacity of the auxiliary queue : Maximum number of raw tokens that can be stored *before* pruning for each frame. Lower -> less memory usage, Higher -> More accurate. During the tokens generation, if we detect that we are getting close to saturating that capacity, we will reduce the beam dynamically (adaptive beam) to keep only the best tokens in the remaining space. If the aux queue is still too small, we will print an overflow warning, but prevent the overflow. The computation can safely continue, but the quality of the output may decrease. We strongly recommend keeping aux-q-capacity large (>400k), to avoid triggering the adaptive beam and/or the overflow (-1 = set to 3*main-q-capacity). (int, default = -1)
  --beam                      : Decoding beam. Larger->slower, more accurate. If aux-q-capacity is too small, we may decrease the beam dynamically to avoid overflow (adaptive beam, see aux-q-capacity parameter) (float, default = 15)
  --cmvn-config               : Configuration file for online cmvn features (e.g. conf/online_cmvn.conf). Controls features on nnet3 input (not ivector features). If not set, the OnlineCmvn is disabled. (string, default = "")
  --computation.debug         : If true, turn on debug for the neural net computation (very verbose!) Will be turned on regardless if --verbose >= 5 (bool, default = false)
  --cuda-decoder-copy-threads : Advanced - Number of worker threads used in the decoder for the host to host copies. (int, default = 2)
  --cuda-worker-threads       : The total number of CPU threads launched to process CPU tasks. -1 = use std::hardware_concurrency(). (int, default = -1)
  --debug-computation         : If true, turn on debug for the actual computation (very verbose!) (bool, default = false)
  --delta                     : Tolerance used in determinization (float, default = 0.000976562)
  --determinize-lattice       : Determinize the lattice before output. (bool, default = true)
  --endpoint.rule1.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = inf)
  --endpoint.rule1.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 5)
  --endpoint.rule1.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule1.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = false)
  --endpoint.rule2.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = 2)
  --endpoint.rule2.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 0.5)
  --endpoint.rule2.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule2.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = true)
  --endpoint.rule3.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = 8)
  --endpoint.rule3.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 1)
  --endpoint.rule3.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule3.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = true)
  --endpoint.rule4.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = inf)
  --endpoint.rule4.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 2)
  --endpoint.rule4.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule4.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = true)
  --endpoint.rule5.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = inf)
  --endpoint.rule5.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule5.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 20)
  --endpoint.rule5.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = false)
  --endpoint.silence-phones   : List of phones that are considered to be silence phones by the endpointing code. (string, default = "")
  --extra-left-context        : Number of frames of additional left-context to add on top of the neural net's inherent left context (may be useful in recurrent setups (int, default = 0)
  --extra-left-context-initial : If >= 0, overrides the --extra-left-context value at the start of an utterance. (int, default = -1)
  --extra-right-context       : Number of frames of additional right-context to add on top of the neural net's inherent right context (may be useful in recurrent setups (int, default = 0)
  --extra-right-context-final : If >= 0, overrides the --extra-right-context value at the end of an utterance. (int, default = -1)
  --fbank-config              : Configuration file for filterbank features (e.g. conf/fbank.conf) (string, default = "")
  --feature-type              : Base feature type [mfcc, plp, fbank] (string, default = "mfcc")
  --frame-subsampling-factor  : Required if the frame-rate of the output (e.g. in 'chain' models) is less than the frame-rate of the original alignment. (int, default = 1)
  --frames-per-chunk          : Number of frames in each chunk that is separately evaluated by the neural net.  Measured before any subsampling, if the --frame-subsampling-factor options is used (i.e. counts input frames (int, default = 50)
  --global-cmvn-stats         : filename with global stats for OnlineCmvn for features on nnet3 input (not ivector features) (string, default = "")
  --gpu-feature-extract       : Use GPU feature extraction. (bool, default = true)
  --ivector-extraction-config : Configuration file for online iVector extraction, see class OnlineIvectorExtractionConfig in the code (string, default = "")
  --ivector-silence-weighting.max-state-duration : (RE weighting in iVector estimation for online decoding) Maximum allowed duration of a single transition-id; runs with durations longer than this will be weighted down to the silence-weight. (float, default = -1)
  --ivector-silence-weighting.silence-phones : (RE weighting in iVector estimation for online decoding) List of integer ids of silence phones, separated by colons (or commas).  Data that (according to the traceback of the decoder) corresponds to these phones will be downweighted by --silence-weight. (string, default = "")
  --ivector-silence-weighting.silence-weight : (RE weighting in iVector estimation for online decoding) Weighting factor for frames that the decoder trace-back identifies as silence; only relevant if the --silence-phones option is set. (float, default = 1)
  --lattice-beam              : The width of the lattice beam (float, default = 10)
  --main-q-capacity           : Advanced - Capacity of the main queue : Maximum number of tokens that can be stored *after* pruning for each frame. Lower -> less memory usage, Higher -> More accurate. Tokens stored in the main queue were already selected through a max-active pre-selection. It means that for each emitting/non-emitting iteration, we can add at most ~max-active tokens to the main queue. Typically only the emitting iteration creates a large number of tokens. Using main-q-capacity=k*max-active with k=4..10 should be safe. If main-q-capacity is too small, we will print a warning but prevent the overflow. The computation can safely continue, but the quality of the output may decrease (-1 = set to 4*max-active). (int, default = -1)
  --max-active                : At the end of each frame computation, we keep only its best max-active tokens. One token is the instantiation of a single arc. Typical values are within the 5k-10k range. (int, default = 10000)
  --max-batch-size            : The maximum execution batch size. Larger = better throughput, but slower latency. (int, default = 400)
  --max-mem                   : Maximum approximate memory usage in determinization (real usage might be many times this). (int, default = 50000000)
  --mfcc-config               : Configuration file for MFCC features (e.g. conf/mfcc.conf) (string, default = "")
  --minimize                  : If true, push and minimize after determinization. (bool, default = false)
  --ntokens-pre-allocated     : Advanced - Number of tokens pre-allocated in host buffers. If this size is exceeded the buffer will reallocate, reducing performance. (int, default = 1000000)
  --num-channels              : The number of parallel audio channels. This is the maximum number of parallel audio channels supported by the pipeline. This should be larger than max_batch_size. (int, default = 600)
  --online-pitch-config       : Configuration file for online pitch features, if --add-pitch=true (e.g. conf/online_pitch.conf) (string, default = "")
  --optimization.allocate-from-other : Instead of deleting a matrix of a given size and then allocating a matrix of the same size, allow re-use of that memory (bool, default = true)
  --optimization.allow-left-merge : Set to false to disable left-merging of variables in remove-assignments (obscure option) (bool, default = true)
  --optimization.allow-right-merge : Set to false to disable right-merging of variables in remove-assignments (obscure option) (bool, default = true)
  --optimization.backprop-in-place : Set to false to disable optimization that allows in-place backprop (bool, default = true)
  --optimization.consolidate-model-update : Set to false to disable optimization that consolidates the model-update phase of backprop (e.g. for recurrent architectures (bool, default = true)
  --optimization.convert-addition : Set to false to disable the optimization that converts Add commands into Copy commands wherever possible. (bool, default = true)
  --optimization.extend-matrices : This optimization can reduce memory requirements for TDNNs when applied together with --convert-addition=true (bool, default = true)
  --optimization.initialize-undefined : Set to false to disable optimization that avoids redundant zeroing (bool, default = true)
  --optimization.max-deriv-time : You can set this to the maximum t value that you want derivatives to be computed at when updating the model.  This is an optimization that saves time in the backprop phase for recurrent frameworks (int, default = 2147483647)
  --optimization.max-deriv-time-relative : An alternative mechanism for setting the --max-deriv-time, suitable for situations where the length of the egs is variable.  If set, it is equivalent to setting the --max-deriv-time to this value plus the largest 't' value in any 'output' node of the computation request. (int, default = 2147483647)
  --optimization.memory-compression-level : This is only relevant to training, not decoding.  Set this to 0,1,2; higher levels are more aggressive at reducing memory by compressing quantities needed for backprop, potentially at the expense of speed and the accuracy of derivatives.  0 means no compression at all; 1 means compression that shouldn't affect results at all. (int, default = 1)
  --optimization.min-deriv-time : You can set this to the minimum t value that you want derivatives to be computed at when updating the model.  This is an optimization that saves time in the backprop phase for recurrent frameworks (int, default = -2147483648)
  --optimization.move-sizing-commands : Set to false to disable optimization that moves matrix allocation and deallocation commands to conserve memory. (bool, default = true)
  --optimization.optimize     : Set this to false to turn off all optimizations (bool, default = true)
  --optimization.optimize-row-ops : Set to false to disable certain optimizations that act on operations of type *Row*. (bool, default = true)
  --optimization.propagate-in-place : Set to false to disable optimization that allows in-place propagation (bool, default = true)
  --optimization.remove-assignments : Set to false to disable optimization that removes redundant assignments (bool, default = true)
  --optimization.snip-row-ops : Set this to false to disable an optimization that reduces the size of certain per-row operations (bool, default = true)
  --optimization.split-row-ops : Set to false to disable an optimization that may replace some operations of type kCopyRowsMulti or kAddRowsMulti with up to two simpler operations. (bool, default = true)
  --phone-determinize         : If true, do an initial pass of determinization on both phones and words (see also --word-determinize) (bool, default = true)
  --plp-config                : Configuration file for PLP features (e.g. conf/plp.conf) (string, default = "")
  --reset-on-endpoint         : Reset a decoder channel when endpoint detected. Do not close stream (bool, default = false)
  --word-determinize          : If true, do a second pass of determinization on words only (see also --phone-determinize) (bool, default = true)

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)

Command line was:
ERROR ([5.5.1013~1546-9b851]:ReadConfigFile():parse-options.cc:493) Invalid option --min-active=200 in config file model/conf/model.conf

[ Stack-Trace: ]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::MessageLogger::LogMessage() const+0x7fe) [0x7fb595ebed0e]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x2a) [0x7fb5959bea9a]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::ParseOptions::ReadConfigFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x3eb) [0x7fb595eb380b]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(BatchRecognizer::BatchRecognizer()+0x468) [0x7fb595a5c1f8]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(vosk_batch_recognizer_new+0x20) [0x7fb595a5b0f0]
/usr/lib/x86_64-linux-gnu/libffi.so.7(+0x6ff5) [0x7fb598e18ff5]
/usr/lib/x86_64-linux-gnu/libffi.so.7(+0x640a) [0x7fb598e1840a]
/usr/lib/python3/dist-packages/_cffi_backend.cpython-38-x86_64-linux-gnu.so(+0x1afd7) [0x7fb598e3efd7]
python3(_PyObject_MakeTpCall+0x296) [0x5f6a46]
python3(_PyEval_EvalFrameDefault+0x5d3f) [0x570a1f]
python3(_PyEval_EvalCodeWithName+0x26a) [0x5696da]
python3() [0x59c176]
python3(_PyObject_MakeTpCall+0x1ff) [0x5f69af]
python3(_PyEval_EvalFrameDefault+0x5932) [0x570612]
python3(_PyFunction_Vectorcall+0x1b6) [0x5f6226]
python3(_PyEval_EvalFrameDefault+0x71e) [0x56b3fe]
python3(_PyEval_EvalCodeWithName+0x26a) [0x5696da]
python3(PyEval_EvalCode+0x27) [0x68db17]
python3() [0x67eeb1]
python3() [0x67ef2f]
python3() [0x67efd1]
python3(PyRun_SimpleFileExFlags+0x197) [0x67f377]
python3(Py_RunMain+0x212) [0x6b7902]
python3(Py_BytesMain+0x2d) [0x6b7c8d]
/usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7fb599ada0b3]
python3(_start+0x2e) [0x5fb12e]

terminate called after throwing an instance of 'kaldi::KaldiFatalError'
  what():  kaldi::KaldiFatalError
WARNING ([5.5.1013~1546-9b851]:SelectGpuId():cu-device.cc:243) Not in compute-exclusive mode.  Suggestion: use 'nvidia-smi -c 3' to set compute exclusive mode
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:438) Selecting from 1 GPUs
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:453) cudaSetDevice(0): Tesla K80      free:11375M, used:66M, total:11441M, free/total:0.994231
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:501) Device: 0, mem_ratio: 0.994231
LOG ([5.5.1013~1546-9b851]:SelectGpuId():cu-device.cc:382) Trying to select device: 0
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:511) Success selecting device 0 free mem ratio: 0.994231
LOG ([5.5.1013~1546-9b851]:FinalizeActiveGpu():cu-device.cc:338) The active GPU is [0]: Tesla K80       free:11239M, used:202M, total:11441M, free/total:0.982345 version 3.7

something
Options:
  --acoustic-scale            : Scaling factor for acoustic log-likelihoods (caution: is a no-op if set in the program nnet3-compute (float, default = 0.1)
  --add-pitch                 : Append pitch features to raw MFCC/PLP/filterbank features [but not for iVector extraction] (bool, default = false)
  --aux-q-capacity            : Advanced - Capacity of the auxiliary queue : Maximum number of raw tokens that can be stored *before* pruning for each frame. Lower -> less memory usage, Higher -> More accurate. During the tokens generation, if we detect that we are getting close to saturating that capacity, we will reduce the beam dynamically (adaptive beam) to keep only the best tokens in the remaining space. If the aux queue is still too small, we will print an overflow warning, but prevent the overflow. The computation can safely continue, but the quality of the output may decrease. We strongly recommend keeping aux-q-capacity large (>400k), to avoid triggering the adaptive beam and/or the overflow (-1 = set to 3*main-q-capacity). (int, default = -1)
  --beam                      : Decoding beam. Larger->slower, more accurate. If aux-q-capacity is too small, we may decrease the beam dynamically to avoid overflow (adaptive beam, see aux-q-capacity parameter) (float, default = 15)
  --cmvn-config               : Configuration file for online cmvn features (e.g. conf/online_cmvn.conf). Controls features on nnet3 input (not ivector features). If not set, the OnlineCmvn is disabled. (string, default = "")
  --computation.debug         : If true, turn on debug for the neural net computation (very verbose!) Will be turned on regardless if --verbose >= 5 (bool, default = false)
  --cuda-decoder-copy-threads : Advanced - Number of worker threads used in the decoder for the host to host copies. (int, default = 2)
  --cuda-worker-threads       : The total number of CPU threads launched to process CPU tasks. -1 = use std::hardware_concurrency(). (int, default = -1)
  --debug-computation         : If true, turn on debug for the actual computation (very verbose!) (bool, default = false)
  --delta                     : Tolerance used in determinization (float, default = 0.000976562)
  --determinize-lattice       : Determinize the lattice before output. (bool, default = true)
  --endpoint.rule1.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = inf)
  --endpoint.rule1.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 5)
  --endpoint.rule1.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule1.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = false)
  --endpoint.rule2.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = 2)
  --endpoint.rule2.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 0.5)
  --endpoint.rule2.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule2.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = true)
  --endpoint.rule3.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = 8)
  --endpoint.rule3.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 1)
  --endpoint.rule3.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule3.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = true)
  --endpoint.rule4.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = inf)
  --endpoint.rule4.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 2)
  --endpoint.rule4.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule4.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = true)
  --endpoint.rule5.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = inf)
  --endpoint.rule5.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule5.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 20)
  --endpoint.rule5.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = false)
  --endpoint.silence-phones   : List of phones that are considered to be silence phones by the endpointing code. (string, default = "")
  --extra-left-context        : Number of frames of additional left-context to add on top of the neural net's inherent left context (may be useful in recurrent setups (int, default = 0)
  --extra-left-context-initial : If >= 0, overrides the --extra-left-context value at the start of an utterance. (int, default = -1)
  --extra-right-context       : Number of frames of additional right-context to add on top of the neural net's inherent right context (may be useful in recurrent setups (int, default = 0)
  --extra-right-context-final : If >= 0, overrides the --extra-right-context value at the end of an utterance. (int, default = -1)
  --fbank-config              : Configuration file for filterbank features (e.g. conf/fbank.conf) (string, default = "")
  --feature-type              : Base feature type [mfcc, plp, fbank] (string, default = "mfcc")
  --frame-subsampling-factor  : Required if the frame-rate of the output (e.g. in 'chain' models) is less than the frame-rate of the original alignment. (int, default = 1)
  --frames-per-chunk          : Number of frames in each chunk that is separately evaluated by the neural net.  Measured before any subsampling, if the --frame-subsampling-factor options is used (i.e. counts input frames (int, default = 50)
  --global-cmvn-stats         : filename with global stats for OnlineCmvn for features on nnet3 input (not ivector features) (string, default = "")
  --gpu-feature-extract       : Use GPU feature extraction. (bool, default = true)
  --ivector-extraction-config : Configuration file for online iVector extraction, see class OnlineIvectorExtractionConfig in the code (string, default = "")
  --ivector-silence-weighting.max-state-duration : (RE weighting in iVector estimation for online decoding) Maximum allowed duration of a single transition-id; runs with durations longer than this will be weighted down to the silence-weight. (float, default = -1)
  --ivector-silence-weighting.silence-phones : (RE weighting in iVector estimation for online decoding) List of integer ids of silence phones, separated by colons (or commas).  Data that (according to the traceback of the decoder) corresponds to these phones will be downweighted by --silence-weight. (string, default = "")
  --ivector-silence-weighting.silence-weight : (RE weighting in iVector estimation for online decoding) Weighting factor for frames that the decoder trace-back identifies as silence; only relevant if the --silence-phones option is set. (float, default = 1)
  --lattice-beam              : The width of the lattice beam (float, default = 10)
  --main-q-capacity           : Advanced - Capacity of the main queue : Maximum number of tokens that can be stored *after* pruning for each frame. Lower -> less memory usage, Higher -> More accurate. Tokens stored in the main queue were already selected through a max-active pre-selection. It means that for each emitting/non-emitting iteration, we can add at most ~max-active tokens to the main queue. Typically only the emitting iteration creates a large number of tokens. Using main-q-capacity=k*max-active with k=4..10 should be safe. If main-q-capacity is too small, we will print a warning but prevent the overflow. The computation can safely continue, but the quality of the output may decrease (-1 = set to 4*max-active). (int, default = -1)
  --max-active                : At the end of each frame computation, we keep only its best max-active tokens. One token is the instantiation of a single arc. Typical values are within the 5k-10k range. (int, default = 10000)
  --max-batch-size            : The maximum execution batch size. Larger = better throughput, but slower latency. (int, default = 400)
  --max-mem                   : Maximum approximate memory usage in determinization (real usage might be many times this). (int, default = 50000000)
  --mfcc-config               : Configuration file for MFCC features (e.g. conf/mfcc.conf) (string, default = "")
  --minimize                  : If true, push and minimize after determinization. (bool, default = false)
  --ntokens-pre-allocated     : Advanced - Number of tokens pre-allocated in host buffers. If this size is exceeded the buffer will reallocate, reducing performance. (int, default = 1000000)
  --num-channels              : The number of parallel audio channels. This is the maximum number of parallel audio channels supported by the pipeline. This should be larger than max_batch_size. (int, default = 600)
  --online-pitch-config       : Configuration file for online pitch features, if --add-pitch=true (e.g. conf/online_pitch.conf) (string, default = "")
  --optimization.allocate-from-other : Instead of deleting a matrix of a given size and then allocating a matrix of the same size, allow re-use of that memory (bool, default = true)
  --optimization.allow-left-merge : Set to false to disable left-merging of variables in remove-assignments (obscure option) (bool, default = true)
  --optimization.allow-right-merge : Set to false to disable right-merging of variables in remove-assignments (obscure option) (bool, default = true)
  --optimization.backprop-in-place : Set to false to disable optimization that allows in-place backprop (bool, default = true)
  --optimization.consolidate-model-update : Set to false to disable optimization that consolidates the model-update phase of backprop (e.g. for recurrent architectures (bool, default = true)
  --optimization.convert-addition : Set to false to disable the optimization that converts Add commands into Copy commands wherever possible. (bool, default = true)
  --optimization.extend-matrices : This optimization can reduce memory requirements for TDNNs when applied together with --convert-addition=true (bool, default = true)
  --optimization.initialize-undefined : Set to false to disable optimization that avoids redundant zeroing (bool, default = true)
  --optimization.max-deriv-time : You can set this to the maximum t value that you want derivatives to be computed at when updating the model.  This is an optimization that saves time in the backprop phase for recurrent frameworks (int, default = 2147483647)
  --optimization.max-deriv-time-relative : An alternative mechanism for setting the --max-deriv-time, suitable for situations where the length of the egs is variable.  If set, it is equivalent to setting the --max-deriv-time to this value plus the largest 't' value in any 'output' node of the computation request. (int, default = 2147483647)
  --optimization.memory-compression-level : This is only relevant to training, not decoding.  Set this to 0,1,2; higher levels are more aggressive at reducing memory by compressing quantities needed for backprop, potentially at the expense of speed and the accuracy of derivatives.  0 means no compression at all; 1 means compression that shouldn't affect results at all. (int, default = 1)
  --optimization.min-deriv-time : You can set this to the minimum t value that you want derivatives to be computed at when updating the model.  This is an optimization that saves time in the backprop phase for recurrent frameworks (int, default = -2147483648)
  --optimization.move-sizing-commands : Set to false to disable optimization that moves matrix allocation and deallocation commands to conserve memory. (bool, default = true)
  --optimization.optimize     : Set this to false to turn off all optimizations (bool, default = true)
  --optimization.optimize-row-ops : Set to false to disable certain optimizations that act on operations of type *Row*. (bool, default = true)
  --optimization.propagate-in-place : Set to false to disable optimization that allows in-place propagation (bool, default = true)
  --optimization.remove-assignments : Set to false to disable optimization that removes redundant assignments (bool, default = true)
  --optimization.snip-row-ops : Set this to false to disable an optimization that reduces the size of certain per-row operations (bool, default = true)
  --optimization.split-row-ops : Set to false to disable an optimization that may replace some operations of type kCopyRowsMulti or kAddRowsMulti with up to two simpler operations. (bool, default = true)
  --phone-determinize         : If true, do an initial pass of determinization on both phones and words (see also --word-determinize) (bool, default = true)
  --plp-config                : Configuration file for PLP features (e.g. conf/plp.conf) (string, default = "")
  --reset-on-endpoint         : Reset a decoder channel when endpoint detected. Do not close stream (bool, default = false)
  --word-determinize          : If true, do a second pass of determinization on words only (see also --phone-determinize) (bool, default = true)

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)

Command line was:
ERROR ([5.5.1013~1546-9b851]:ReadConfigFile():parse-options.cc:493) Invalid option --min-active=200 in config file model/conf/model.conf

[ Stack-Trace: ]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::MessageLogger::LogMessage() const+0x7fe) [0x7fce6ad9fd0e]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x2a) [0x7fce6a89fa9a]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::ParseOptions::ReadConfigFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x3eb) [0x7fce6ad9480b]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(BatchRecognizer::BatchRecognizer()+0x468) [0x7fce6a93d1f8]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(vosk_batch_recognizer_new+0x20) [0x7fce6a93c0f0]
/usr/lib/x86_64-linux-gnu/libffi.so.7(+0x6ff5) [0x7fce6dcf9ff5]
/usr/lib/x86_64-linux-gnu/libffi.so.7(+0x640a) [0x7fce6dcf940a]
/usr/lib/python3/dist-packages/_cffi_backend.cpython-38-x86_64-linux-gnu.so(+0x1afd7) [0x7fce6dd1ffd7]
python3(_PyObject_MakeTpCall+0x296) [0x5f6a46]
python3(_PyEval_EvalFrameDefault+0x5d3f) [0x570a1f]
python3(_PyEval_EvalCodeWithName+0x26a) [0x5696da]
python3() [0x59c176]
python3(_PyObject_MakeTpCall+0x1ff) [0x5f69af]
python3(_PyEval_EvalFrameDefault+0x5932) [0x570612]
python3(_PyFunction_Vectorcall+0x1b6) [0x5f6226]
python3(_PyEval_EvalFrameDefault+0x71e) [0x56b3fe]
python3(_PyEval_EvalCodeWithName+0x26a) [0x5696da]
python3(PyEval_EvalCode+0x27) [0x68db17]
python3() [0x67eeb1]
python3() [0x67ef2f]
python3() [0x67efd1]
python3(PyRun_SimpleFileExFlags+0x197) [0x67f377]
python3(Py_RunMain+0x212) [0x6b7902]
python3(Py_BytesMain+0x2d) [0x6b7c8d]
/usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7fce6e9bb0b3]
python3(_start+0x2e) [0x5fb12e]

terminate called after throwing an instance of 'kaldi::KaldiFatalError'
  what():  kaldi::KaldiFatalError
WARNING ([5.5.1013~1546-9b851]:SelectGpuId():cu-device.cc:243) Not in compute-exclusive mode.  Suggestion: use 'nvidia-smi -c 3' to set compute exclusive mode
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:438) Selecting from 1 GPUs
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:453) cudaSetDevice(0): Tesla K80      free:11375M, used:66M, total:11441M, free/total:0.994231
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:501) Device: 0, mem_ratio: 0.994231
LOG ([5.5.1013~1546-9b851]:SelectGpuId():cu-device.cc:382) Trying to select device: 0
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:511) Success selecting device 0 free mem ratio: 0.994231
LOG ([5.5.1013~1546-9b851]:FinalizeActiveGpu():cu-device.cc:338) The active GPU is [0]: Tesla K80       free:11239M, used:202M, total:11441M, free/total:0.982345 version 3.7

something
Options:
  --acoustic-scale            : Scaling factor for acoustic log-likelihoods (caution: is a no-op if set in the program nnet3-compute (float, default = 0.1)
  --add-pitch                 : Append pitch features to raw MFCC/PLP/filterbank features [but not for iVector extraction] (bool, default = false)
  --aux-q-capacity            : Advanced - Capacity of the auxiliary queue : Maximum number of raw tokens that can be stored *before* pruning for each frame. Lower -> less memory usage, Higher -> More accurate. During the tokens generation, if we detect that we are getting close to saturating that capacity, we will reduce the beam dynamically (adaptive beam) to keep only the best tokens in the remaining space. If the aux queue is still too small, we will print an overflow warning, but prevent the overflow. The computation can safely continue, but the quality of the output may decrease. We strongly recommend keeping aux-q-capacity large (>400k), to avoid triggering the adaptive beam and/or the overflow (-1 = set to 3*main-q-capacity). (int, default = -1)
  --beam                      : Decoding beam. Larger->slower, more accurate. If aux-q-capacity is too small, we may decrease the beam dynamically to avoid overflow (adaptive beam, see aux-q-capacity parameter) (float, default = 15)
  --cmvn-config               : Configuration file for online cmvn features (e.g. conf/online_cmvn.conf). Controls features on nnet3 input (not ivector features). If not set, the OnlineCmvn is disabled. (string, default = "")
  --computation.debug         : If true, turn on debug for the neural net computation (very verbose!) Will be turned on regardless if --verbose >= 5 (bool, default = false)
  --cuda-decoder-copy-threads : Advanced - Number of worker threads used in the decoder for the host to host copies. (int, default = 2)
  --cuda-worker-threads       : The total number of CPU threads launched to process CPU tasks. -1 = use std::hardware_concurrency(). (int, default = -1)
  --debug-computation         : If true, turn on debug for the actual computation (very verbose!) (bool, default = false)
  --delta                     : Tolerance used in determinization (float, default = 0.000976562)
  --determinize-lattice       : Determinize the lattice before output. (bool, default = true)
  --endpoint.rule1.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = inf)
  --endpoint.rule1.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 5)
  --endpoint.rule1.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule1.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = false)
  --endpoint.rule2.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = 2)
  --endpoint.rule2.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 0.5)
  --endpoint.rule2.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule2.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = true)
  --endpoint.rule3.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = 8)
  --endpoint.rule3.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 1)
  --endpoint.rule3.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule3.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = true)
  --endpoint.rule4.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = inf)
  --endpoint.rule4.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 2)
  --endpoint.rule4.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule4.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = true)
  --endpoint.rule5.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = inf)
  --endpoint.rule5.min-trailing-silence : This endpointing rule requires duration of trailing silence(in seconds) to be >= this value. (float, default = 0)
  --endpoint.rule5.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 20)
  --endpoint.rule5.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = false)
  --endpoint.silence-phones   : List of phones that are considered to be silence phones by the endpointing code. (string, default = "")
  --extra-left-context        : Number of frames of additional left-context to add on top of the neural net's inherent left context (may be useful in recurrent setups (int, default = 0)
  --extra-left-context-initial : If >= 0, overrides the --extra-left-context value at the start of an utterance. (int, default = -1)
  --extra-right-context       : Number of frames of additional right-context to add on top of the neural net's inherent right context (may be useful in recurrent setups (int, default = 0)
  --extra-right-context-final : If >= 0, overrides the --extra-right-context value at the end of an utterance. (int, default = -1)
  --fbank-config              : Configuration file for filterbank features (e.g. conf/fbank.conf) (string, default = "")
  --feature-type              : Base feature type [mfcc, plp, fbank] (string, default = "mfcc")
  --frame-subsampling-factor  : Required if the frame-rate of the output (e.g. in 'chain' models) is less than the frame-rate of the original alignment. (int, default = 1)
  --frames-per-chunk          : Number of frames in each chunk that is separately evaluated by the neural net.  Measured before any subsampling, if the --frame-subsampling-factor options is used (i.e. counts input frames (int, default = 50)
  --global-cmvn-stats         : filename with global stats for OnlineCmvn for features on nnet3 input (not ivector features) (string, default = "")
  --gpu-feature-extract       : Use GPU feature extraction. (bool, default = true)
  --ivector-extraction-config : Configuration file for online iVector extraction, see class OnlineIvectorExtractionConfig in the code (string, default = "")
  --ivector-silence-weighting.max-state-duration : (RE weighting in iVector estimation for online decoding) Maximum allowed duration of a single transition-id; runs with durations longer than this will be weighted down to the silence-weight. (float, default = -1)
  --ivector-silence-weighting.silence-phones : (RE weighting in iVector estimation for online decoding) List of integer ids of silence phones, separated by colons (or commas).  Data that (according to the traceback of the decoder) corresponds to these phones will be downweighted by --silence-weight. (string, default = "")
  --ivector-silence-weighting.silence-weight : (RE weighting in iVector estimation for online decoding) Weighting factor for frames that the decoder trace-back identifies as silence; only relevant if the --silence-phones option is set. (float, default = 1)
  --lattice-beam              : The width of the lattice beam (float, default = 10)
  --main-q-capacity           : Advanced - Capacity of the main queue : Maximum number of tokens that can be stored *after* pruning for each frame. Lower -> less memory usage, Higher -> More accurate. Tokens stored in the main queue were already selected through a max-active pre-selection. It means that for each emitting/non-emitting iteration, we can add at most ~max-active tokens to the main queue. Typically only the emitting iteration creates a large number of tokens. Using main-q-capacity=k*max-active with k=4..10 should be safe. If main-q-capacity is too small, we will print a warning but prevent the overflow. The computation can safely continue, but the quality of the output may decrease (-1 = set to 4*max-active). (int, default = -1)
  --max-active                : At the end of each frame computation, we keep only its best max-active tokens. One token is the instantiation of a single arc. Typical values are within the 5k-10k range. (int, default = 10000)
  --max-batch-size            : The maximum execution batch size. Larger = better throughput, but slower latency. (int, default = 400)
  --max-mem                   : Maximum approximate memory usage in determinization (real usage might be many times this). (int, default = 50000000)
  --mfcc-config               : Configuration file for MFCC features (e.g. conf/mfcc.conf) (string, default = "")
  --minimize                  : If true, push and minimize after determinization. (bool, default = false)
  --ntokens-pre-allocated     : Advanced - Number of tokens pre-allocated in host buffers. If this size is exceeded the buffer will reallocate, reducing performance. (int, default = 1000000)
  --num-channels              : The number of parallel audio channels. This is the maximum number of parallel audio channels supported by the pipeline. This should be larger than max_batch_size. (int, default = 600)
  --online-pitch-config       : Configuration file for online pitch features, if --add-pitch=true (e.g. conf/online_pitch.conf) (string, default = "")
  --optimization.allocate-from-other : Instead of deleting a matrix of a given size and then allocating a matrix of the same size, allow re-use of that memory (bool, default = true)
  --optimization.allow-left-merge : Set to false to disable left-merging of variables in remove-assignments (obscure option) (bool, default = true)
  --optimization.allow-right-merge : Set to false to disable right-merging of variables in remove-assignments (obscure option) (bool, default = true)
  --optimization.backprop-in-place : Set to false to disable optimization that allows in-place backprop (bool, default = true)
  --optimization.consolidate-model-update : Set to false to disable optimization that consolidates the model-update phase of backprop (e.g. for recurrent architectures (bool, default = true)
  --optimization.convert-addition : Set to false to disable the optimization that converts Add commands into Copy commands wherever possible. (bool, default = true)
  --optimization.extend-matrices : This optimization can reduce memory requirements for TDNNs when applied together with --convert-addition=true (bool, default = true)
  --optimization.initialize-undefined : Set to false to disable optimization that avoids redundant zeroing (bool, default = true)
  --optimization.max-deriv-time : You can set this to the maximum t value that you want derivatives to be computed at when updating the model.  This is an optimization that saves time in the backprop phase for recurrent frameworks (int, default = 2147483647)
  --optimization.max-deriv-time-relative : An alternative mechanism for setting the --max-deriv-time, suitable for situations where the length of the egs is variable.  If set, it is equivalent to setting the --max-deriv-time to this value plus the largest 't' value in any 'output' node of the computation request. (int, default = 2147483647)
  --optimization.memory-compression-level : This is only relevant to training, not decoding.  Set this to 0,1,2; higher levels are more aggressive at reducing memory by compressing quantities needed for backprop, potentially at the expense of speed and the accuracy of derivatives.  0 means no compression at all; 1 means compression that shouldn't affect results at all. (int, default = 1)
  --optimization.min-deriv-time : You can set this to the minimum t value that you want derivatives to be computed at when updating the model.  This is an optimization that saves time in the backprop phase for recurrent frameworks (int, default = -2147483648)
  --optimization.move-sizing-commands : Set to false to disable optimization that moves matrix allocation and deallocation commands to conserve memory. (bool, default = true)
  --optimization.optimize     : Set this to false to turn off all optimizations (bool, default = true)
  --optimization.optimize-row-ops : Set to false to disable certain optimizations that act on operations of type *Row*. (bool, default = true)
  --optimization.propagate-in-place : Set to false to disable optimization that allows in-place propagation (bool, default = true)
  --optimization.remove-assignments : Set to false to disable optimization that removes redundant assignments (bool, default = true)
  --optimization.snip-row-ops : Set this to false to disable an optimization that reduces the size of certain per-row operations (bool, default = true)
  --optimization.split-row-ops : Set to false to disable an optimization that may replace some operations of type kCopyRowsMulti or kAddRowsMulti with up to two simpler operations. (bool, default = true)
  --phone-determinize         : If true, do an initial pass of determinization on both phones and words (see also --word-determinize) (bool, default = true)
  --plp-config                : Configuration file for PLP features (e.g. conf/plp.conf) (string, default = "")
  --reset-on-endpoint         : Reset a decoder channel when endpoint detected. Do not close stream (bool, default = false)
  --word-determinize          : If true, do a second pass of determinization on words only (see also --phone-determinize) (bool, default = true)

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)

Command line was:
ERROR ([5.5.1013~1546-9b851]:ReadConfigFile():parse-options.cc:493) Invalid option --min-active=200 in config file model/conf/model.conf

[ Stack-Trace: ]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::MessageLogger::LogMessage() const+0x7fe) [0x7f2634e07d0e]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x2a) [0x7f2634907a9a]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::ParseOptions::ReadConfigFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x3eb) [0x7f2634dfc80b]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(BatchRecognizer::BatchRecognizer()+0x468) [0x7f26349a51f8]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(vosk_batch_recognizer_new+0x20) [0x7f26349a40f0]
/usr/lib/x86_64-linux-gnu/libffi.so.7(+0x6ff5) [0x7f2637d61ff5]
/usr/lib/x86_64-linux-gnu/libffi.so.7(+0x640a) [0x7f2637d6140a]
/usr/lib/python3/dist-packages/_cffi_backend.cpython-38-x86_64-linux-gnu.so(+0x1afd7) [0x7f2637d87fd7]
python3(_PyObject_MakeTpCall+0x296) [0x5f6a46]
python3(_PyEval_EvalFrameDefault+0x5d3f) [0x570a1f]
python3(_PyEval_EvalCodeWithName+0x26a) [0x5696da]
python3() [0x59c176]
python3(_PyObject_MakeTpCall+0x1ff) [0x5f69af]
python3(_PyEval_EvalFrameDefault+0x5932) [0x570612]
python3(_PyFunction_Vectorcall+0x1b6) [0x5f6226]
python3(_PyEval_EvalFrameDefault+0x71e) [0x56b3fe]
python3(_PyEval_EvalCodeWithName+0x26a) [0x5696da]
python3(PyEval_EvalCode+0x27) [0x68db17]
python3() [0x67eeb1]
python3() [0x67ef2f]
python3() [0x67efd1]
python3(PyRun_SimpleFileExFlags+0x197) [0x67f377]
python3(Py_RunMain+0x212) [0x6b7902]
python3(Py_BytesMain+0x2d) [0x6b7c8d]
/usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7f2638a230b3]
python3(_start+0x2e) [0x5fb12e]

terminate called after throwing an instance of 'kaldi::KaldiFatalError'
  what():  kaldi::KaldiFatalError
WARNING ([5.5.1013~1546-9b851]:SelectGpuId():cu-device.cc:243) Not in compute-exclusive mode.  Suggestion: use 'nvidia-smi -c 3' to set compute exclusive mode
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:438) Selecting from 1 GPUs
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:453) cudaSetDevice(0): Tesla K80      free:11375M, used:66M, total:11441M, free/total:0.994231
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:501) Device: 0, mem_ratio: 0.994231
LOG ([5.5.1013~1546-9b851]:SelectGpuId():cu-device.cc:382) Trying to select device: 0
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:511) Success selecting device 0 free mem ratio: 0.994231
LOG ([5.5.1013~1546-9b851]:FinalizeActiveGpu():cu-device.cc:338) The active GPU is [0]: Tesla K80       free:11239M, used:202M, total:11441M, free/total:0.982345 version 3.7
LOG ([5.5.1013~1546-9b851]:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 0 orphan nodes.
LOG ([5.5.1013~1546-9b851]:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 0 orphan components.
LOG ([5.5.1013~1546-9b851]:BatchRecognizer():batch_recognizer.cc:56) Loading HCLG from model/graph/HCLG.fst
LOG ([5.5.1013~1546-9b851]:BatchRecognizer():batch_recognizer.cc:60) Loading words from model/graph/words.txt
LOG ([5.5.1013~1546-9b851]:BatchRecognizer():batch_recognizer.cc:68) Loading winfo model/graph/phones/word_boundary.int
LOG ([5.5.1013~1546-9b851]:BatchRecognizer():batch_recognizer.cc:74) Loading subtract G.fst model from model/rescore/G.fst
LOG ([5.5.1013~1546-9b851]:BatchRecognizer():batch_recognizer.cc:76) Loading CARPA model from model/rescore/G.carpa
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
INFO:root:Listening on 0.0.0.0:2700
WARNING ([5.5.1013~1546-9b851]:SelectGpuId():cu-device.cc:243) Not in compute-exclusive mode.  Suggestion: use 'nvidia-smi -c 3' to set compute exclusive mode
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:438) Selecting from 1 GPUs
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:453) cudaSetDevice(0): Tesla K80      free:11375M, used:66M, total:11441M, free/total:0.994231
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:501) Device: 0, mem_ratio: 0.994231
LOG ([5.5.1013~1546-9b851]:SelectGpuId():cu-device.cc:382) Trying to select device: 0
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:511) Success selecting device 0 free mem ratio: 0.994231
LOG ([5.5.1013~1546-9b851]:FinalizeActiveGpu():cu-device.cc:338) The active GPU is [0]: Tesla K80       free:11239M, used:202M, total:11441M, free/total:0.982345 version 3.7
LOG ([5.5.1013~1546-9b851]:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 0 orphan nodes.
LOG ([5.5.1013~1546-9b851]:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 0 orphan components.
LOG ([5.5.1013~1546-9b851]:BatchRecognizer():batch_recognizer.cc:56) Loading HCLG from model/graph/HCLG.fst
LOG ([5.5.1013~1546-9b851]:BatchRecognizer():batch_recognizer.cc:60) Loading words from model/graph/words.txt
LOG ([5.5.1013~1546-9b851]:BatchRecognizer():batch_recognizer.cc:68) Loading winfo model/graph/phones/word_boundary.int
LOG ([5.5.1013~1546-9b851]:BatchRecognizer():batch_recognizer.cc:74) Loading subtract G.fst model from model/rescore/G.fst
LOG ([5.5.1013~1546-9b851]:BatchRecognizer():batch_recognizer.cc:76) Loading CARPA model from model/rescore/G.carpa
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
INFO:root:Listening on 0.0.0.0:2700
INFO:root:Connection 0 from ('172.18.0.1', 40140)
ERROR ([5.5.1013~1546-9b851]:AddMatMat():cu-matrix.cc:1325) cublasStatus_t 8 : "CUBLAS_STATUS_ARCH_MISMATCH" returned from 'cublas_gemm(GetCublasHandle(), (transB==kTrans? CUBLAS_OP_T:CUBLAS_OP_N), (transA==kTrans? CUBLAS_OP_T:CUBLAS_OP_N), m, n, k, alpha, B.data_, B.Stride(), A.data_, A.Stride(), beta, data_, Stride())'

[ Stack-Trace: ]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::MessageLogger::LogMessage() const+0x7fe) [0x7f1a2b946d0e]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(+0x696673) [0x7f1a2b7ff673]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::CuMatrixBase<float>::AddMatMat(float, kaldi::CuMatrixBase<float> const&, kaldi::MatrixTransposeType, kaldi::CuMatrixBase<float> const&, kaldi::MatrixTransposeType, float)+0x47c) [0x7f1a2b81361c]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::CudaOnlineBatchedSpectralFeatures::ComputeFinalFeaturesBatched(kaldi::LaneDesc const*, int, float, kaldi::CuMatrix<float>*, kaldi::CuMatrix<float>*)+0x37a) [0x7f1a2b521f3a]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::CudaOnlineBatchedSpectralFeatures::ComputeFeaturesBatched(kaldi::LaneDesc const*, int, kaldi::CuMatrixBase<float> const&, float, float, kaldi::CuMatrix<float>*)+0xa4) [0x7f1a2b522754]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::OnlineBatchedFeaturePipelineCuda::ComputeFeaturesBatched(int, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, std::vector<bool, std::allocator<bool> > const&, std::vector<bool, std::allocator<bool> > const&, float, kaldi::CuMatrixBase<float> const&, kaldi::CuMatrix<float>*, kaldi::CuVector<float>*, std::vector<int, std::allocator<int> >*)+0x29e) [0x7f1a2b51d66e]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::cuda_decoder::BatchedThreadedNnet3CudaOnlinePipeline::ComputeGPUFeatureExtraction(std::vector<int, std::allocator<int> > const&, kaldi::Matrix<float> const&, std::vector<int, std::allocator<int> > const&, std::vector<bool, std::allocator<bool> > const&, std::vector<bool, std::allocator<bool> > const&)+0xe6) [0x7f1a2b4ed116]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::cuda_decoder::BatchedThreadedNnet3CudaOnlinePipeline::DecodeBatch(std::vector<unsigned long, std::allocator<unsigned long> > const&, kaldi::Matrix<float> const&, std::vector<int, std::allocator<int> > const&, std::vector<bool, std::allocator<bool> > const&, std::vector<bool, std::allocator<bool> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const*, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const*> >*, std::vector<bool, std::allocator<bool> >*)+0xf7) [0x7f1a2b4f1277]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::cuda_decoder::CudaOnlinePipelineDynamicBatcher::BatcherThreadLoop()+0x25d) [0x7f1a2b4fe59d]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6d84) [0x7f19f6b8bd84]
/usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x9609) [0x7f1a2f521609]
/usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7f1a2f65d293]

terminate called after throwing an instance of 'kaldi::KaldiFatalError'
  what():  kaldi::KaldiFatalError
WARNING ([5.5.1013~1546-9b851]:SelectGpuId():cu-device.cc:243) Not in compute-exclusive mode.  Suggestion: use 'nvidia-smi -c 3' to set compute exclusive mode
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:438) Selecting from 1 GPUs
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:453) cudaSetDevice(0): Tesla K80      free:11375M, used:66M, total:11441M, free/total:0.994231
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:501) Device: 0, mem_ratio: 0.994231
LOG ([5.5.1013~1546-9b851]:SelectGpuId():cu-device.cc:382) Trying to select device: 0
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:511) Success selecting device 0 free mem ratio: 0.994231
LOG ([5.5.1013~1546-9b851]:FinalizeActiveGpu():cu-device.cc:338) The active GPU is [0]: Tesla K80       free:11239M, used:202M, total:11441M, free/total:0.982345 version 3.7
LOG ([5.5.1013~1546-9b851]:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 0 orphan nodes.
LOG ([5.5.1013~1546-9b851]:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 0 orphan components.
LOG ([5.5.1013~1546-9b851]:BatchRecognizer():batch_recognizer.cc:56) Loading HCLG from model/graph/HCLG.fst
LOG ([5.5.1013~1546-9b851]:BatchRecognizer():batch_recognizer.cc:60) Loading words from model/graph/words.txt
LOG ([5.5.1013~1546-9b851]:BatchRecognizer():batch_recognizer.cc:68) Loading winfo model/graph/phones/word_boundary.int
LOG ([5.5.1013~1546-9b851]:BatchRecognizer():batch_recognizer.cc:74) Loading subtract G.fst model from model/rescore/G.fst
LOG ([5.5.1013~1546-9b851]:BatchRecognizer():batch_recognizer.cc:76) Loading CARPA model from model/rescore/G.carpa
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
INFO:root:Listening on 0.0.0.0:2700
INFO:root:Connection 0 from ('172.18.0.1', 40400)
ERROR ([5.5.1013~1546-9b851]:AddMatMat():cu-matrix.cc:1325) cublasStatus_t 8 : "CUBLAS_STATUS_ARCH_MISMATCH" returned from 'cublas_gemm(GetCublasHandle(), (transB==kTrans? CUBLAS_OP_T:CUBLAS_OP_N), (transA==kTrans? CUBLAS_OP_T:CUBLAS_OP_N), m, n, k, alpha, B.data_, B.Stride(), A.data_, A.Stride(), beta, data_, Stride())'

[ Stack-Trace: ]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::MessageLogger::LogMessage() const+0x7fe) [0x7f2354154d0e]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(+0x696673) [0x7f235400d673]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::CuMatrixBase<float>::AddMatMat(float, kaldi::CuMatrixBase<float> const&, kaldi::MatrixTransposeType, kaldi::CuMatrixBase<float> const&, kaldi::MatrixTransposeType, float)+0x47c) [0x7f235402161c]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::CudaOnlineBatchedSpectralFeatures::ComputeFinalFeaturesBatched(kaldi::LaneDesc const*, int, float, kaldi::CuMatrix<float>*, kaldi::CuMatrix<float>*)+0x37a) [0x7f2353d2ff3a]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::CudaOnlineBatchedSpectralFeatures::ComputeFeaturesBatched(kaldi::LaneDesc const*, int, kaldi::CuMatrixBase<float> const&, float, float, kaldi::CuMatrix<float>*)+0xa4) [0x7f2353d30754]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::OnlineBatchedFeaturePipelineCuda::ComputeFeaturesBatched(int, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, std::vector<bool, std::allocator<bool> > const&, std::vector<bool, std::allocator<bool> > const&, float, kaldi::CuMatrixBase<float> const&, kaldi::CuMatrix<float>*, kaldi::CuVector<float>*, std::vector<int, std::allocator<int> >*)+0x29e) [0x7f2353d2b66e]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::cuda_decoder::BatchedThreadedNnet3CudaOnlinePipeline::ComputeGPUFeatureExtraction(std::vector<int, std::allocator<int> > const&, kaldi::Matrix<float> const&, std::vector<int, std::allocator<int> > const&, std::vector<bool, std::allocator<bool> > const&, std::vector<bool, std::allocator<bool> > const&)+0xe6) [0x7f2353cfb116]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::cuda_decoder::BatchedThreadedNnet3CudaOnlinePipeline::DecodeBatch(std::vector<unsigned long, std::allocator<unsigned long> > const&, kaldi::Matrix<float> const&, std::vector<int, std::allocator<int> > const&, std::vector<bool, std::allocator<bool> > const&, std::vector<bool, std::allocator<bool> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const*, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const*> >*, std::vector<bool, std::allocator<bool> >*)+0xf7) [0x7f2353cff277]
/usr/local/lib/python3.8/dist-packages/vosk/libvosk.so(kaldi::cuda_decoder::CudaOnlinePipelineDynamicBatcher::BatcherThreadLoop()+0x25d) [0x7f2353d0c59d]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6d84) [0x7f231f399d84]
/usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x9609) [0x7f2357d2f609]
/usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7f2357e6b293]

terminate called after throwing an instance of 'kaldi::KaldiFatalError'
  what():  kaldi::KaldiFatalError
WARNING ([5.5.1013~1546-9b851]:SelectGpuId():cu-device.cc:243) Not in compute-exclusive mode.  Suggestion: use 'nvidia-smi -c 3' to set compute exclusive mode
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:438) Selecting from 1 GPUs
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:453) cudaSetDevice(0): Tesla K80      free:11375M, used:66M, total:11441M, free/total:0.994231
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:501) Device: 0, mem_ratio: 0.994231
LOG ([5.5.1013~1546-9b851]:SelectGpuId():cu-device.cc:382) Trying to select device: 0
LOG ([5.5.1013~1546-9b851]:SelectGpuIdAuto():cu-device.cc:511) Success selecting device 0 free mem ratio: 0.994231
LOG ([5.5.1013~1546-9b851]:FinalizeActiveGpu():cu-device.cc:338) The active GPU is [0]: Tesla K80       free:11239M, used:202M, total:11441M, free/total:0.982345 version 3.7
LOG ([5.5.1013~1546-9b851]:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 0 orphan nodes.
LOG ([5.5.1013~1546-9b851]:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 0 orphan components.
LOG ([5.5.1013~1546-9b851]:BatchRecognizer():batch_recognizer.cc:56) Loading HCLG from model/graph/HCLG.fst
LOG ([5.5.1013~1546-9b851]:BatchRecognizer():batch_recognizer.cc:60) Loading words from model/graph/words.txt
LOG ([5.5.1013~1546-9b851]:BatchRecognizer():batch_recognizer.cc:68) Loading winfo model/graph/phones/word_boundary.int
LOG ([5.5.1013~1546-9b851]:BatchRecognizer():batch_recognizer.cc:74) Loading subtract G.fst model from model/rescore/G.fst
LOG ([5.5.1013~1546-9b851]:BatchRecognizer():batch_recognizer.cc:76) Loading CARPA model from model/rescore/G.carpa
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.5.1013~1546-9b851]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
INFO:root:Listening on 0.0.0.0:2700

sskorol / vosk-api-gpu

OSError: Multiple exceptions #13