solyarisoftware / voskJs

Vosk ASR offline engine API for NodeJs developers. With a simple HTTP ASR server.

Buffer formatting ? #1

Closed wasweisic closed 3 years ago

wasweisic commented 3 years ago

Hey, it's me again. I'm using voskJs and the German model, but I think I didn't get the buffer format right. And sorry, I'm just at the beginning of understanding JS.

Console Output / ERROR

2021-04-20T16:42:10.171Z ::
2021-04-20T16:42:10.180Z :: log level          : 0
2021-04-20T16:42:10.181Z ::
LOG (VoskAPI:ReadDataFiles():model.cc:194) Decoding params beam=10 max-active=3000 lattice-beam=2
LOG (VoskAPI:ReadDataFiles():model.cc:197) Silence phones 1:2:3:4:5:6:7:8:9:10
LOG (VoskAPI:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 0 orphan nodes.
LOG (VoskAPI:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 0 orphan components.
LOG (VoskAPI:CompileLooped():nnet-compile-looped.cc:345) Spent 0.106011 seconds in looped compilation.
LOG (VoskAPI:ReadDataFiles():model.cc:221) Loading i-vector extractor from models/vosk-model-small-de-0.15/ivector/final.ie
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (VoskAPI:ReadDataFiles():model.cc:251) Loading HCL and G from models/vosk-model-small-de-0.15/graph/HCLr.fst models/vosk-model-small-de-0.15/graph/Gr.fst
LOG (VoskAPI:ReadDataFiles():model.cc:273) Loading winfo models/vosk-model-small-de-0.15/graph/phones/word_boundary.int
2021-04-20T16:42:12.747Z ::
2021-04-20T16:42:12.747Z :: init elapsed       : 2576ms
TypeError [ERR_INVALID_ARG_VALUE]: The argument 'path' must be a string or Uint8Array without null bytes. Received <Buffer@0x5e55a10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 ff...
    at Object.access (fs.js:205:10)
    at /home/runner/AI/voskjs.js:101:8
    at new Promise (<anonymous>)
    at transcript (/home/runner/AI/voskjs.js:96:10)
    at transcribe_vosk (/home/runner/AI/index.js:1046:26)
    at processTicksAndRejections (internal/process/task_queues.js:93:5) {
  code: 'ERR_INVALID_ARG_VALUE'

INPUT

function speak_impl(voice_Connection, mapKey) {
    voice_Connection.on('speaking', async (user, speaking) => {
        if (speaking.bitfield == 0 || user.bot) {
            return
        }
        console.log(`I'm listening to ${user.username}`)
        // this creates a 16-bit signed PCM, stereo 48KHz stream
        const audioStream = voice_Connection.receiver.createStream(user, { mode: 'pcm' })
        audioStream.on('error',  (e) => { 
            console.log('audioStream: ' + e)
        });
        let buffer = [];
        audioStream.on('data', (data) => {
            buffer.push(data)
        })
        audioStream.on('end', async () => {
            buffer = Buffer.concat(buffer)
            const duration = buffer.length / 48000 / 4;
            console.log("duration: " + duration)

            if (duration < 0.8 || duration > 19) { // 20 seconds max dur
                console.log("TOO SHORT / TOO LONG; SKPPING")
                return;
            }

            try {
                let new_buffer = await convert_audio(buffer)
                let out = await transcribe(new_buffer);
                if (out != null)
                //user.name
                    process_commands_query(out, mapKey, user.id);
            } catch (e) {
                console.log('tmpraw rename: ' + e)
            }
        })    
    })
}

voskJs function

const { initModel, transcript, freeModel } = require('./voskjs')

const germanModelDirectory = 'models/vosk-model-small-de-0.15'
const audioFile = 'audio/2830-3980-0043.wav'

async function transcribe_vosk(buffer) {
  // create a runtime model
  const germanModel = await initModel(germanModelDirectory)

  // speech recognition from an audio file
  try {
    const result = await transcript(buffer, germanModel)

    console.log(result)
  }
  catch (error) {
    console.error(error)
  }

  // free the runtime model
  freeModel(germanModel)
}
async function transcribe(buffer) {

  // return transcribe_witai(buffer)
  // return transcribe_gspeech(buffer)
  return transcribe_vosk(buffer)
}
solyarisoftware commented 3 years ago

Hi, reading your code excerpt I notice you pass a buffer to transcript(); that's wrong.

The first parameter of transcript() is a fileName: https://github.com/solyarisoftware/voskJs/blob/master/voskjs.js#L80
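
A minimal sketch of the intended call, assuming the standard voskJs exports (initModel, transcript, freeModel) and using illustrative model/audio paths:

const { initModel, transcript, freeModel } = require('./voskjs')

async function main() {
  // load the model once at startup
  const model = await initModel('models/vosk-model-small-de-0.15')

  // the first argument is the path to a wav file, not a Buffer
  const result = await transcript('audio/2830-3980-0043.wav', model)
  console.log(result)

  // free the runtime model when done
  freeModel(model)
}

main()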

wasweisic commented 3 years ago

So if I understand you correctly, I need to write my buffer stream to a "cache" .wav audio file in order to transcribe it using voskJs. Context: I want to use voskJs as speech-to-text for a Discord/chat bot, so I don't know if that would cause more delay, especially for multiple inputs in real time.

solyarisoftware commented 3 years ago

Well,

  1. Yes, to use transcript() as is, you have to generate a wav file in the correct format from your buffer data, and pass the filename as the first parameter (see the sketch after this list).

  2. Or, you can modify the transcript function to work with a buffer directly. Please note that the buffer must be in the correct/required format (16 kHz sample rate, mono, etc.). Tricky.
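
For example, a rough sketch of option 1, assuming ffmpeg is available on the system and that the incoming buffer is the 48 kHz stereo 16-bit PCM stream from the Discord code above (the temp file path and helper name are just illustrative):

const fs = require('fs')
const { spawn } = require('child_process')
const { initModel, transcript, freeModel } = require('./voskjs')

// convert the raw 48 kHz stereo 16-bit PCM buffer into a 16 kHz mono wav file
function pcmBufferToWavFile(buffer, outFile) {
  return new Promise((resolve, reject) => {
    const ffmpeg = spawn('ffmpeg', [
      '-y',                       // overwrite the output file if it already exists
      '-f', 's16le',              // raw signed 16-bit little-endian PCM input
      '-ar', '48000', '-ac', '2', // input: 48 kHz, stereo (Discord's pcm stream)
      '-i', 'pipe:0',             // read the PCM data from stdin
      '-ar', '16000', '-ac', '1', // output: 16 kHz, mono, as Vosk expects
      outFile
    ])
    ffmpeg.on('error', reject)
    ffmpeg.on('close', code =>
      code === 0 ? resolve(outFile) : reject(new Error('ffmpeg exited with code ' + code)))
    ffmpeg.stdin.end(buffer)
  })
}

async function transcribe_vosk(buffer) {
  const germanModel = await initModel('models/vosk-model-small-de-0.15')
  try {
    const wavFile = await pcmBufferToWavFile(buffer, '/tmp/utterance.wav')
    const result = await transcript(wavFile, germanModel) // pass the file name, not the buffer
    console.log(result)
    fs.unlinkSync(wavFile) // remove the cache file afterwards
  } catch (error) {
    console.error(error)
  }
  freeModel(germanModel)
}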

solyarisoftware commented 3 years ago

With the current VoskJs version you can pass an audio buffer (in PCM format) to the transcriptFromBuffer function:

https://github.com/solyarisoftware/voskJs/blob/master/voskjs.js#L204
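
A minimal sketch of how that could plug into the bot code above, assuming transcriptFromBuffer(buffer, model) mirrors transcript()'s signature and that the buffer is already PCM audio in the format Vosk expects (16 kHz, mono, 16-bit):

const { initModel, transcriptFromBuffer, freeModel } = require('./voskjs')

async function transcribe_vosk(buffer) {
  const germanModel = await initModel('models/vosk-model-small-de-0.15')

  // buffer: PCM audio data in the format required by Vosk
  const result = await transcriptFromBuffer(buffer, germanModel)
  console.log(result)

  freeModel(germanModel)
}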

A star on the project is welcome :)