juesato / gspeech-api

Node.js wrapper library to easily access Google Speech Recognition API

GSpeech API

A node.js wrapper library around the Google Speech API for automatic speech recognition.

Easy-to-use high-quality speech recognition

Basic Usage

For clips under 60 seconds, usage is simple:

var gspeech = require('gspeech-api');
gspeech.recognize('path/to/my/file', function(err, data) {
    if (err) {
        // Stop here: data may be undefined when the request failed.
        return console.error(err);
    }
    console.log("Final transcript is:\n" + data.transcript);
});

Google's servers ignore clips longer than 60 seconds, so longer audio has to be split into pieces before it is sent. The package does this with its default settings, so the code above also works for longer clips. You only need to specify how the audio is split if you want control over the pieces.

The speed varies, but in general one hour of audio takes a couple of minutes to process.

Installation

gspeech-api relies on fluent-ffmpeg to handle different audio formats, and fluent-ffmpeg in turn depends on ffmpeg.

Unfortunately, ffmpeg is a little tricky to install on Ubuntu 12.04 and 14.04. I followed the install-from-source instructions on the ffmpeg Installation page.

All other dependencies should be automatically handled by npm.

npm install gspeech-api
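Since npm does not install ffmpeg itself, it can help to confirm that the binary is reachable before calling the library. The following is a minimal sketch, not part of gspeech-api: hasFfmpeg is a hypothetical helper name, and it only checks for an ffmpeg executable on the PATH by running ffmpeg -version.

```javascript
var spawnSync = require('child_process').spawnSync;

// Hypothetical helper (not part of gspeech-api).
// Returns true if an `ffmpeg` executable can be launched from the PATH.
function hasFfmpeg() {
    var result = spawnSync('ffmpeg', ['-version'], { stdio: 'ignore' });
    // `result.error` is set when the binary could not be spawned (e.g. ENOENT).
    return !result.error && result.status === 0;
}

if (!hasFfmpeg()) {
    console.error('ffmpeg was not found on the PATH; install it before using gspeech-api.');
}
```

If ffmpeg lives somewhere outside the PATH, this check reports false even though the binary exists, so treat it as a quick sanity check only.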

Documentation

This package exposes one main method, gspeech.recognize(options, callback), which takes a file and passes an array of timed captions along with a final transcript to the callback.

Arguments

Options

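If options is passed as a String, it is taken as the path to the file. Otherwise, it should be an Object. The examples in this README use two attributes: file, the path to the audio file, and segments, an array of times at which to split the audio.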

Callback

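Callback is a function that is called after all requests to Google's speech servers have completed. It is passed two parameters, callback(err, data). In the examples in this README, data carries transcript (the final transcript) and timedTranscript (an array of entries with start and text fields).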
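Because callback(err, data) follows Node's error-first convention, the call can be wrapped in a Promise with util.promisify from Node's standard library. This is a sketch under assumptions: recognizeAsync is a hypothetical helper, not something the package provides, and it assumes recognize does not rely on its this binding (otherwise pass gspeech.recognize.bind(gspeech)).

```javascript
var util = require('util');

// Hypothetical helper (not part of gspeech-api).
// Wraps any function with the (options, callback(err, data)) shape in a Promise.
// `recognize` is passed in so the sketch does not depend on the package itself.
function recognizeAsync(recognize, options) {
    return util.promisify(recognize)(options);
}

// Possible usage with the library:
// var gspeech = require('gspeech-api');
// recognizeAsync(gspeech.recognize, 'path/to/my/file')
//     .then(function (data) { console.log(data.transcript); })
//     .catch(function (err) { console.error(err); });
```

With this wrapper, a failed request rejects the Promise instead of having to be checked by hand inside the callback.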

More Examples

Getting a timed transcript

gspeech.recognize('path/to/my/file', function(err, data) {
    if (err) {
        // Stop here: data may be undefined when the request failed.
        return console.error(err);
    }
    for (var i = 0; i < data.timedTranscript.length; i++) {
        // Print the transcript
        console.log(data.timedTranscript[i].start + ': ' 
                  + data.timedTranscript[i].text + '\n');
    }
});
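The entries printed above can also be turned into caption-style lines. The sketch below assumes each entry has the start and text fields used in the example, and additionally assumes that start is a number of seconds, which this README does not state. formatTime and formatCaptions are hypothetical helper names, and the sample array is made-up data, not recognizer output.

```javascript
// Hypothetical helpers (not part of gspeech-api).
// Convert a number of seconds into an "mm:ss" label.
function formatTime(seconds) {
    var total = Math.floor(seconds);
    var mins = Math.floor(total / 60);
    var secs = total % 60;
    return (mins < 10 ? '0' : '') + mins + ':' + (secs < 10 ? '0' : '') + secs;
}

// Turn an array of { start, text } entries into printable caption lines.
function formatCaptions(timedTranscript) {
    return timedTranscript.map(function (entry) {
        return '[' + formatTime(entry.start) + '] ' + entry.text;
    });
}

// Sample data in the assumed shape (start in seconds), not real output.
var sample = [
    { start: 0, text: 'hello' },
    { start: 75.4, text: 'world' }
];
console.log(formatCaptions(sample).join('\n'));
// [00:00] hello
// [01:15] world
```

Inside the recognize callback, the same call would be formatCaptions(data.timedTranscript).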

Specifying times to split audio

If you would like to generate a timed transcript and already know where the fragments start, pass those start times to the library as segments.

var segTimes = [0, 15, 20, 30];
gspeech.recognize({
        'file': 'path/to/my/file',
        'segments': segTimes,
    }, 
    function(err, data) {
        if (err) {
            // Stop here: data may be undefined when the request failed.
            return console.error(err);
        }
        for (var i = 0; i < data.timedTranscript.length; i++) {
            console.log(data.timedTranscript[i].start + ': ' 
            + data.timedTranscript[i].text + '\n');
        }
    }
);
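When the fragment boundaries are not known in advance, one option is to generate evenly spaced start times and pass them as segments. This is only a sketch: buildSegmentTimes is a hypothetical helper, and it assumes that segments holds start offsets in seconds, which the example above suggests but this README does not spell out. Keeping each piece under 60 seconds matches the clip limit mentioned earlier.

```javascript
// Hypothetical helper (not part of gspeech-api).
// Build evenly spaced start times (assumed to be seconds) covering `totalSeconds`.
function buildSegmentTimes(totalSeconds, chunkSeconds) {
    if (!(chunkSeconds > 0)) {
        throw new Error('chunkSeconds must be a positive number');
    }
    var times = [];
    for (var t = 0; t < totalSeconds; t += chunkSeconds) {
        times.push(t);
    }
    return times;
}

console.log(buildSegmentTimes(100, 30)); // [ 0, 30, 60, 90 ]
```

Fixed-interval splits can cut a word in half at a boundary, so known pause positions are a better choice when they are available.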

Disclaimer

This is not an officially supported Google API, and should only be used for personal purposes. The API is subject to change, and should not be relied upon by any crucial services. I'm also not actively maintaining this repo - for modifications, the best option is probably to make your own fork.