jwebmeister / tacspeak

Tacspeak - Fast, lightweight, modular speech recognition for gaming
GNU Affero General Public License v3.0

Testing methodology & tools for improving speech recognition accuracy #10

Closed jwebmeister closed 8 months ago

jwebmeister commented 9 months ago

Goal

A useful testing methodology, and supporting tools, that help developers (and/or users) to evaluate, compare, and most importantly improve the accuracy and performance of speech recognition of:

Here, "useful" means specific and sufficient enough to guide developers (and/or users) toward changes that will improve speech recognition accuracy and performance for the grammar and types of commands they want to use.

Background

To date, the testing of models and grammar modules for speech recognition accuracy has almost exclusively been done:

The primary function of the type of testing described in this issue is to:

Please check with @jwebmeister what the destination branch should be for pull requests. Dev-only features (not useful to users) should not be pulled into the main branch at this time.

Note: Development and training of models (to improve speech accuracy) is closely related, but will be primarily tracked by other issue(s). For reference, the modeldev branch is primarily used for features specifically related to model development that won't be packaged into releases to users.

Note: "Recorded audio" mentioned below is (and should be) local only, i.e. @jwebmeister is recording and using his own audio. For now - a big no to any collection + transfer of user audio data; and for now and in the future - never without explicit permission from users.

Possible approaches

Approaches that could be considered (not an exhaustive list):

  1. Recorded audio test dataset - record and store audio files of spoken commands / phrases. Use them to test the accuracy of models and/or grammar modules.
  2. Computer generated audio test dataset - generate audio files of spoken commands / phrases using text to speech. Potential for varying auditory properties (pitch, volume, speed), as well as swapping words or phrases for equivalent words or phrases.
  3. Calculate phonetic similarity - measure, compare and rank how phonetically similar commands are within a grammar module. Identify commands with high similarity to guide developers and/or users to potentially choose different phrases (if recognition accuracy is an issue).
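
As a rough illustration of approach 3, the sketch below ranks pairs of commands by how close their phoneme sequences are, using a normalised edit distance. It is only a sketch under assumptions: the `LEXICON` entries are hand-written ARPAbet-style pronunciations invented for this example (a real tool would read them from the speech model's lexicon), and the function names are hypothetical, not part of Tacspeak.

```python
from itertools import combinations

# Hypothetical, hand-written pronunciations (ARPAbet-style), for illustration
# only. A real tool would look these up in the model's own lexicon.
LEXICON = {
    "stack": ["S", "T", "AE", "K"],
    "back": ["B", "AE", "K"],
    "up": ["AH", "P"],
    "open": ["OW", "P", "AH", "N"],
    "door": ["D", "AO", "R"],
}


def phonemes(command):
    """Flatten a command phrase into a single phoneme sequence."""
    return [p for word in command.split() for p in LEXICON[word]]


def edit_distance(a, b):
    """Levenshtein distance between two sequences (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,             # delete x
                            curr[j - 1] + 1,         # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(cmd_a, cmd_b):
    """1.0 means identical phoneme sequences, 0.0 means nothing in common."""
    a, b = phonemes(cmd_a), phonemes(cmd_b)
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def rank_pairs(commands):
    """Return (similarity, command_a, command_b) tuples, most similar first."""
    pairs = [(similarity(a, b), a, b) for a, b in combinations(commands, 2)]
    return sorted(pairs, reverse=True)


ranked = rank_pairs(["stack up", "back up", "open door"])
for score, a, b in ranked:
    print(f"{score:.2f}  {a!r} vs {b!r}")
```

Pairs near the top of the list are the ones most likely to be confused, so they are candidates for rewording. Plain edit distance treats all phoneme substitutions as equally costly; weighting acoustically close phonemes (for example "B" and "P") as cheaper substitutions would likely be a better fit, but is left out here.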

It is likely that more than one approach should be pursued and developed.
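
For the audio test dataset approaches (1 and 2), one common way to score results is word error rate (WER) alongside whole-command accuracy. The following is a minimal sketch, assuming each test utterance has already been run through the recogniser and reduced to an (expected text, recognised text) pair; the sample pairs and the function names are made up for illustration and do not reflect Tacspeak's actual tooling.

```python
def word_errors(reference, hypothesis):
    """Return (word-level edit distance, number of reference words)."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,             # deletion
                            curr[j - 1] + 1,         # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1], len(ref)


def evaluate(results):
    """Score (expected, recognised) pairs.

    Returns overall word error rate and the fraction of commands that
    were recognised exactly.
    """
    total_errors = total_words = exact = count = 0
    for expected, recognised in results:
        errors, words = word_errors(expected.lower(), recognised.lower())
        total_errors += errors
        total_words += words
        exact += errors == 0
        count += 1
    return {
        "wer": total_errors / total_words,
        "command_accuracy": exact / count,
    }


# Made-up example data: (expected text, recogniser output)
sample = [
    ("team stack up", "team stack up"),
    ("open the door", "open door"),
    ("blue team hold", "blue team fall"),
]
scores = evaluate(sample)
print(scores)
```

Running the same fixed set of recordings against two models, or two versions of a grammar module, and comparing the resulting scores gives a like-for-like comparison. For command-driven use, whole-command accuracy is probably the more relevant number, since a single wrong word can trigger a different action.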

Tracking

This issue is a high-level tracker of all work related to testing methods and tools for speech recognition accuracy and performance.

Tools

Ready or Not - test case for testing methodology + tools

jwebmeister commented 8 months ago

I'm going to close this issue as: