jwebmeister / tacspeak

Tacspeak - Fast, lightweight, modular speech recognition for gaming
GNU Affero General Public License v3.0

Model test results - model 20240117 #23

Open jwebmeister opened 10 months ago

jwebmeister commented 10 months ago

Post test results + useful remarks here, ideally of both:

, using the same test data, and using the default Ready or Not grammar module.

Useful remarks include:

Important instructions:

Example report:

('./kaldi_model/', './retain/retain.tsv', 'Command', 'WER', 'Overall -> 5.00 %+/- 9.55 %N=20 C=19 S=1 D=0 I=0')
('./kaldi_model/', './retain/retain.tsv', 'Command', 'CMDERR', {'cmd_not_correct_output': 0, 'cmd_not_correct_rule': 0, 'cmd_not_correct_options': 0, 'cmd_not_recog_output': 0, 'cmd_not_recog_input': 0, 'cmds': 4})
('./kaldi_model_base/', './retain/retain.tsv', 'Command', 'WER', 'Overall -> 5.00 %+/- 9.55 %N=20 C=19 S=0 D=1 I=0')
('./kaldi_model_base/', './retain/retain.tsv', 'Command', 'CMDERR', {'cmd_not_correct_output': 0, 'cmd_not_correct_rule': 0, 'cmd_not_correct_options': 0, 'cmd_not_recog_output': 0, 'cmd_not_recog_input': 0, 'cmds': 4})
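For anyone reading the WER lines in a report like the one above, here is a rough sketch of how the figures relate to each other. It assumes the usual definition WER = (S + D + I) / N, and it assumes the "+/-" value is a 95% normal-approximation interval; that second assumption is only inferred from the fact that it reproduces the 9.55 % in the example, not confirmed from the Tacspeak source. The function names are made up for illustration.

```python
import math
import re


def parse_counts(summary):
    """Pull the N/C/S/D/I counts out of a WER summary string."""
    return {key: int(value) for key, value in re.findall(r"([NCSDI])=(\d+)", summary)}


def wer_with_interval(counts, z=1.96):
    """Return (WER %, half-width %) using WER = (S + D + I) / N.

    The half-width is a normal-approximation interval (assumed, see lead-in).
    """
    n = counts["N"]
    wer = (counts["S"] + counts["D"] + counts["I"]) / n
    half_width = z * math.sqrt(wer * (1 - wer) / n)
    return 100 * wer, 100 * half_width


summary = "Overall -> 5.00 %+/- 9.55 %N=20 C=19 S=1 D=0 I=0"
counts = parse_counts(summary)
wer, half_width = wer_with_interval(counts)
print(f"WER = {wer:.2f} % +/- {half_width:.2f} % (N={counts['N']})")
```

With only N=20 words the interval is wider than the WER itself, so small differences between two models on a test set this size are probably not meaningful.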

jwebmeister commented 9 months ago

Nvm, I was thinking maybe different types of doors were named/coded as a particular door type.

You can specify “wedge the door” or “wedge the trapped door”, just as one example. It’s all in the grammar module. I haven’t noticed it causing any issues in my testing though.

madmaximus101 commented 9 months ago

You can specify “wedge the door” or “wedge the trapped door”, just as one example. It’s all in the grammar module. I haven’t noticed it causing any issues in my testing though.

I will look at the grammar module more deeply for the proper words/phrases.

jwebmeister commented 9 months ago

I will test the blue/red cut-off audio thing as well.

@madmaximus101 don’t worry, I figured it out. It was my audio settings: I had a gate set up that was just slightly too slow and/or too high.
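To illustrate why a gate that is "too slow and/or too high" can chop the start of a word, here is a toy sketch. It is not how any particular audio software implements its gate; `noise_gate`, `threshold` and `attack_samples` are hypothetical names, and the sample values are invented.

```python
def noise_gate(samples, threshold, attack_samples):
    """Toy gate: pass a sample only after the signal has stayed at or above
    `threshold` for more than `attack_samples` consecutive samples."""
    out = []
    run = 0  # consecutive samples at or above the threshold
    for x in samples:
        run = run + 1 if abs(x) >= threshold else 0
        out.append(x if run > attack_samples else 0.0)
    return out


# Invented amplitude ramp standing in for the quiet onset of a spoken word.
onset = [0.02, 0.05, 0.1, 0.3, 0.5, 0.5, 0.4]

gentle = noise_gate(onset, threshold=0.04, attack_samples=0)
harsh = noise_gate(onset, threshold=0.2, attack_samples=2)

print(gentle)  # most of the onset survives
print(harsh)   # the start of the word is replaced with silence
```

With the higher threshold and slower attack, the recogniser only ever sees the tail of the word, which could plausibly make one short colour sound like another.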

jwebmeister commented 9 months ago

The things I've gathered so far from reviewing your test data + videos @madmaximus101 :

@madmaximus101 can you please review and let me know what's missing?

madmaximus101 commented 9 months ago

The things I've gathered so far from reviewing your test data + videos @madmaximus101 :

  • "gold" and "hold" get misrecognised
    • grammar module issue, new issue raised
  • model recognises some noise as commands, e.g. silence or random noise = "blue" or "freeze".
    • Might be too small a vocab in the dataset, or excessive tuning. It might also be an issue with the fine-tuning process of Kaldi models in general (as SME advised), or with the specific fine-tuning process for Kaldi Active Grammar.
  • colours get misrecognised as another colour, e.g. "red" = "blue", "blue" = "red".
    • Might be pronunciation within the model, or it might be the same as the issue above, recognising silence or cut audio (my stupid audio gate settings) as another colour. Needs more testing.
  • "mirror the door" misrecognised as "wedge the door"
  • "on me" misrecognised as "remove the wedge"
  • "on me" misrecognised as "pie room"
  • "gold on me" misrecognised as "gold halt"

@madmaximus101 can you please review and let me know what's missing?
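If it helps with tracking these across test runs, the confusions listed above could be tallied with something like the sketch below. The pairs are copied from the list; the structure and variable names are just an illustration, not part of Tacspeak.

```python
from collections import Counter

# (spoken, recognised) pairs taken from the list above
confusions = [
    ("red", "blue"),
    ("blue", "red"),
    ("mirror the door", "wedge the door"),
    ("on me", "remove the wedge"),
    ("on me", "pie room"),
    ("gold on me", "gold halt"),
]

# Count how many different wrong results each spoken phrase produced.
by_spoken = Counter(spoken for spoken, _ in confusions)

for phrase, count in by_spoken.most_common():
    print(f"{phrase!r}: misrecognised {count} way(s)")
```

Phrases that show up with several different wrong results (here "on me") are probably the first candidates for more test recordings.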

I think if you're speaking a command of any kind while looking at a door/entryway or a suspect/teammate, then regardless of what you say, it will execute whatever it thinks you said that is available in that command menu at the time. "On me" being recognised as "pie room" might be one of those, unless "fall in" is available as a command in the command menu when looking at a door. I will actually check this to make sure.

I've had consistent misrecognitions with "on me", though not as many with my refined mic settings. "Fall in" pretty much works all the time. I can't remember it failing, apart from random red/blue designation. Again, that doesn't happen as often now I've refined my mic settings.

Testing E-LM on the postal map, I had quite a few misrecognitions on one door at the office where you often come across the corrupt "FBI officer". The door to that room was giving me all kinds of misrecognitions, even though the same commands seemed to work well beforehand. Odd. There was a dead suspect right near the door; unsure if that's another potential quirk.
https://www.youtube.com/watch?v=xKwEUjsPFo8

I have another video showing the same settings and the same mic settings, with more recognition failures, because I was speaking/testing so much I couldn't speak properly by the point I recorded the video lol.

I have quite a few vids now showing a few quirks. Unsure if you've seen them all. https://www.youtube.com/@Madmaximus101/videos

Idea: for further context and understanding, if you see an issue in a video you've watched, it might be good to send me a shared link with a timestamp so I have the exact context. There might be some context I didn't explain properly.

Thought I'd point out something. The word "mirror": how does the model expect to hear it? Does the model expect to hear a more American-sounding "Mirreerrr" or an Aussie "Mirraa"? The American-pronounced "mirror", if spoken quickly, literally just sounds like "Mirrrrrrrer" with a buttload of R's lol.