Open jwebmeister opened 10 months ago
Nvm, I was thinking maybe different types of doors were named/coded as a particular door type.
You can specify “wedge the door” or “wedge the trapped door”, just as one example. It’s all in the grammar module. I haven’t noticed it causing any issues in my testing though.
I will look at the grammar module more deeply for the proper words/phrases.
I will test the blue/red cut-off audio thing as well.
@madmaximus101 don’t worry, I figured it out. It was my audio settings. I had a gate set up that was just slightly too slow and/or too high.
The things I've gathered so far from reviewing your test data + videos @madmaximus101 :
"gold" and "hold" get misrecognized
- grammar module issue, new issue raised
model recognises some noise as commands, e.g. silence or random noise = "blue" or "freeze".
- Might be too small vocab in dataset, or excessive tuning, might be an issue with the fine-tuning process of Kaldi models in general (as SME advised), or the specific fine-tuning process for Kaldi Active Grammar.
colours get misrecognised as another colour, e.g. "red" = "blue", "blue" = "red".
- Might be pronunciation within the model, or it might be the same as the issue above, recognising silence or cut-audio (my stupid audio gate settings) as another colour. Needs more testing.
- "mirror the door" misrecognised as "wedge the door"
- "on me" misrecognised as "remove the wedge"
- "on me" misrecognised as "pie room"
- "gold on me" misrecognised as "gold halt"
@madmaximus101 can you please review and let me know what's missing?
I think if you're speaking a command of any kind while looking at a door/entryway or a suspect/teammate, then regardless of what you say, it will execute whatever it thinks you said that is available in that command menu at the time. "on me" being recognised as "pie room" might be one of those, unless "fall in" is available as a command in the command menu when looking at a door. I will actually check this to make sure.
I've had consistent misrecognitions with "on me". Not as much with my refined mic settings though. "Fall in" pretty much works all the time. I can't remember it failing, apart from random red/blue designation. Again, it doesn't happen as often now I've refined my mic settings.
Testing E-LM on the postal map. I had quite a few misrecognitions on one door at the office where you often come across the corrupt "fbi officer".
The door to that room was giving me all kinds of misrecognitions, even though the same commands had worked well earlier. Odd.
There was a dead suspect right near the door. Unsure if that's another potential quirk.
https://www.youtube.com/watch?v=xKwEUjsPFo8
I have another video showing the same settings, same mic settings. More failures with recognition, because I was speaking/testing so much I couldn't speak properly by the point I recorded the video lol.
I have quite a few vids now showing a few quirks. Unsure if you've seen them all. https://www.youtube.com/@Madmaximus101/videos
Idea: for further context and understanding, it might be good to send me a shared link with a timestamp on a video you've watched if you see an issue, so I have the exact context. There might be some context I didn't explain properly.
Thought I'd point out something. The word "mirror": how does the model expect to hear it? Does the model expect to hear a more American-sounding Mirreerrr or an Aussie Mirraa? The American-worded mirror, if spoken quickly, literally just sounds like Mirrrrrrrer with a buttload of R's lol.
Post test results + useful remarks here, ideally of both:
, using the same test data, and using the default Ready or Not grammar module.
Useful remarks include:
Important instructions:
retain.tsv with the correct rules + text, see example workflow near the end of these instructions.
./scripts/copy_retain_item_cmds_only.ps1 that can be used in PowerShell to copy only "normal commands" out of ./retain/ and into ./cleanaudio_cmds/
_readyornot.py grammar module, or very minor modifications, i.e. no new words.
./tacspeak.exe --test_model './cleanaudio_cmds/retain.tsv' './kaldi_model/' './kaldi_model/lexicon.txt' 4
./scripts/ folder related to cleaning up the retain.tsv and related .wav files.
retain.tsv and go through each line, reviewing the rule and text.
./retain/ folder in VLC media player on single file loop, pressing 'N' to move to next .wav as I read through each line of retain.tsv
retain.tsv to align with the audio.
retain.tsv, then when I'm done reviewing I run the list_wav_missing_from_retain_tsv.ps1 first to make sure I'm deleting the right files, then run delete_wav_missing_from_retain_tsv.ps1 script (option A is preferred, but hey we're all busy and life is too short to spend cleaning all the data).
retain.tsv, then when I'm done reviewing I run the list_wav_missing_from_retain_tsv.ps1 first to make sure I'm deleting the right files, then run delete_wav_missing_from_retain_tsv.ps1 script.
Example report:
"listen_key_toggle":-1, using USE_NOISE_SINK = True; also picked up in base model but not as often.
_readyornot.py without any modifications
('./kaldi_model/', './retain/retain.tsv', 'Command', 'WER', 'Overall -> 5.00 %+/- 9.55 %N=20 C=19 S=1 D=0 I=0')
('./kaldi_model/', './retain/retain.tsv', 'Command', 'CMDERR', {'cmd_not_correct_output': 0, 'cmd_not_correct_rule': 0, 'cmd_not_correct_options': 0, 'cmd_not_recog_output': 0, 'cmd_not_recog_input': 0, 'cmds': 4})
('./kaldi_model_base/', './retain/retain.tsv', 'Command', 'WER', 'Overall -> 5.00 %+/- 9.55 %N=20 C=19 S=0 D=1 I=0')
('./kaldi_model_base/', './retain/retain.tsv', 'Command', 'CMDERR', {'cmd_not_correct_output': 0, 'cmd_not_correct_rule': 0, 'cmd_not_correct_options': 0, 'cmd_not_recog_output': 0, 'cmd_not_recog_input': 0, 'cmds': 4})
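For anyone reading the WER lines in a report like the one above, here is a rough sketch of how the numbers relate. It assumes N is the number of reference words, C the correct words, and S/D/I the substitutions, deletions and insertions, and that the "+/-" figure is a 95% normal-approximation interval. That last part is only a guess that happens to reproduce the 9.55 % shown, not something confirmed from the tacspeak source. The function name `wer_with_interval` is made up for this example.

```python
import math


def wer_with_interval(n, s, d, i, z=1.96):
    """Return (wer, half_width) as fractions.

    n: reference word count, s/d/i: substitutions, deletions, insertions.
    z=1.96 is the assumed 95% normal-approximation multiplier.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    wer = (s + d + i) / n
    # Clamp at zero so a WER above 100% (many insertions) does not
    # produce a negative value under the square root.
    variance = max(0.0, wer * (1.0 - wer) / n)
    return wer, z * math.sqrt(variance)


if __name__ == "__main__":
    # Counts taken from the example report: N=20 C=19 S=1 D=0 I=0
    wer, half = wer_with_interval(n=20, s=1, d=0, i=0)
    print(f"{wer * 100:.2f} % +/- {half * 100:.2f} %")
```

With only 20 words, one substitution (fine-tuned model) and one deletion (base model) both come out at 5.00 % with a very wide interval, so a report this small can't really separate the two models. More test data narrows the interval.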