Open jwebmeister opened 8 months ago
Edited with instructions/guidance on an example workflow for cleaning up the data after a play session, and a pointer to the helpful ./scripts/ folder, to make data cleaning slightly easier.
Currently testing the Experimental version. When speaking commands for fall in & arrest 'em/them/him/her, blue team will often be told to arrest, sometimes red, but mostly blue. If I speak extremely clearly and with proper wording this doesn't seem to happen, but I do have to be "like a TV presenter", enunciating words very properly. The Restrain command has less of this weirdness but it is still present. Again, like before, blue team will often be given the command unprompted, with red sometimes being told to do it. The restrain command is less prone to this though, which is interesting.
I have noticed saying c4 in this model doesn't really work anymore. If I say c2 it pretty much works as expected. In general I noticed a habit of blue being told to do commands when I just said the command without red or blue at the start of speaking.
I am attempting to compile a test run with PowerShell but I am getting an error: argument --test_model: expected 4 arguments. Edit: nvm... forgot the "4" lol
Awesome! Thanks for the feedback @madmaximus101. A few questions and comments below.
Currently testing the Experimental version. When speaking commands for fall in & arrest 'em/them/him/her, blue team will often be told to arrest, sometimes red, but mostly blue. If I speak extremely clearly and with proper wording this doesn't seem to happen, but I do have to be "like a TV presenter", enunciating words very properly. The Restrain command has less of this weirdness but it is still present. Again, like before, blue team will often be given the command unprompted, with red sometimes being told to do it. The restrain command is less prone to this though, which is interesting.
Do you have listen_key set to a loud or clicky key? I have a theory that it’s picking up some noise as “Blue” before you start speaking. It might also be retaining some audio before you press the listen_key, which would be a problem I’d need to fix in code if it’s the case (though I thought I already fixed it!). It could also be the model but I want to narrow down possibilities.
Can you try testing with listen_key_toggle=2, and try to speak the same commands that were having issues, speaking in the same manner, and see if the problem persists?
You shouldn't need to speak as a TV presenter for it to work accurately; if you do, there's something wrong.
I have noticed saying c4 in this model doesn't really work anymore. If I say c2 it pretty much works as expected.
Is C4 active in your grammar module? It isn’t by default. If it is a valid command in your grammar module, but it’s not being recognised, please confirm / let me know.
My listen on/off key is set to my mouse thumb button; it is not really noisy. I have noticed however that if I breathe or sigh, or if I'm typing away, it will recognise noises and attempt to decode them. If I don't want any listen padding at start and end, or any automatic voice on/off feature, which setting do I change?
Edit: will try listen_key_toggle 2.
Edit: I am using the experimental model as provided.
If I don't want any listen padding at start and end, or any automatic voice on/off feature, which setting do I change?
@madmaximus101 listen_key_padding_end_ms_max and _min are options you can change to set the amount of audio captured after releasing the listen_key.
There shouldn’t be any audio prepended before you press the listen_key (for listen_key_toggle 0 and -1), but if you’re sure it is prepending audio, let me know.
If listen_key_toggle is set to -1, it will always be listening for either YellFreeze or NoiseSink, so it’s fine for it to decode noises, as long as it doesn’t Yell without you saying “freeze” (or similar)… unfortunately it will likely yell at noises at least sometimes unless you have a very quiet environment and good mic. Just let me know if it’s truly unplayable and if it’s worse than the base model.
Listen key toggle 2 seems to be better; hot mic always on seems to be a much better experience overall: no random ghost or added-on commands, and a much higher success rate in general. I did have some misheard commands, either due to not being clear enough or, I assume, too-quick speaking. With some of the retained audio I've noticed I seem to have a tendency to breathe in or make an initial "opening mouth sound" as I click the hot mic button or just after. May have to learn to not do that lol. I also noticed my mic volume was way up. That might be a contributing factor also; potential for minor distortion of sound to ruin things etc. Will lower mic volume lol.
Edit: for reference my headset is the Sennheiser GSP 670. I'd say it's better than average quality for sure.
Here is a link to a gameplay session using listen key toggle 2, hot mic always on. Experimental Kaldi model as provided. Edit: as of right now the HD quality is still being uploaded. Text on screen will be hella blurry until it finishes (30-45 mins). https://www.youtube.com/watch?v=Mxzgd5aaR4Y
I am attempting to run the test in PowerShell. I think I'm running the command correctly & nothing is happening? The command runs, but I get no results, output, or files generated?
I am attempting to run the test in PowerShell. I think I'm running the command correctly & nothing is happening? The command runs, but I get no results, output, or files generated?
@madmaximus101 check in the retain.tsv, are the referenced file paths to the .wav files correct? e.g. ./cleanaudio_cmds/retain-123.wav
I downloaded the Tacspeak app & Kaldi model as-is. Haven't changed anything. If I missed an instruction regarding these needing file paths modified, I apologise.
Pic of my retain.tsv
I noticed in your PowerShell window the '. after test_model was grey. In my PowerShell window the '. after test_model is blue. Thought I'd point that out just in case that means anything.
Pic of my user settings.
@madmaximus101 You’re running the command with ./cleanaudio_cmds/retain.tsv whereas it should probably be ./retain/retain.tsv
It doesn't matter what the path is as long as:
- ./somedir/retain.tsv is a valid file and path
- the .wav file paths in the retain.tsv are valid files and paths.
Ahh, a simple filepath error. I copy-pasted the example command given without a second thought 😅.
Appreciate the patience & help mate.
On Tue, 23 Jan 2024, 10:05 pm Joshua Webb, @.***> wrote:
@madmaximus101 https://github.com/madmaximus101 You’re running the command with ./cleanaudio_cmds/retain.tsv whereas it should probably be ./retain/retain.tsv
It doesn’t matter what the path is as long as:
- ./somedir/retain.tsv is a valid file and path
- the .wav file paths in the retain.tsv are valid files and paths.
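(As an aside, those two conditions are easy to sanity-check with a small script. This is a sketch only: it assumes retain.tsv is tab-separated with the .wav path in the first column, which matches the example paths in this thread but should be verified against your own file.)

```python
# Sanity-check that retain.tsv exists and that every .wav it references exists.
# Assumption: tab-separated file, .wav path in the first column.
import csv
from pathlib import Path

def check_retain(tsv_path="./retain/retain.tsv"):
    tsv = Path(tsv_path)
    if not tsv.is_file():
        print(f"missing: {tsv}")
        return False
    ok = True
    with tsv.open(newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            if not row:
                continue  # skip blank lines
            wav = Path(row[0])
            if not wav.is_file():
                print(f"missing wav: {wav}")
                ok = False
    return ok
```

Running it prints each missing file, which would have pointed straight at the ./cleanaudio_cmds/ vs ./retain/ mix-up above.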
OK, I am at a point where I feel I am ready to start doing the initial collecting of data in a somewhat organised manner; will start collating data and using the scripts etc. Is there a map in particular you would like me to play, to reference for your own comparisons and make things easier? As few variables as possible etc. Any particular commands or ways of speaking you would like me to try?
@jwebmeister
Some initial notes & basic observations so far with some newly found quirks, yay! 😛
Lowering mic gain seems to have helped a lot with accuracy of words & with randomly detected noise attempting to be decoded (I had my mic gain set to silly levels in-game for some reason; I don't remember doing it or why I would lol? It is now at 100%). I no longer get the tapping of my keyboard, loud sighs, or mouth noises being picked up & Tacspeak trying to decode them. I apologise for missing the stupidly high mic gain levels in my earlier testing & wasting your time with that; my bad.
I am the only person in my house, along with 2 cats. I was using listen key toggle -1 when it was their dinner time; they were meowing right underneath my chair & the NoiseSink feature didn't detect it at all; there were no noises attempted to be decoded or false commands given.
Something of note: a weird quirk, I'm assuming with how Ready or Not is designed, with any Kaldi model I've used. Some of the speech commands were completely different from what I said. I should have realised this instantly as I've pointed this exact issue out before lol. Took me a bit to figure out what the F**K was happening; I was seriously HUH!?? I figured out these random commands, executed differently from what was spoken, were happening when I was accidentally looking through multiple doorways. The commands given were, I guess, Tacspeak's interpretation of what command I was attempting to give, from what was available to be given.
This quirk also happened when interacting with an ajar door. I first noticed it when attempting to give a command to mirror the door, not realising it was ajar. I then tested further & found that any door command requiring physical interaction or placement of a device (wedge, c2, mirror) on an ajar door would result in blue or red being given a command to stack up, or another basic command such as fall in or cover me, sometimes breach & clear, being executed.
Is there a way to make a command so that when a command for wedge, mirror, or c2 is given on an ajar door, the operator closes the door & if possible continues the intended command(s) given?
I have also noticed another door quirk, again I think due to Ready or Not's design. If I command blue team or red team to stack up on a door & then command the other team to stack up on the same door, it defaults the command to the team already stacked up on the door, so nothing happens. It also defaults any door-related command given whilst looking at the door to the team that is currently stacked up on said door. Example: blue team stack up; I then speak "red team breach and clear"; blue team breaches and clears.
Edit: in hindsight I realise I should have tested this with other commands such as fall in, on me, and cover me, commanding one team to do these whilst looking at the door the second team was stacked up on. Will update this comment with the result if you would like that.
Testing the listen key toggle 2 setting with the experimental Kaldi model, I've noticed that if I stop briefly then continue, it will give an unintentional command mid-sentence (blue team "slight pause" breach and clear). I am attributing this to my own cautious bias & my own learnt speech habits interacting with Tacspeak. When I speak in one fluid continuous sentence it does seem to work, although the listen key toggle 2 setting does sometimes pick up my random mouth noises when I sigh louder than normal, or if I make a "tutting noise" haha. This issue is very, very much reduced with mic gain now at 100%; basically almost a non-issue at this point. If I mucked up speaking a command (brain fart) with listen key toggle 2 & a command was given that I didn't intend, I would speak "fall in". Depending on my level of panic or quick speaking this sometimes worked & sometimes didn't. Listen key toggle 2, I found, requires you to keep your speech in check 😅
The listen key toggle -1 setting with the experimental Kaldi model also seems to have fewer unintentional noises detected with my mic gain now at 100%. If I mucked up a command (brain fart) I would let go of the mouse thumb button, which would lessen the impact of the error. I would then press the thumb button again & issue the fall in command to stop the command currently happening, which would come out correctly. I know there is a halt/cancel/stop command but my brain just thinks of fall in in the moment lol.
@jwebmeister NoiseSink seems to work as intended, with high accuracy in the few times it's activated since I corrected my mic gain to 100%. If I do make a noise detected by NoiseSink, such as a burp, a cough, or hitting the desk, it activates.
I will test NoiseSink with words & phrases a person might say in surprise, fright, disappointment, or anger.
@jwebmeister Newly found quirks aside, I am of the opinion the listen key toggle -1 setting is pretty good and pretty much working as intended... now that my mic gain is at appropriate levels; again, I apologise.
I will test this setting further whilst watching where I'm looking when giving said commands.
The "F word & F you" are often picked up as:
on_recognition (INFO): KaldiRule(16, ReadyOrNot_priority::YellFreeze) | drop Freeze!
I have some results of the testing here.
There was only one mistake out of the short run of commands I did here in-game, as a little test run to make sure things were running as they should.
I wasn't sure how to change this in the text files to reflect the result, so I will explain.
The command recorded "red team secure area"; I actually said "red team kick and clear". I said this a second time more clearly and it gave the correct command.
Thanks @madmaximus101
I wasn't sure how to change this in the text files to reflect the result, so I will explain. The command recorded "red team secure area"; I actually said "red team kick and clear". I said this a second time more clearly and it gave the correct command.
The "F word & F you" are often picked up as: on_recognition (INFO): KaldiRule(16, ReadyOrNot_priority::YellFreeze) | drop Freeze!
I don't think there's an easy fix, other than re-training the model, and my previous attempts to do just that didn't result in any improvements. However, current options or work-arounds are: setting listen_key_toggle to 0 or 2 in ./tacspeak/user_settings.py (1 also, but I personally don't recommend it).
Is there a way to make a command so that when a command for wedge, mirror, or c2 is given on an ajar door, the operator closes the door & if possible continues the intended command(s) given?
Not without first specifying via speech that the door is ajar, e.g. "wedge the ajar door" instead of "wedge the door". Similar to the multiple "door", "doorway", "hallway" issue, it's a problem more effectively solved from the game devs (Void) side of things, as implementing a workaround from tacspeak will reduce speech recognition accuracy. I'll consider revisiting this if there's no updates from Void that address some of the command menu quirks.
I have also noticed another door quirk, again I think due to Ready or Not's design. If I command blue team or red team to stack up on a door & then command the other team to stack up on the same door, it defaults the command to the team already stacked up on the door, so nothing happens. It also defaults any door-related command given whilst looking at the door to the team that is currently stacked up on said door. Example: blue team stack up; I then speak "red team breach and clear"; blue team breaches and clears. Edit: in hindsight I realise I should have tested this with other commands such as fall in, on me, and cover me, commanding one team to do these whilst looking at the door the second team was stacked up on. Will update this comment with the result if you would like that.
I thought I was going crazy; thank you, this explains quite a lot. I only ran into this issue when playing Ides of March (so far). If you've tested it, or are willing to test it, can you confirm the extent of it changing the team selection, and what commands it affects outside of just breach and clear?
Testing the listen key toggle 2 setting with the experimental Kaldi model, I've noticed that if I stop briefly then continue, it will give an unintentional command mid-sentence (blue team "slight pause" breach and clear). I am attributing this to my own cautious bias & my own learnt speech habits interacting with Tacspeak.
The same thing happens with my speech. If you change listen_key_toggle to 2, I suggest also changing vad_padding_end_ms to 250. Otherwise experiment with values for vad_padding_end_ms; this setting helps determine when enough silence has been detected to end the utterance, attempt recognition, and execute commands.
Here is a link to a gameplay session using listen key toggle 2, hot mic always on. Experimental Kaldi model as provided. Edit: as of right now the HD quality is still being uploaded. Text on screen will be hella blurry until it finishes (30-45 mins). https://www.youtube.com/watch?v=Mxzgd5aaR4Y
@madmaximus101 cheers for the video, it's extremely helpful.
Based on the video, Tacspeak and/or the experimental model isn't performing "good enough" imho (though I also need more test data). As you said, you're having to speak as a newscaster for it to be reliably accurate, and there were some commands spoken that were misrecognised for no good reason that I could determine, e.g. "on me" was recognised as "team remove wedge"!?
It failing to recognise C4, versus C2 (or written out as "c two"), is reasonable to me as C4 is not a valid command. Unless the grammar module has been explicitly changed to recognise "c four" as an option... then I definitely want to know about it.
In hindsight I realise there's probably too much manual effort required from testers to get good test data (as opposed to being an automatic process). For example, the test data will only show misrecognitions if the user manually cleans and updates the data, and the test data won't show failed recognitions (i.e. not recognised commands) unless the user mentally notes it or records the full play session. I haven't got any good ideas on how to fix this however.
Have you tested the base model? Does it do better / worse than the experimental model?
Long weekend coming up. I have fixed up my mic gain issue & will do more testing of both models to have a proper comparison. My posts were a tad jumbled & not really consistent haha.
Based on the video, Tacspeak and/or the experimental model isn't performing "good enough" imho (though I also need more test data). As you said, you're having to speak as a newscaster for it to be reliably accurate, and there were some commands spoken that were misrecognised for no good reason that I could determine, e.g. "on me" was recognised as "team remove wedge"!?
I might have given some unfair results, not knowing about the stupid mic gain levels & not really being aware of speech issues & quirks in my earlier posts/results. I will re-do my testing in a more thorough manner now that quirks & specific issues have been identified.
I thought I was going crazy; thank you, this explains quite a lot. I only ran into this issue when playing Ides of March (so far). If you've tested it, or are willing to test it, can you confirm the extent of it changing the team selection, and what commands it affects outside of just breach and clear?
My current idea for being most helpful to you atm, with the things needing further clarity or discovery, is recording video deliberately testing these issues/quirks to see what is possible/not possible/a quirk/an error etc, then giving a link to the video along with a description of how things went, as well as the results from testing the retain.tsv.
What sort of things are you looking for, or want cleaned, in regards to audio? I have a pretty quiet house as it's just me, so there are not often any random noises generated apart from maybe my own speech quirks and mouth sounds.
I am also thinking maybe I can put together an edited video of sorts comparing commands with different models: "same scenario, same commands, same doors; different model", switching between models as the video progresses. I can acquire some editing software easily.
Have you tested the base model? Does it do better / worse than the experimental model?
In general I do have a sense that the medium model has fewer errors & I feel I am able to talk normally without feeling the need to be cautious with my speech. The large model is even more so like that. I haven't used the bare-bones base model suggested on the main Tacspeak page in a while.
It failing to recognise C4, versus C2 (or written out as "c two"), is reasonable to me as C4 is not a valid command. Unless the grammar module has been explicitly changed to recognise "c four" as an option... then I definitely want to know about it.
When I breach & clear with the command "c4", with the medium model or the large model, it works pretty reliably. Unsure if this is because of pure luck & it consistently recognising "c4" as "c2", or whether the language model has some sort of deliberate word detection for that specific thing. I can't remember it not working, which is probably why I seem to have a habit of saying c4 instead of c2 lol.
In hindsight I realise there's probably too much manual effort required from testers to get good test data (as opposed to being an automatic process). For example, the test data will only show misrecognitions if the user manually cleans and updates the data, and the test data won't show failed recognitions (i.e. not recognised commands) unless the user mentally notes it or records the full play session. I haven't got any good ideas on how to fix this however.
I would be willing to learn things; I have always wanted to learn Python, just never had a reason to, and this piques my interest very much. I would also be willing to do some speech training; is this something I can help with? I've also noticed there is a training folder in the experimental Kaldi. Does that have something to do with the data collection & modifying/cleaning up data?
- Open retain.tsv
- Change "GroundOptions" to "BreachAndClear", for the highlighted error
- Change "red team secure area" to "red team kick and clear", for the highlighted error
- Save retain.tsv, overwriting the existing file
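(Those manual steps could also be scripted. A minimal sketch, assuming retain.tsv is tab-separated and that the wrong rule name and transcript each occupy a whole field; verify against your own file, and keep a backup, before overwriting anything.)

```python
# Script the manual retain.tsv corrections: replace whole-field values,
# e.g. a wrong rule name or a wrong transcript.
# Assumption: tab-separated file; rule name and spoken text are each one field.
import csv
from pathlib import Path

def fix_labels(tsv_path, replacements):
    """replacements: dict mapping wrong field value -> corrected value."""
    tsv = Path(tsv_path)
    with tsv.open(newline="", encoding="utf-8") as f:
        rows = [[replacements.get(field, field) for field in row]
                for row in csv.reader(f, delimiter="\t")]
    with tsv.open("w", newline="", encoding="utf-8") as f:
        csv.writer(f, delimiter="\t").writerows(rows)

# e.g. the two corrections listed above:
# fix_labels("./retain/retain.tsv", {
#     "GroundOptions": "BreachAndClear",
#     "red team secure area": "red team kick and clear",
# })
```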
Thanks for the info, this will def help with processing data on the next set of retained audio I gather, thank you!
Not without first specifying via speech that the door is ajar, e.g. "wedge the ajar door" instead of "wedge the door". Similar to the multiple "door", "doorway", "hallway" issue, it's a problem more effectively solved from the game devs (Void) side of things, as implementing a workaround from tacspeak will reduce speech recognition accuracy. I'll consider revisiting this if there's no updates from Void that address some of the command menu quirks.
Is there a way for a spoken command to be deliberately denied or stopped if what was spoken is very different from the executed command, say when someone might be accidentally looking through multiple open doorways? Possibly a system where Tacspeak automatically implements a stop command in a situation where there is a massive difference between the spoken & executed commands.
I have figured out how to change which model Tacspeak is actively using in the user_settings file. I was manually changing out the folders to do this lol. Does changing the user_settings file in this manner affect results or skew things?
My current method atm is to have entirely separate folders for each iteration/test/result/attempt using Tacspeak, to completely separate & have a visual indication of literal separation of datasets.
What sort of things are you looking for or want cleaned in regards to audio?
@madmaximus101 A direct comparison between the base model (I mean the medium lm model when I say base) and the experimental model. What works well in one but not the other is what I'm most concerned with. In regards to actual commands or gameplay, no idea, just everything, as much regular play as possible.
In general I do have a sense that the medium model has fewer errors & I feel I am able to talk normally without feeling the need to be cautious with my speech. The large model is even more so like that.
I need to quantify it, and I need to test it using other people's speech other than my own. Please if you can, run the tests on the same retained data using:
When I breach & clear with the command "c4", with the medium model or the large model, it works pretty reliably. Unsure if this is because of pure luck & it consistently recognising "c4" as "c2", or whether the language model has some sort of deliberate word detection for that specific thing.
The finetuning in the experimental model seems to have grossly skewed the word probabilities. This means that there's a larger difference between "c two" and other words, including "c four", in the experimental model than in the base model. This should make it both more accurate and more precise, but also less lenient.
I would also be willing to do some Speech training, is this something i can help with?
Not yet, otherwise we'd both be wasting our time. It's unfortunately not as easy as just tweaking the training values to try to balance it, so I need hard test data to focus in on where specifically the model is falling down, as an indicator of where part of the training process is falling down (this is my focus, much more so than just fixing the model).
At the end of this experiment, a very possible conclusion is that there's no practical benefit to finetuning the model (in fact I have SME advice saying exactly that), and that you'd need to train the model from scratch to see any real benefit. If this is the conclusion, hard test data would be of even greater benefit, as a model from scratch should be even more sensitive to the training process and data put into it.
I need to quantify it, and I need to test it using other people's speech other than my own. Please if you can, run the tests on the same retained data using:
- the experimental model, and
- the base (medium lm) model, and
- (optional, for extra credit) the large lm model.
OK, got it. It just clicked (lightbulb moment): the retained audio files don't change, the AI does. Makes sense.
Is there a way for a spoken command to be deliberately denied or stopped if what was spoken is very different from the executed command, say when someone might be accidentally looking through multiple open doorways? Possibly a system where Tacspeak automatically implements a stop command in a situation where there is a massive difference between the spoken & executed commands.
Issue #14; requires support/integration from Void.
Alternatively, for a flub while speaking, there could be a key phrase to just change the command action to noop (do nothing), e.g. "\<dictation> (s- | f-) I messed up". I deliberately haven't tried it or put it in because I believe it's very likely to negatively affect speech recognition accuracy, e.g. a valid command + some noise at the end = noop instead of a valid command. Having said that, it might be worth experimenting, I just have had other priorities.
I have figured out how to change which model Tacspeak is actively using in the user_settings file. I was manually changing out the folders to do this lol. Does changing the user_settings file in this manner affect results or skew things? My current method atm is to have entirely separate folders for each iteration/test/result/attempt using Tacspeak, to completely separate & have a visual indication of literal separation of datasets.
That's more effort than I put in! I've just been renaming the model folders, for no good reason, but the user_settings should work if you're running tacspeak.exe without additional arguments.
You don't need to change user_settings or folder names if you're running --test_model as you're already specifying which model directory to use in the arguments,
e.g. ./tacspeak.exe --test_model './retain/retain.tsv' './kaldi_model/' './kaldi_model/lexicon.txt' 4
OK, got it. It just clicked (lightbulb moment): the retained audio files don't change, the AI does. Makes sense.
Yep. Ideally playtest with each model for at least a few missions. Then run --test_model using each model on all of the data retained from the playtests (including the playtests where the same model wasn't used). Hopefully that makes sense.
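(A sketch of running --test_model for each model over the same retained data, following the example command above. The model folder names below are assumptions; adjust them to your local layout.)

```python
# Build (and optionally run) the --test_model command for each model
# over the same retained data. Folder names are assumptions.
import subprocess

MODELS = ["./kaldi_model_experimental/", "./kaldi_model_medium/", "./kaldi_model_large/"]

def build_cmd(model_dir, retain_tsv="./retain/retain.tsv", num_threads=4):
    # mirrors: ./tacspeak.exe --test_model <tsv> <model_dir> <lexicon> <num_threads>
    return ["./tacspeak.exe", "--test_model", retain_tsv,
            model_dir, model_dir + "lexicon.txt", str(num_threads)]

def run_all(models=MODELS):
    for model_dir in models:
        subprocess.run(build_cmd(model_dir), check=True)  # runs each test in turn
```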
That's more effort than I put in! I've just been renaming the model folders, for no good reason, but the user_settings should work if you're running tacspeak.exe without additional arguments.
It's more that I got annoyed with having to copy/paste/delete/change folder names to use Tacspeak with the model I wanted. This way I don't have to chop & change folder names or move folders around to use Tacspeak in-game with a different model lol.
Yep. Ideally playtest with each model for at least a few missions. Then run --test_model using each model on all of the data retained from the playtests (including the playtests where the same model wasn't used). Hopefully that makes sense.
All voice data collected during gameplay. Delete audio files containing mistakes or misspoken words/obvious errors, as well as the corresponding entry in the associated files. The earlier post where you gave further instructions on the retain.tsv will help with this. Will comment for further assistance if I get stuck on this again.
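(The "delete the audio file, then delete the corresponding entry" step could be semi-automated: after deleting the bad .wav files, drop any retain.tsv row whose file no longer exists. Same assumption as earlier: tab-separated, .wav path in the first column; verify against your own file, and keep a backup, before overwriting.)

```python
# After deleting bad .wav files, remove the retain.tsv rows that referenced them.
# Assumption: tab-separated file with the .wav path in the first column.
import csv
from pathlib import Path

def prune_deleted(tsv_path):
    tsv = Path(tsv_path)
    with tsv.open(newline="", encoding="utf-8") as f:
        rows = [row for row in csv.reader(f, delimiter="\t") if row]
    kept = [row for row in rows if Path(row[0]).is_file()]
    with tsv.open("w", newline="", encoding="utf-8") as f:
        csv.writer(f, delimiter="\t").writerows(kept)
    return len(rows) - len(kept)  # number of rows removed
```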
Am currently playing Ides of March; each video I record for basic at-face-value assessment will be using a different model.
The restrain command sometimes does not work correctly; I've had this issue with varying levels of error regardless of model. When it doesn't work correctly it's often followed by a move here command, or fall in. I believe this is caused by the actual restrain command only being issuable by mousing over a very particular spot on the NPC in question, as well as what I think is a distance-activated thing as well.
If you tell a team member to mirror the door, wedge the door, c2, gas etc., sometimes this command will designate red or blue to initiate the command instead of gold. At first I was like huh... this command would often be repeatable with the same result... then it hit me... the team that gets designated to fulfil the command are the only ones with said device. So of course it will either default the command to the team with the device, or it will just be designated as gold team, i.e. not a problem; will need to investigate this further to confirm.
The same can be said for removing devices. If gold team is the currently selected team & I issue a command to remove a device from a door & red or blue ends up being designated for the command instead, I figured out this is because red or blue most likely has the maximum amount of said devices in the tactical pouch/slot. So of course the "gold team" command I just issued will sometimes get issued as red or blue, i.e. not a problem! I will test this further to confirm.
If looking at a door & issuing "on me" or "fall in", the command breach & clear will be executed.
Have made 3 videos depicting E-LM, M-LM & B-LM. I almost went for editing the vids into one homogeneous vid, but my brain didn't like the idea after all lol. Will be uploading shortly with descriptions & general info on each vid's happenings & quirks.
All on the Ides of March map.
The erroneous red/blue designation of tasks seems to be limited to the E-LM model. Overall, my finding is that the M-LM & B-LM are much more stable speech-recognition-wise. Across the board there are missteps & wrong commands given; even with the M-LM & B-LM this can imo be attributed to not looking at the exact spot intended in the exact moment the command was given, i.e. not looking exactly at the spot to arrest a suspect, not looking exactly at a door, or accidentally looking through multiple doorways.
The E-LM model does seem to have a few errors - there is no denying that. What was/is the goal for the E-LM model? To have a custom bespoke speech recognition exactly/specifically designed for Ready or Not & Tacspeak? Smaller file size overall?
If there is some sort of specific design choice/pathway/idea for the E-LM I would be willing to brainstorm or help with further refining the idea. I'm definitely nowhere near your level of knowledge with coding though so I wouldn't be able to help with that aspect.
I do have a good problem-solving brain lol - fixing up cars, electrics, IT, networking (Unraid mostly), all self-taught etc. Just giving you context 😀
From what I understand of your previous comment, there's no point bothering with further speech training on the E-LM if it turns out it's a bust, i.e. if the backbone of the E-LM's AI speech recognition is too strict or not "flexible" enough in the first place?
I am now testing while taking into account, correcting, and/or fiddling with settings to adjust for my own bad habits in relation to Tacspeak - currently testing the experimental model.
Edit: I have a theory about an issue I'm currently testing. I think one of my bad habits is speaking either just as I press the speech button or just before it, potentially causing recognition issues, which I think could be the cause of the erroneous blue/red command designation. This doesn't seem to be as big or as noticeable an issue with the M-LM or B-LM.
Edit: I have changed the highlighted setting to see if it has a positive effect - shortening the amount of time before speech can be detected. Is this the correct setting for what I think it is?
Edit: I have noticed the E-LM has trouble with "mirror the door" in general, as well as "on me" and sometimes "fall in".
Potential solution to users not being proficient in correctly sorting/refining/cleaning & getting good data: upload the entire tacspeak folder to Google Drive with all data intact?
> Potential solution to users not being proficient in correctly sorting/refining/cleaning & getting good data: upload the entire tacspeak folder to Google Drive with all data intact?
No. I don’t need or want anyone to upload their speech data anywhere. I only need the overall test results and any specific findings on what words the experimental model gets wrong that the base model gets right.
> I have changed this setting highlighted to see if this has a positive effect - shortening the amount of time before speech is recognised. Is this the correct setting?
It shouldn’t have an effect, or at least not a positive one; that setting is related to the voice activity detector. There isn’t really a direct setting to intentionally capture audio before you press the listen_key.
> What was/is the goal for the E-LM model?
It’s a test of the model finetuning / training process, to figure out what part of the process needs to be adjusted and/or if it’s (or which areas are) worth further investment of time and effort.
There are a number of things I can try to address some of the issues already identified, but I need hard data to narrow it down to specifics, so that I’m not wasting my time. All of the potential fixes will take a great deal of time and effort, beyond what I’ve already put in.
There aren’t any design decisions to be made until the finetuning and training process + code is 100% “working”. The most helpful thing that can be provided right now is test data. After that I can prioritise tasks and put together a plan of attack; doing so before gathering and reviewing test data is a waste of time.
I am attempting to run the scripts to tidy up things - keep getting this error?
> I am attempting to run the scripts to tidy up things - keep getting this error?
The easiest workaround is to run PowerShell as administrator. Otherwise check out this article.
Make sure to run the relevant “list” script first before running any “delete” scripts, to make sure only the correct items will be deleted. There’s no undo with PowerShell.
Edit: also run the scripts from the same directory as tacspeak.exe and where the “retain” folder lives, e.g. ./scripts/some_script.ps1.
https://www.youtube.com/watch?v=3qDAMdt_v_k This is where I originally identified a consistent issue with "mirror the door", "fall in" and "on me" sometimes telling blue or red to do it, or sometimes executing a wedge command with red or blue. I've also noticed some quirks with trap commands & wedge commands with this model, especially if there is already a trap or a wedge on the door.
https://www.youtube.com/watch?v=1fxtZCWRs3w&t=635s This was a much smoother speaking experience, though again with some quirks of different commands being issued. Example: looking at a surrendered NPC & saying "move here", with the restrain command being given instead. This was one instance of me experimenting with what would happen if I said a different command from what was available in the command list. I also experienced some quirks where commands were seemingly correctly heard & executed but nothing happened, usually followed by me issuing a fall in command to "reset" the team so the command would be executed. I think this is due to issuing a command while they are in the middle of something, or while they are temporarily physically blocked from following the command.
https://www.youtube.com/watch?v=o4niN0lOiVg&t=211s Overall, I'd say this model is the most error-free & quirk-free experience. In this video you can see a clear example of telling the team to arrest the NPC but the move command being given. Also, a very clear example of giving a command through a doorway & the stackup command being given; sometimes this results in breach & clear.
I had a suspicion the arrest/restrain command wasn't being issued because I wasn't moused over the exact point where the restrain command can be given, & indeed it is a "mousing over the exact point required for restrain" issue. When moused over the correct point, restrain becomes the top of the menu instead of the door commands. There is currently no sub-menu to navigate to the restrain command if the door command menu is at the top. I feel this is a Ready or Not issue overall; I imagine people who play Ready or Not without a speech mod experience the same frustrations.
I thought I was going crazy, thank you, this explains quite a lot. I only ran into this issue when playing Ides of March (so far). If you've tested it, or are willing to test it, can you confirm the extent of it changing the team selection, and which commands it affects outside of just breach and clear?
I will get onto this and provide video & screenshots if possible. I will find a spot on Ides of March that can reproduce the error & then attempt it on other maps. I will test this with the other models also.
@jwebmeister
test_model_output_overall.txt, test_model_output_tokens.txt, test_model_output_utterances.txt (three sets attached, one per model)
If you would like, for further context I can edit the names of each audio file & take a screenshot so you have context for what the commands were in order. This way I can communicate what the audio files said vs what the test spits out. Are there any other results or data I can give that I'm unaware of?
It took me a while to get to this point. I decided it was easier to make clean audio from the start: no muck-ups, no mistakes, no verbal garbage, attempting to have no noise picked up and no accidental freeze or yell. This is harder than I thought haha, but I got there.
speaking "gold" sometimes will result in the command halt being given. Even when using the B-LM.
I have discovered the crux of the quirk relating to commanding one team to do something but the other team doing it instead. This was tested with the B-LM to make sure it was indeed a RoN issue.
https://www.youtube.com/watch?v=Yxb3NznJFi4
https://www.youtube.com/watch?v=WNpaVtaM72M
It seems that when red or blue is stacked up, that team "takes ownership" of the door, if that makes sense? So when looking at a door "claimed" by red or blue, the currently stacked team will be the team that follows the command, even if you specified the other team.
If gold team is stacked up, red or blue can be told to breach & the other team will back off.
I believe this quirk is limited to doors/doorways & hallways where a "hidden door" exists in the middle of a hallway, like on Ides of March.
https://www.youtube.com/watch?v=yvEQ_PVDoP0 I have found that if you tell red or blue team to breach & clear, that team will take ownership of the door/doorway/hallway until it has finished clearing the room, with some areas being quite large. To someone who isn't aware of this it can come across as WTF!?, which may contribute to the unexplained red/blue designation of tasks when looking in the direction of a door/doorway/hallway with a "hidden door". At this point I'd say there is a need for a mod to eliminate this "feature" entirely.
Maybe a mod that makes all commands available regardless of where you're looking or what team is doing what, in one big "command tree" that always stays the same? Hypothetically I could see this making Tacspeak usage & commands potentially quirk-free.
I've looked into how to set up the speech training stuff to add to my own Tacspeak. Wow... it's a lot.
@jwebmeister
My posts & findings/results have been rather sporadic & all over the place. Apologies for that; I know it probably wasn't too helpful for proper data.
You could consider my posts here a gradual journey of myself discovering & learning as I go.
@madmaximus101 thanks for testing bud. You’ve done infinitely more than anyone else!
I wrote a longer comment but lost it due to router / ISP shenanigans so I’m just going to dot point it below.
Were the txt files I provided of the 3 models helpful in some way? Or did I provide the data in the wrong manner?
> Were the txt files I provided of the 3 models helpful in some way? Or did I provide the data in the wrong manner?
@madmaximus101 They absolutely were, thank you.
The thing I noted was that there were only 33 commands (I average ~30-40 per mission), and that the medium and large LM models had 0% command errors, likely indicating only a single mission was run using just one of the models. I realise now I really need at least one mission run using the experimental model, and one mission run using the base model, with the tests run on the combined retained data. If that’s what you already did, my apologies; I just want to confirm that’s the overall result.
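For context on those result files: the WER figure is the standard word error rate, WER = (S + D + I) / N over substitutions, deletions, and insertions. A minimal sketch of the computation via edit distance follows; this is the textbook definition, not necessarily tacspeak's exact test implementation.

```python
# Standard word-error-rate sketch via Levenshtein edit distance over words.
# WER = (substitutions + deletions + insertions) / reference word count.
def wer(ref_words, hyp_words):
    n, m = len(ref_words), len(hyp_words)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete all remaining reference words
    for j in range(m + 1):
        d[0][j] = j  # insert all remaining hypothesis words
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref_words[i - 1] == hyp_words[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution / match
    return d[n][m] / max(n, 1)

# e.g. wer("blue team mirror the door".split(),
#          "blue team mirror door".split())  # one deletion out of five words
```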
Other than that, there are also “things of note” that aren’t covered by the automated tests and that you only get from play-testing. I want to make sure I’ve captured everything you’ve noted, make the job easier for myself while I review and re-review everything, and give you an opportunity to add anything else you might recall or want to highlight.
You're correct, I did run one mission with one model and ran the data through the test. One mission run with each model - got it.
From what I remember of earlier posts: go through and delete any data referencing NoiseSink, as well as any commands for yell or freeze, along with their associated audio files? I very rarely use the yell or freeze commands anyway. If there are any misrecognised commands with audio, correct them in the txt files.
I have upped my mic gain in-game from 100% to 110% to test whether my misrecognised commands are volume-related at all.
> Go through and delete any data referencing NoiseSink, as well as any commands for yell or freeze, along with their associated audio files? I very rarely use the yell or freeze commands anyway.
You shouldn't need to do this manually. "YellFreeze" and "NoiseSink" should already be excluded if you included the setting `retain_approval_func` in user_settings.py and set it to `my_retain_func`. Otherwise yes, delete YellFreeze and NoiseSink entries in retain.tsv + audio if they are being retained; there are PowerShell scripts available to help do this.
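For anyone following along, a minimal sketch of what such an approval function might look like in user_settings.py. The argument shape is an assumption here (an object exposing the recognised rule's name); check the actual user_settings.py template shipped with Tacspeak for the real signature.

```python
# Hypothetical sketch of a retain-approval function for user_settings.py.
# Assumes (not confirmed) that tacspeak calls it with an object carrying a
# `rule_name` attribute; return False to skip retaining that utterance's
# audio and its retain.tsv entry.
EXCLUDED_RULES = {"YellFreeze", "NoiseSink"}

def my_retain_func(recognition):
    # Keep only utterances whose rule is not in the excluded set.
    return getattr(recognition, "rule_name", None) not in EXCLUDED_RULES

retain_approval_func = my_retain_func
```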
> If there are any misrecognised commands with audio, correct them in the txt files.
Yes please. Both the text and the rule in retain.tsv.
> I have upped my mic gain in-game from 100% to 110% to test whether my misrecognised commands are volume-related at all.
In-game mic settings should have zero effect on Tacspeak. Windows Sound Settings and your physical mic gain (or interface if you use one) might affect things if it's nearly inaudible (or way too loud), but shouldn't if it's within normal range.
@madmaximus101 if you’re willing / able to test, can you try playtesting a mission, starting every spoken command with the correct team colour, then listening back to the retained audio to see if the colour gets cut from the audio? e.g. you say “blue team mirror the door”, but the retained audio is “team mirror the door”.
I’m not sure if this cut audio issue is a user issue (I pressed the listen_key too late) or a code issue, but I need further testing done and my machine is locked down at the moment.
Random side note: in testing I thought the model picked up silence as “blue” but listening back I could clearly hear “blue” spoken faintly, even though I was 99% confident I said nothing. I think I’m going crazy.
> In-game mic settings should have zero effect on Tacspeak. Windows Sound Settings and your physical mic gain (or interface if you use one) might affect things if it's nearly inaudible (or way too loud), but shouldn't if it's within normal range.
Actually... I do have one. The EPOS gaming app, for my Sennheiser GSP 670s. I do have a lot of minor static and low-level background noise. I wonder if the noise cancellation or mic enhancement features of the app will improve things. I will test the blue/red cut-off audio thing as well.
Will test and get back to you.
@jwebmeister holy s**t mate, you're not going to believe this... it worked, very well. Just to be doubly sure, I will re-download the experimental version... juuuust in case.
The inconsistencies with designating blue or red with mirror or wedge are still present, but very much reduced with my refined mic settings.
In this pic I've highlighted the audio file & the retain file reference. In the audio file, right at the very beginning, just before I talk, there is some... I dunno how to put it: very minor static, very minor distortion, almost like white noise in the background. I don't know how else to put it. Gold team was the current team & I said "mirror the door".
In this highlighted example there was no static or distortion/white noise. I also spoke slightly louder - not by much though.
But hey - overall it was a much improved experience! I will test with full noise cancelling & see what happens.
@jwebmeister
Very much improved experience. https://www.youtube.com/watch?v=pVZH5h6mr5s
Pending results from 100% noise cancelling, I may upload a second video and edit this comment showing its results as well.
Suggestion: is there mirror/wedge command weirdness due to the type of door? Does the spoken wedge/mirror command have anything to do with not specifying the type of door in the command, or does the command auto-assume a type of door, hence the weirdness? This red/blue weirdness is less common with the trap command. Just spitballing here; what I'm thinking probably isn't a thing if you're not having those issues.
> I will test with full noise cancelling & see what happens.
@madmaximus101 yep, please let me know how it goes. If it’s a significant improvement I’ll be surprised, but if so, it narrows down what I need to refine in the training data. There is some noisy data in the training dataset, but it’s not the whole dataset copied and perturbed like it is for speed and volume. Again, I’ll be shocked if it makes a significant difference.
> Suggestion: is there mirror/wedge command weirdness due to the type of door? Does the wedge/trap/mirror command being spoken verbally have anything to do with not specifying the type of door, if indeed the door is different? Probably not a thing if you're not having those issues.
I don’t know what specifically you mean. For it to select blue vs red? If the Tacspeak console says current team, or the correct spoken team, then it’s not an issue with the model. In general, if the Tacspeak console prints the right command, it’s not the model’s fault.
> I don’t know what specifically you mean. For it to select blue vs red? If the Tacspeak console says current team, or the correct spoken team, then it’s not an issue with the model. In general, if the Tacspeak console prints the right command, it’s not the model’s fault.
nvm, I was thinking maybe different types of doors were named/coded as particular door types. Don't think that's the case - my bad.
Post test results + useful remarks here, ideally of both the experimental model and the base model, using the same test data, and using the default Ready or Not grammar module.

Useful remarks include the kinds of notes shown in the example report below.

Important instructions:

- Review and correct `retain.tsv` with the correct rules + text; see the example workflow near the end of these instructions.
- There is a script, `./scripts/copy_retain_item_cmds_only.ps1`, that can be used in PowerShell to copy only "normal commands" out of `./retain/` and into `./cleanaudio_cmds/`.
- Use the default `_readyornot.py` grammar module, or very minor modifications, i.e. no new words.
- Run `./tacspeak.exe --test_model './cleanaudio_cmds/retain.tsv' './kaldi_model/' './kaldi_model/lexicon.txt' 4`
- There are scripts in the `./scripts/` folder related to cleaning up the retain.tsv and related .wav files.

Example workflow, either:

- Option A: open `retain.tsv` and go through each line, reviewing the rule and text. Open the `./retain/` folder in VLC media player on single file loop, pressing 'N' to move to the next .wav as I read through each line of retain.tsv, correcting `retain.tsv` to align with the audio. Remove any unwanted lines from `retain.tsv`, then when I'm done reviewing I run `list_wav_missing_from_retain_tsv.ps1` first to make sure I'm deleting the right files, then run the `delete_wav_missing_from_retain_tsv.ps1` script (option A is preferred, but hey, we're all busy and life is too short to spend cleaning all the data).
- Option B: remove any unwanted lines from `retain.tsv`, then when I'm done reviewing I run `list_wav_missing_from_retain_tsv.ps1` first to make sure I'm deleting the right files, then run the `delete_wav_missing_from_retain_tsv.ps1` script.

Example report:
"listen_key_toggle":-1
, usingUSE_NOISE_SINK = True
; also picked up in base model but not as often._readyornot.py
without any modifications('./kaldi_model/', './retain/retain.tsv', 'Command', 'WER', 'Overall -> 5.00 %+/- 9.55 %N=20 C=19 S=1 D=0 I=0') ('./kaldi_model/', './retain/retain.tsv', 'Command', 'CMDERR', {'cmd_not_correct_output': 0, 'cmd_not_correct_rule': 0, 'cmd_not_correct_options': 0, 'cmd_not_recog_output': 0, 'cmd_not_recog_input': 0, 'cmds': 4}) ('./kaldi_model_base/', './retain/retain.tsv', 'Command', 'WER', 'Overall -> 5.00 %+/- 9.55 %N=20 C=19 S=0 D=1 I=0') ('./kaldi_model_base/', './retain/retain.tsv', 'Command', 'CMDERR', {'cmd_not_correct_output': 0, 'cmd_not_correct_rule': 0, 'cmd_not_correct_options': 0, 'cmd_not_recog_output': 0, 'cmd_not_recog_input': 0, 'cmds': 4})