t4ngo / dragonfly

ARCHIVED! - Speech recognition framework allowing powerful Python-based scripting and extension of Dragon NaturallySpeaking (DNS) and Windows Speech Recognition (WSR)
GNU Lesser General Public License v3.0

There should be a compound/mapping rule syntax for repetition #15

Open jgarvin opened 9 years ago

jgarvin commented 9 years ago

AFAICT there is no way to specify repetitions within the mapping rule syntax, which makes it very verbose to handle situations where at least one of several possible words must be spoken. Basically I want the '+' operator from regular expressions, or more generally the ability to tersely specify repetitions in the grammar just like you currently can in actions, e.g.:

"hello (world | friend)+"

Which would match "hello world" and "hello friend" but not just "hello". Or more general:

"hello (world | friend){1,2}"

Which would match "hello world", "hello world world", "hello friend", "hello friend friend", "hello world friend" and "hello friend world", but not "hello" and not "hello world world world".
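To make the requested semantics concrete, here is a minimal sketch that uses Python's `re` module as a stand-in. It only illustrates what `+` and `{1,2}` would accept; the patterns are regular expressions, not dragonfly spec syntax, and the `accepts` helper is a hypothetical name.

```python
import re

# Regex stand-ins for the two proposed specs. Words are separated by a
# single space, so each repetition is " world" or " friend".
ONE_OR_MORE = re.compile(r"hello( (world|friend))+")
ONE_OR_TWO = re.compile(r"hello( (world|friend)){1,2}")


def accepts(pattern, utterance):
    """Return True if the whole utterance matches the pattern."""
    return pattern.fullmatch(utterance) is not None


for phrase in ("hello", "hello world", "hello world friend",
               "hello world world world"):
    print(phrase, accepts(ONE_OR_MORE, phrase), accepts(ONE_OR_TWO, phrase))
```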

synkarius commented 9 years ago

That's an interesting idea. Could you give an example of a use case?

jgarvin commented 9 years ago

Sure. I dynamically create a grammar for switching windows based on their title. But I want it to be a substring match, because sometimes window titles are quite wordy, e.g. if you have a document or website open in an app often the filename will be part of the window title. So I generate a lot of rules that look like "win [firefox] [github] [issue] [15]". The idea being that saying "win" followed by any of those words will select firefox provided firefox is viewing this page. And so I might have ten of these rules, because I have that many windows, but it never makes sense to just say "win" by itself, yet dragonfly will accept this because all the other words are optional.

Really what I think I want is something more complex, where once you say one of the optional words it can't be used again, which you can do now but it's very verbose. To see what I mean imagine you want to implement a command "press" that will let you tell the computer to press any key, and you want it to work with control/alt/shift so you can say things like "press control shift j". Well right now you might write something like "[(control | alt | shift)] [(control | alt | shift)] [(control | alt | shift)]" because any combination of the modifiers might be possible, but written this way dragon will look for nonsense like "control control control" which is never what you want.
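A quick way to see how much nonsense the three-optional-groups spec admits is to enumerate it. The sketch below is plain Python and does not touch dragonfly; it assumes the spec accepts every sequence of zero to three modifiers, and the helper names are made up for illustration.

```python
from itertools import product

MODIFIERS = ("control", "alt", "shift")


def accepted_phrases(slots=3):
    """Every word sequence accepted by `slots` optional
    (control | alt | shift) groups placed one after another."""
    phrases = set()
    for length in range(slots + 1):
        phrases.update(product(MODIFIERS, repeat=length))
    return phrases


def has_repeat(phrase):
    """True if any modifier occurs more than once in the phrase."""
    return len(set(phrase)) != len(phrase)


phrases = accepted_phrases()
nonsense = {p for p in phrases if has_repeat(p)}
print(len(phrases), "accepted,", len(nonsense), "with a repeated modifier")
```

Under that assumption, more than half of the accepted sequences repeat a modifier, including "control control control".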

synkarius commented 9 years ago

I can definitely see how that would be useful. Generating a Choice or two along with the rule seems like it would partially solve your problem...

"win <choice1> [<choice2>]": Function(somefunction, extra={"choice1", "choice2"}),

... but that still doesn't fully address what you're asking for here.

t4ngo commented 9 years ago

If it were possible to let later parts of a rule adapt based on words recognized earlier in that same rule during a single recognition, then a rule could be constructed that isn't complex but does have the behavior Joseph is looking for (i.e. not allowing one option to be recognized multiple times).

Alas, that's not possible as far as I know. Instead rules must be constructed to explicitly give all possible recognitions. (Unless of course you allow free-form dictation and do fuzzy window title matching...)

Three possible constructions that might be worth investigating, in order of increasing complexity:

  1. Fixed ordering, start at first word -- The words must be spoken in the given order, and the speaker must start with the first word followed by zero or more subsequent words:
    • word1 [word2 [word3]]
  2. Fixed ordering, start at any word -- The words must be spoken in the given order, but the speaker may start at any word followed by zero or more subsequent words in the given order:
    • word1 [word2 [word3]]
    • word2 [word3]
    • word3
  3. Any ordering -- One or more of the words must be spoken in any order:
    • word1 [word2 [word3] | word3 [word2]]
    • word2 [word1 [word3] | word3 [word1]]
    • word3 [word2 [word1] | word1 [word2]]

Note that the complexity of the last option increases very quickly with the number of words, so it is likely that the speech recognition engine will refuse to load it or performance will degrade sooner than you'd like.
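Since these rules are generated dynamically, the three constructions can be produced from a word list. The following is a rough sketch in plain Python that only builds spec strings in the notation used above; the function names are hypothetical and nothing here calls into dragonfly.

```python
def nested_optional(words):
    """Construction 1: 'w1 [w2 [w3]]'."""
    if len(words) <= 1:
        return " ".join(words)
    return "%s [%s]" % (words[0], nested_optional(words[1:]))


def start_anywhere(words):
    """Construction 2: one nested-optional spec per starting word."""
    return [nested_optional(words[i:]) for i in range(len(words))]


def any_order(words):
    """Construction 3: one spec per starting word; the remaining
    words may follow in any order, each at most once."""
    specs = []
    for i, word in enumerate(words):
        rest = words[:i] + words[i + 1:]
        if rest:
            specs.append("%s [%s]" % (word, " | ".join(any_order(rest))))
        else:
            specs.append(word)
    return specs


words = ["word1", "word2", "word3"]
print(nested_optional(words))
print(start_anywhere(words))
print(any_order(words))
```

The third generator recurses once per remaining word at every level, which is where the rapid growth comes from: the spec text grows roughly factorially with the number of words.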

@jgarvin: Are any of the above options applicable for your use case?

jgarvin commented 9 years ago

What I've done for the moment is have one rule per window of the form:

[Word1] [Word2] [Word3]

So saying any 1 word in the title works, and you can say more than one to disambiguate, allowing skipping words, but the order between the words must be preserved if you use multiple. The action I map to handles tie breaking through several means (e.g. if the best match is the window you already have selected, select the second best instead).
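For illustration, one way such an action could rank windows is sketched below. This is a guess at the general shape, not the actual code: `pick_window`, its scoring by word overlap, and the single tie-breaking rule are all assumptions.

```python
def pick_window(spoken, titles, current=None):
    """Return the title that best matches the spoken words.

    If the best match is the currently selected window and another
    candidate exists, the second best is returned instead.
    """
    def score(title):
        title_words = title.lower().split()
        return sum(1 for word in spoken if word.lower() in title_words)

    # Keep only titles that match at least one spoken word, best first.
    ranked = sorted((t for t in titles if score(t) > 0),
                    key=score, reverse=True)
    if not ranked:
        return None
    if ranked[0] == current and len(ranked) > 1:
        return ranked[1]
    return ranked[0]


titles = ["firefox github issue 15", "emacs notes", "firefox mail"]
print(pick_window(["firefox"], titles, current="firefox github issue 15"))
```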

It works well enough. I was going for fuzzy behavior like the emacs ido package which is forgiving about ordering, and since I'd already run into the key press example I figured it might be a trend. In retrospect just remembering order and having:

[Control] [Alt] [Shift]

Is fine, since that's the order I'd always naturally say it anyway. So while I think it would still be a nice-to-have for implementing fuzzy matching actions like my window example, I can do without.

jgarvin commented 9 years ago

I had an idea for being able to express "AtLeastOneOf". So if you wanted to recognize when any subset (preserving order) of "foo bar buzz" is said, AtLeastOneOf(["foo", "bar", "buzz"]) would mean firing on "foo", "bar", "buzz", "foo bar", "foo buzz", "bar buzz", and "foo bar buzz". I figured I could take a rule "[foo] [bar] [buzz]", which almost works except that it accepts you saying nothing, and wrap it in a Repetition with min=1, max=1, on the guess that Dragon wouldn't match a Repetition that had no verbal content. Turns out dragonfly asserts that min < max though. I'd have to investigate more to figure out if that limitation is present in the underlying form being compiled to.
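As a workaround that avoids Repetition entirely, the accepted phrases could be enumerated up front and joined into one alternation. The sketch below is plain Python; `at_least_one_of` and `as_spec` are hypothetical helpers, not dragonfly elements, and the result is only a spec string.

```python
from itertools import combinations


def at_least_one_of(words):
    """Return every non-empty, order-preserving subset of `words`
    as a space-separated phrase."""
    phrases = []
    for size in range(1, len(words) + 1):
        for combo in combinations(words, size):
            phrases.append(" ".join(combo))
    return phrases


def as_spec(words):
    """Join the phrases into a single alternation spec string."""
    return "(" + " | ".join(at_least_one_of(words)) + ")"


print(at_least_one_of(["foo", "bar", "buzz"]))
print(as_spec(["foo", "bar", "buzz"]))
```

The number of alternatives is 2^n - 1 for n words, so this only stays practical for short word lists.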