talonvoice / beta

Issue tracker for the private Talon Beta

parsing(?) bugs in captures #125

Closed rntz closed 3 years ago

rntz commented 3 years ago

Two (possibly related) bugs. Let's call them the "duck duck goose" and "merry christmas" bugs. The first one, "duck duck goose", involves some interacting captures:

duckduckgoose.py

from talon import Module, Context
mod=Module()
ctx=Context()

@mod.capture
def ducks(m) -> str: "duck+"
@ctx.capture(rule="duck+")
def ducks(m): return "ducks"

@mod.capture
def duckgoose(m) -> str: "duck+ goose"
@ctx.capture(rule="<self.ducks> goose")
def duckgoose(m): return "duckgoose"

duckduckgoose.talon

duck test <user.ducks>: "{ducks}"
goose test <user.duckgoose>: "{duckgoose}"
test <user.ducks> <user.duckgoose>: "{ducks} {duckgoose}"

Now, try saying "test duck duck goose".

Expected result: "ducks duckgoose"

Actual result:

2020-09-22 22:51:01    IO 
2020-09-22 22:51:05    IO EMIT ['test', 'duckdak', 'goose']
2020-09-22 22:51:05    IO COMPILING
2020-09-22 22:51:05    IO dfa rules built in 0.101s
2020-09-22 22:51:05    IO dfa rules linked 0.113s
2020-09-22 22:51:06    IO minimize + cfg in 0.174s
2020-09-22 22:51:06    IO DECODING
detecting in viterbi toks: #################_test###_duckdak###_gooseoseose####
791.166 #################_test###_duck#_duck##_gooseoseose####
  result: test duck duck goose

2020-09-22 22:51:06    IO DECODED ['test', 'duck', 'duck', 'goose']
2020-09-22 22:51:06 ERROR cb error topic="phrase" cb=<bound method SpeechSystem.engine_event of <talon.scripting.speech_system.SpeechSystem object at 0x7fb91d52ced0>>
   24:       lib/python3.7/threading.py:890| 
   23:       lib/python3.7/threading.py:926| 
   22:       lib/python3.7/threading.py:870| 
   21:                    talon/cron.py:112| 
   20: ------------------------------------# cron thread
   19:                    talon/cron.py:77 | 
   18:          talon/scripting/rctx.py:200| 
   17: ------------------------------------# 'cron' main:<lambda>()
   16:                     talon/vad.py:16 | 
   15:             talon/engines/w2l.py:745| 
   14:      talon/scripting/dispatch.py:98 | 
   13:      talon/scripting/dispatch.py:133| 
   12:      talon/scripting/dispatch.py:124| 
   11:          talon/scripting/rctx.py:200| 
   10: ------------------------------------# 'phrase' user.engines:_redispatch()
    9: talon/scripting/speech_system.py:42 | 
    8:      talon/scripting/dispatch.py:98 | 
    7:      talon/scripting/dispatch.py:133| 
    6:      talon/scripting/dispatch.py:124| 
    5:          talon/scripting/rctx.py:202| 
    4: ------------------------------------# 'phrase' user.engines:engine_event()
    3: ------------------------------------# stack splice
    2:          talon/scripting/rctx.py:200| 
    1: talon/scripting/speech_system.py:300| 
talon.engines.EngineError: failed to parse phrase: ['test', 'duck', 'duck', 'goose']
2020-09-22 22:51:06    IO [audio]=2430.000ms  [emit]=510.499ms (0.21x)  [decode]=2.124ms (0.00x)  [total]=512.623ms (0.21x)

Ok, now for the merry christmas bug.

merry.py

from talon import Module, Context
mod=Module()
ctx=Context()

mod.list("merry", desc="merry")
ctx.lists["self.merry"] = { "merry": "merry" }

@mod.capture
def merries(m) -> str: "merry+"
@ctx.capture(rule="{self.merry}+")
def merries(m): return "-".join(m.merry_list)

merry.talon

# This fails with an AttributeError
<user.merries> merry* christmas: "MERRY CHRISTMAS"

# This succeeds.
#merry* <user.merries> christmas: "MERRY CHRISTMAS"

Now try saying "merry christmas".

Expected result: "MERRY CHRISTMAS"

Actual result:

2020-09-22 22:58:25    IO EMIT ['merry', 'christmas']
2020-09-22 22:58:25    IO DECODING
detecting in viterbi toks: ###########_merryry_christmas###########
611.77 ###########_merryry_christmas###########
  result: merry christmas

2020-09-22 22:58:25    IO DECODED ['merry', 'christmas']
2020-09-22 22:58:25 ERROR     2: talon/grammar/vm.py:87| 
    1: talon/grammar/vm.py:82| 
KeyError: 'merry_list'

[The below error was raised while handling the above exception(s)]
2020-09-22 22:58:25 ERROR cb error topic="phrase" cb=<bound method SpeechSystem.engine_event of <talon.scripting.speech_system.SpeechSystem object at 0x7fb91d52ced0>>
   33:       lib/python3.7/threading.py:890| 
   32:       lib/python3.7/threading.py:926| 
   31:       lib/python3.7/threading.py:870| 
   30:                    talon/cron.py:112| 
   29: ------------------------------------# cron thread
   28:                    talon/cron.py:77 | 
   27:          talon/scripting/rctx.py:200| 
   26: ------------------------------------# 'cron' main:<lambda>()
   25:                     talon/vad.py:16 | 
   24:             talon/engines/w2l.py:745| 
   23:      talon/scripting/dispatch.py:98 | 
   22:      talon/scripting/dispatch.py:133| 
   21:      talon/scripting/dispatch.py:124| 
   20:          talon/scripting/rctx.py:200| 
   19: ------------------------------------# 'phrase' user.engines:_redispatch()
   18: talon/scripting/speech_system.py:42 | 
   17:      talon/scripting/dispatch.py:98 | 
   16:      talon/scripting/dispatch.py:133| 
   15:      talon/scripting/dispatch.py:124| 
   14:          talon/scripting/rctx.py:202| 
   13: ------------------------------------# 'phrase' user.engines:engine_event()
   12: ------------------------------------# stack splice
   11:          talon/scripting/rctx.py:200| 
   10: talon/scripting/speech_system.py:301| 
    9:              talon/grammar/vm.py:174| 
    8:              talon/grammar/vm.py:137| 
    7: talon/scripting/speech_system.py:318| 
    6:              talon/grammar/vm.py:174| 
    5:              talon/grammar/vm.py:137| 
    4: talon/scripting/speech_system.py:322| 
    3:         talon/scripting/types.py:327| 
    2:    user/mine/regression/merry.py:11 | def merries(m): return "-".join(m.merr..
    1:              talon/grammar/vm.py:89 | 
AttributeError: merry_list
2020-09-22 22:58:25    IO [audio]=1710.000ms  [emit]=157.272ms (0.09x)  [decode]=3.478ms (0.00x)  [total]=160.750ms (0.09x)

While these bugs are a bit arcane, they are not contrived. I ran into the merry christmas bug while writing actual talon code to do with modifier keys. The code I was writing was incorrect, but I didn't realize this because it triggered the merry christmas bug. I discovered the duck duck goose bug while trying to minimize the merry christmas bug.

lunixbochs commented 3 years ago

Re: duck duck goose.

I think the optimizer is rejecting the duck duck goose case on purpose, and for good reason: to prevent exponential parsing time.

<duck>: duck+ compiles to a loop around the word duck

  user.ducks.0:
  0 WORD 'duck'
  1 FORK (0, -2)
  2 RETURN

  user.duckgoose.0:
  0 CALL <user.ducks>
  1 WORD 'goose'
  2 RETURN

To prevent exponential parsing cases, when a loop jumps backwards and forwards at the same time (like the FORK above), the forward path is not allowed to visit the backwards jump target without first advancing a word.
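As a rough illustration of the bytecode above, here is a toy backtracking interpreter in plain Python. It is not Talon's VM. The instruction tuples, the `top` rule (a guess at how `test <user.ducks> <user.duckgoose>` might compile), and the reading of FORK offsets as relative to the following instruction are all assumptions. It explores every path exhaustively, without the guard described above, and under those assumptions it accepts "test duck duck goose", which suggests the rejection comes from the pruning rather than from the grammar itself.

```python
from typing import Dict, List, Set

# Toy instruction set, modelled loosely on the listing above (assumed semantics):
#   ("WORD", w)         consume the word w or fail this path
#   ("FORK", (a, b))    try both targets, offsets relative to the next instruction
#   ("CALL", rule)      run another rule, then continue
#   ("RETURN",)         finish the current rule
PROGRAM: Dict[str, List[tuple]] = {
    "user.ducks": [("WORD", "duck"), ("FORK", (0, -2)), ("RETURN",)],
    "user.duckgoose": [("CALL", "user.ducks"), ("WORD", "goose"), ("RETURN",)],
    # Hypothetical compilation of `test <user.ducks> <user.duckgoose>`.
    "top": [
        ("WORD", "test"),
        ("CALL", "user.ducks"),
        ("CALL", "user.duckgoose"),
        ("RETURN",),
    ],
}


def run(rule: str, words: List[str], start: int = 0) -> Set[int]:
    """Return every input position at which `rule` can return, starting at `start`."""
    code = PROGRAM[rule]

    def step(pc: int, pos: int) -> Set[int]:
        op = code[pc]
        if op[0] == "WORD":
            if pos < len(words) and words[pos] == op[1]:
                return step(pc + 1, pos + 1)
            return set()  # this path fails
        if op[0] == "FORK":
            ends: Set[int] = set()
            for offset in op[1]:
                ends |= step(pc + 1 + offset, pos)
            return ends
        if op[0] == "CALL":
            ends = set()
            for after_call in run(op[1], words, pos):
                ends |= step(pc + 1, after_call)
            return ends
        return {pos}  # RETURN

    return step(0, start)


def matches(words: List[str]) -> bool:
    """True if some path through `top` consumes the whole phrase."""
    return len(words) in run("top", words)


if __name__ == "__main__":
    print(matches(["test", "duck", "duck", "goose"]))
    print(matches(["test", "duck", "goose"]))
```

The exhaustive search is exactly what gets expensive on larger grammars, since every FORK doubles the number of paths to follow.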

(<duck> <duck>) is two basic loops around the word duck, which means the second duck will never contain any words, because it is prevented from jumping to the word duck, as it is a descendant of the first duck's dual forward/backward jump. I believe this is correct - the only solution I can imagine is to allow backtracking one word in the second loop once the first loop terminates unsuccessfully, but that's kind of complicated.

There's no "correct" distribution of words between the two ducks anyway. I think the easy answer is you need to design your rules to not put two of the same basic repetition captures in a row without any bridge words.
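To make the ambiguity concrete, a small plain-Python sketch (not Talon code; `duck_splits` is a hypothetical helper) can enumerate every way to divide the words after "test" between `duck+` and `duck+ goose`. With two ducks there is a single valid split, but from three ducks onward there are several, and nothing in the grammar says which one to prefer.

```python
from typing import List, Tuple


def duck_splits(words: List[str]) -> List[Tuple[List[str], List[str]]]:
    """List every split of `words` into a `duck+` part and a `duck+ goose` part."""
    splits = []
    for i in range(1, len(words)):
        first, second = words[:i], words[i:]
        # `duck+`: one or more "duck" (first is non-empty because i >= 1).
        first_ok = all(w == "duck" for w in first)
        # `duck+ goose`: one or more "duck", then exactly one trailing "goose".
        second_ok = (
            len(second) >= 2
            and second[-1] == "goose"
            and all(w == "duck" for w in second[:-1])
        )
        if first_ok and second_ok:
            splits.append((first, second))
    return splits


if __name__ == "__main__":
    for n in (2, 3, 4):
        phrase = ["duck"] * n + ["goose"]
        print(n, "ducks:", len(duck_splits(phrase)), "possible split(s)")
```

For n ducks followed by goose there are n - 1 valid splits, so the choice is only forced in the two-duck case.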

lunixbochs commented 3 years ago

Fixed the merry christmas bug in v0.1.2 - when recently optimizing list parsing for wav2letter, I introduced a regression where in some cases a list could consume 0 words but not fail that parse path. That's fixed now.

rntz commented 3 years ago

Thanks! I can confirm this fixes the merry christmas bug for me. I am less concerned with the duck duck goose case, since I didn't run into it while writing real code, and as you point out it involves putting two repetition captures in a row, which is not a very sensible thing to do.

My only (mild) concern is that if one did accidentally write some code that looked like duck duck goose without realizing it, it might be hard to debug. (This is what happened with merry christmas; there was some indirection through captures that made it harder to notice.) If the error message said something about adjacent repetitions of the same capture/list, that would make it much easier to figure out the problem with my code. Is it easy to tell if this case is being triggered and change the error message?

No worries if not, and thanks for fixing this so quickly!

lunixbochs commented 3 years ago

No, it’s not trivial to detect this case.