nbp / seqrec

Reconstruct the trace of execution of a program based on a sequence of bytes produced by the same code.
MIT License
1 stars 0 forks source link

Convert the frame matching automata to an LR state machine #1

Open nbp opened 6 years ago

nbp commented 6 years ago

When learning from the stack frames in the StackClassifier, the code will generate a state machine which will identify a single frame.

  (0)  -- 0x42 --> (1) -- 0x51 --> (produce foo)
        \
          '-- foo --> (produce bar)

This is not yet a LR state machine, as each node which produce a frame does not have a successors for properly resuming the execution after. We should convert this graph into a proper LR state machine such that we would not have to let the fuzzy-glr parser implemented in reverse.js guess by inserting new stack frames starting over from the initial state.

Doing so will reduce the number of failed attempts in the fuzzy-glr parser and thus keep the most likely candidates, instead of dropping them due to the large amount of candidates caused by eagerly reducing states.

nbp commented 6 years ago

To make it clearer, we should either change the current automata to record an LR(1) state machine as we record (not infer from the graph as it would be the same issue), and include some fuzzy look-ahead while reducing states, to favor reduced states which are likely to contribute more.

Today, when we reduce we have too many states, which blew up the time and quality of the parsed tree that we want to generate.