shnewto / bnf

Parse BNF grammar definitions
MIT License
256 stars 22 forks source link

Rethink Grammar::generate logic #70

Closed shnewto closed 1 year ago

shnewto commented 3 years ago

I'd like to be smarter about how we generate random sentences from a given grammar.

There's a danger that a valid grammar can result in an infinite loop when randomly generating, so atm we just take an guess at if we're looping in a way that looks like it's not going to stop any time soon by checking the stack and bailing before we recurse too far.

For instance the case used in the tests is <nonterm> ::= <nonterm>. We can't get to a terminal state so we're doomed to failure if we try to generate a sentence.

I'm not sure what an ideal solution is here, for one, it'd be nice to figure something that didn't have to short circuit the potential for "maybe infinite loops".

Maybe there's a way to check for impossible grammars for the generate logic first and error early if that's the case. Then... if we have the potential for a terminal state but aren't reaching it because of random chance we need to do something sensible... maybe allow a configurable max generated length and do the work to find some terminal nodes when we get there?

If can manage a max length, I like the idea of implementing a minimum length too 🤔

SKalt commented 3 years ago

What would you think of

  1. reducing a grammar to a directed graph structure

  2. where nodes are rule names and edges go from a rule to its dependents. For example: <A> ::= <B> | <C> <D> would result in a graph

    (A) --> (B)
    \
     `-> (C) --> (D)
  3. then check for cycles that can't be reached from terminals

?

I think that'd be doable using petgraph.

shnewto commented 3 years ago

Ah petgraph looks cool! I like your proposed solution too, I'd have to poke at it a bit if I were implementing, but fwiw I'd totally support a solution like this in a PR.

notDhruv commented 1 year ago

Has this question been resolved already? If not I think I have proposal for it (I was skeptical that the proposed solution would work).

This problem can be reduced to solving the emptiness problem for CFLs. I have added a solution strategy below:

  1. We mark every non-terminal that produces a sequences of terminals
  2. We then recursively mark every non-terminal that can produce a sequence of marked elements
  3. We check whether the initial non-terminal is marked

I don't think this needs any special libraries but just a poly-space recursion on the existing grammar would work well.

What do you guys think?

shnewto commented 1 year ago

@notDhruv this hasn't been solved yet 😄 I think your approach sounds great, seems like it'd definitely give us a mechanism for avoiding the infinite loop issues, and I can imagine that if we have some known-to-be-finite paths to terminals to traverse, we could work in the min - max generated sentence length too.

shnewto commented 1 year ago

addressed by #127 thanks for everyone's suggestions!