Closed shnewto closed 1 year ago
What would you think of
reducing a grammar to a directed graph structure
where nodes are rule names and edges go from a rule to its dependents. For example: <A> ::= <B> | <C> <D>
would result in a graph
(A) --> (B)
\
`-> (C) --> (D)
then check for cycles that can't be reached from terminals
?
I think that'd be doable using petgraph
.
Ah petgraph looks cool! I like your proposed solution too, I'd have to poke at it a bit if I were implementing, but fwiw I'd totally support a solution like this in a PR.
Has this question been resolved already? If not I think I have proposal for it (I was skeptical that the proposed solution would work).
This problem can be reduced to solving the emptiness problem for CFLs. I have added a solution strategy below:
I don't think this needs any special libraries but just a poly-space recursion on the existing grammar would work well.
What do you guys think?
@notDhruv this hasn't been solved yet 😄 I think your approach sounds great, seems like it'd definitely give us a mechanism for avoiding the infinite loop issues, and I can imagine that if we have some known-to-be-finite paths to terminals to traverse, we could work in the min - max generated sentence length too.
addressed by #127 thanks for everyone's suggestions!
I'd like to be smarter about how we generate random sentences from a given grammar.
There's a danger that a valid grammar can result in an infinite loop when randomly generating, so atm we just take an guess at if we're looping in a way that looks like it's not going to stop any time soon by checking the stack and bailing before we recurse too far.
For instance the case used in the tests is
<nonterm> ::= <nonterm>
. We can't get to a terminal state so we're doomed to failure if we try to generate a sentence.I'm not sure what an ideal solution is here, for one, it'd be nice to figure something that didn't have to short circuit the potential for "maybe infinite loops".
Maybe there's a way to check for impossible grammars for the generate logic first and error early if that's the case. Then... if we have the potential for a terminal state but aren't reaching it because of random chance we need to do something sensible... maybe allow a configurable max generated length and do the work to find some terminal nodes when we get there?
If can manage a max length, I like the idea of implementing a minimum length too 🤔