Open V0ldek opened 1 year ago
Tagging @V0ldek for notifications
Looking into this issue. So far:
I have created a unit test with this subject in mind, though it appears some of the other tests should regress once this change is made. Available for review at https://github.com/ZwerdDPU/rsonpath/tree/minimizer: rsonpath-lib::query::automaton::minimizer::tests::minimization_deduplicates_trivial_test
.
That test is intentionally failing for now.
This is the tree the validation will check which includes the notated minimzation of combining states 3&4, with corresponding node ids offset, for the purpose of making validating the result easier:
It seems to me that the minimzation will only take place within checkpoint-created boundaries. Does this seem correct?
I intend to look further into this issue in the coming weeks. Let me know what you think.
Current checkpoint-based code is basically a hack to produce "somewhat-minimal" automata. In particular it can be proven that without wildcards the checkpoints are enough for the result to be truly minimal (Lemma 5.1. in https://v0ldek.com/masters).
The property is that once you reach a checkpoint you can forget all the other states, which means that a checkpoint will always be a unique state in any resulting DFA. So minimizing between them should be enough.
BTW, you can make your life testing this easier with https://paperman.name/semigroup/. It's a tool for automata properties, and allows you to minimize them. If you run rsonpath -c -v <query>
it will actually print the DFA in a format that can be directly accepted by the tool.
For example:
just r -c '$..a.*.*..b.*.*' -v
will output as one of the logs:
[rsonpath_lib::query::automaton] NFA: s0.a -> s0, s1;
s0.b -> s0;
s0.X -> s0;
s1.a -> s2;
s1.b -> s2;
s1.X -> s2;
s2.a -> s3;
s2.b -> s3;
s2.X -> s3;
s3.b -> s3, s4;
s3.a -> s3;
s3.X -> s3;
s4.a -> s5;
s4.b -> s5;
s4.X -> s5;
s5.a -> s6;
s5.b -> s6;
s5.X -> s6;
You can copy-paste it into semigroup to get this nice graph:
(By convention we mark the wildcard as capital X, since the tool only accepts letters)
So if you come up with an interesting case to test you can easily find the minimal automaton to verify the result.
Is your feature request related to a problem? Please describe. After #7 the compiler no longer necessarily outputs a minimal DFA. It's still a correct DFA, just not minimal.
Describe the solution you'd like The algorithm in
query::automaton::minimizer
needs to be rewritten to minimize the automaton.Describe alternatives you've considered A clear and concise description of any alternative solutions or features you've considered.
Additional context Example query that produces a non-minimal automaton:
$..a.*.*..b.*.*
. In the resulting automaton below states 3 and 4 are equivalent.