Closed silentbicycle closed 1 year ago
CI is failing with a linker issue, but only specifically with ASAN gcc builds. I can duplicate that failure mode locally but haven't figured out why it's happening yet.
The CI issue was because the last definition in the file passed to `objcopy --keep-global-symbols` was getting missed by gcc at link-time, but only when using `-fsanitize=address`.
Add better testing for minimisation and fix incomplete minimisation's root cause.
Fixes #404.
Add minimisation test oracle and minimise_test_case_list.c.
Testing was previously done with `fsm -t equal`, which checks that the state machine matches the same input as the expected result, but doesn't confirm that it is minimal. The tests now also minimise the FSM using a reference implementation and verify that the resulting state count is as expected.

- Add `src/libfsm/minimise_test_oracle.c`: an inefficient but easily checked implementation of minimisation. (This does not handle DFAs with captures properly.)
- `tests/minimise/minimise_test_case_list.c`: for a list of regex patterns, check that minimising each ends up with the same number of states as the test oracle implementation's minimised DFA.
- Add an `EXPENSIVE_CHECK` after `fsm_minimise`: if the FSM does not have captures, minimise it using the test oracle and check that the state count matches. Depending on the input size, this can be quite expensive.
- Add `fsm_capture_has_captures`, currently an internal interface.
- Add `fsm_shuffle`, which uses a linear congruential pseudorandom number generator to shuffle state IDs. This is used to check that renumbering states in a DFA does not change minimisation's result.

Add fuzzer harness mode for fuzzing minimisation.
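The renumbering step this mode relies on can be pictured as a seeded Fisher-Yates shuffle driven by a linear congruential generator. The sketch below is an illustration only, not the actual `fsm_shuffle` code: `lcg_next`, `shuffle_state_ids`, and `is_permutation` are hypothetical names, and the LCG constants (the ones commonly attributed to Knuth's MMIX) are an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical LCG step. The multiplier and increment are assumed
 * here for illustration; libfsm may use different values. */
uint64_t
lcg_next(uint64_t *state)
{
	*state = *state * 6364136223846793005ULL + 1442695040888963407ULL;
	return *state >> 33;	/* use the better-distributed high bits */
}

/* Fill mapping[0..count-1] with a permutation of the state IDs
 * 0..count-1, determined entirely by the seed. */
void
shuffle_state_ids(unsigned *mapping, size_t count, uint64_t seed)
{
	size_t i;
	uint64_t rng = seed;

	assert(mapping != NULL || count == 0);

	for (i = 0; i < count; i++) {
		mapping[i] = (unsigned)i;
	}

	/* Fisher-Yates: swap each slot with a pseudorandom earlier slot.
	 * The modulo introduces a slight bias, which is harmless here. */
	for (i = count; i > 1; i--) {
		size_t j = (size_t)(lcg_next(&rng) % i);
		unsigned tmp = mapping[i - 1];
		mapping[i - 1] = mapping[j];
		mapping[j] = tmp;
	}
}

/* Return 1 if mapping is a permutation of 0..count-1.
 * This sketch only tracks up to 64 IDs in a single bitmask. */
int
is_permutation(const unsigned *mapping, size_t count)
{
	uint64_t seen = 0;
	size_t i;

	if (count > 64) {
		return 0;
	}
	for (i = 0; i < count; i++) {
		if (mapping[i] >= count) {
			return 0;
		}
		if (seen & ((uint64_t)1 << mapping[i])) {
			return 0;
		}
		seen |= (uint64_t)1 << mapping[i];
	}
	return 1;
}
```

Because the permutation depends only on the seed, a failing renumbering can be replayed exactly, which is what makes this kind of shuffle useful in a fuzzer.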
The fuzzer checks, for an arbitrary string S that can be successfully converted to an NFA, whether using `fsm_shuffle` to renumber the states can lead to a minimised state count inconsistent with the result from the minimisation test oracle. Syntax errors, etc. are ignored.

To use this, run with `env MODE=m`.

After integrating the `edge_set_check_edges_with_EC_mapping` fix I ran this on 8 cores for about two hours without producing any new failures.

Fix the bug in `edge_set`'s label collection during minimisation

Use EC (equivalence class) membership, not state id, for label partitioning check.
When minimisation checked the states' edge sets for sets of labels leading to particular destinations it was collecting labels that led to the same state, but it should have instead collected labels to any states within the same EC (which is progressively refined as information propagates during minimisation). This could lead to cases where state(s) would incorrectly split out of an EC, leading to a non-minimal DFA where some states (each representing an EC set) could be combined safely and produce a smaller DFA.
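The difference between the two comparisons can be sketched in a few lines of C. This is a simplified illustration under stated assumptions, not the libfsm implementation: `struct edge`, `collect_labels`, and `has_label` are hypothetical, and passing a NULL `ec_map` stands in for the old state-id comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified edge: one label leading to one state. */
struct edge {
	unsigned char label;
	unsigned dst;
};

/* Collect, as a 256-bit bitmap, every label whose destination is
 * considered equivalent to the first edge's destination.
 * With ec_map == NULL, "equivalent" means the same state ID (the old
 * behaviour); otherwise it means the same equivalence class (EC). */
void
collect_labels(const struct edge *edges, size_t count,
    const unsigned *ec_map, uint64_t labels[256/64])
{
	size_t i;
	unsigned first;

	assert(edges != NULL || count == 0);

	memset(labels, 0, sizeof(uint64_t) * (256/64));
	if (count == 0) {
		return;
	}

	first = ec_map ? ec_map[edges[0].dst] : edges[0].dst;
	for (i = 0; i < count; i++) {
		unsigned cur = ec_map ? ec_map[edges[i].dst] : edges[i].dst;
		if (cur == first) {
			labels[edges[i].label / 64] |=
			    (uint64_t)1 << (edges[i].label % 64);
		}
	}
}

/* Return 1 if label c is set in the bitmap, 0 otherwise. */
int
has_label(const uint64_t labels[256/64], unsigned char c)
{
	return (int)((labels[c / 64] >> (c % 64)) & 1);
}
```

For a state with edges `'b'` to state 1, `'c'` to state 2, and `'d'` to state 3, where states 1 and 2 currently share an EC, the state-id comparison collects only `'b'`, while the EC comparison collects both `'b'` and `'c'`. Treating `'b'` and `'c'` as leading to different places is the kind of over-fine distinction that can split a state out of its EC unnecessarily.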
A small input that exercised this bug is "ab*c", which produced a correct but non-minimal DFA with 4 states instead of 3. This was added to `tests/minimise/minimise_test_case_list.c` as a regression.

Rename `edge_set_check_edges` to `edge_set_check_edges_with_EC_mapping` and note that it's specific to minimisation's use case. Pass in the current EC map from minimisation and use it to determine which states should be considered equivalent to the label's first destination state.

Also, update a variable name in a normally compiled-out logging statement and change `checked_labels`'s length to `[256/64]` to be consistent with its definition elsewhere.

Other misc. changes
- `fsm_consolidate`'s src argument should be `const`.
- reachable.c: Removed unused variable. (Address warning.)