Discarding IDs from low-severity sinks loses transitions

Description

The code snippet below (part of the traverse function) does not behave as intended. As mentioned here, a defaultdict in Python creates a new entry whenever a missing key is accessed. When the ID is removed from a low-severity sink and there are more states in the trace, state_list[-1] will be -1, and this value is then looked up in the sinks dictionary (which is a defaultdict). The lookup adds -1 to the (global) sinks (sinks_model) variable, where it does not belong: states with ID -1 are then effectively treated as sinks.
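The defaultdict behaviour described above can be reproduced in isolation. This is a minimal sketch, not the SAGE code: the `sinks` variable here is a hypothetical stand-in for the global sinks dictionary, and the choice of `set` as the default factory and the ID 12 are assumptions made only for illustration.

```python
from collections import defaultdict

# Hypothetical stand-in for the global sinks dictionary (value type assumed).
sinks = defaultdict(set)
sinks[12].add("vulnD")

before = -1 in sinks   # False: -1 has never been stored
_ = sinks[-1]          # a plain read of a missing key...
after = -1 in sinks    # ...silently inserts it, so this is now True

# Lookups that do not insert the missing key:
safe = defaultdict(set)
looked_up = safe.get(-1)       # returns None and leaves the dict unchanged
still_absent = -1 not in safe  # membership tests never insert either
```

Using `in` or `.get()` for the lookup would avoid the stray key, but it would not restore the lost transition, so it only addresses part of the problem.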

This was the issue for the CCDC dataset. In the image below, there is a (still reversed) sequence where netDOS follows vulnD. Because the ID is removed from vulnD (as it is a low-severity node), the next transition starts from state -1, which cannot be correct because there are no nodes in the S-PDFA with ID -1 (except for one dummy node, but that is not a problem here).

The states in the AGs are essentially the same, except for this -1 ID on some states.

Proposed solution

The IDs can be stored in a separate list here (for example, transitions_list), which is used only for transitions. The original list (state_list), which still contains the -1s, will be returned.
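One way the two-list idea could look is sketched below. This is an assumption-laden illustration, not the actual traverse function: `traverse_sketch`, the `(state_id, label)` trace format, the `low_severity` set, and the example IDs are all hypothetical, and the real code also checks whether a state is a sink, which is omitted here.

```python
def traverse_sketch(trace, low_severity):
    """Walk a trace of (state_id, label) pairs (hypothetical format).

    Returns the masked state list and the transitions built from real IDs.
    """
    state_list = []        # returned to the caller; -1 marks a discarded ID
    transitions_list = []  # real IDs, used only to build transitions
    transitions = []

    for state_id, label in trace:
        if transitions_list:
            # The source of the transition is always a real ID, never -1.
            transitions.append((transitions_list[-1], state_id))
        transitions_list.append(state_id)
        # Mask the ID only in the list that is returned.
        state_list.append(-1 if label in low_severity else state_id)

    return state_list, transitions


# Hypothetical trace: a low-severity state in the middle of the sequence.
states, edges = traverse_sketch(
    [(5, "netDOS"), (7, "vulnD"), (9, "netDOS")],
    low_severity={"vulnD"},
)
# states == [5, -1, 9]
# edges == [(5, 7), (7, 9)]
```

Because the transitions are read from transitions_list, the -1 placeholder never reaches the sinks lookup, while state_list keeps the same contents as before.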

tudelft-cda-lab / SAGE

Discarding IDs from low-severity sinks loses transitions #30