natasha-jeppu / Trace2Model

Learning Concise Models from Long Execution Traces
BSD 3-Clause "New" or "Revised" License

A wrong model generated from a simple trace #4

Open rladydpf1 opened 1 year ago

rladydpf1 commented 1 year ago

When running python3.8 Trace2Model/learn_model.py --dfa --incr -t ./models -i traces/trace_events.txt, an incorrect model was generated.

trace_events.txt:

start
(!(8 > rt_input.msg_length) and (1 > rt_input.message_5) and (241 == rt_input.message_0) and !(241 > rt_input.message_7)): OUT_OF_INDEX
start
WITHIN_INDEX
start

And the result was

******************************************** TRACE:1
1
Total size:2

******************************************** Iteration:0
['start', '(!(8 > rt_input.msg_length) and (1 > rt_input.message_5) and !(241 >= rt_input.message_7) and (241 == rt_input.message_0)): OUT_OF_INDEX']
[WARNING] Using non segmented

******************************************** Iteration:0
['start', '(!(8 > rt_input.msg_length) and (1 > rt_input.message_5) and !(241 >= rt_input.message_7) and (241 == rt_input.message_0)): OUT_OF_INDEX']
[WARNING] Using non segmented
Generating model with 2 states
Finding CE
CE found :
[[2, 2]]
Running CBMC...............
No model, increasing number of states to 3
Running CBMC...............
Final Generated model
[[1, 1, 3], [3, 2, 2]]

Time taken: 0.12850236892700195

------------- Verifying: ----------------------------

All behaviors present
Final Generated model
[[1, 1, 3], [3, 2, 2]]
Number of states: 3
Number of transitions: 2

******************************************** TRACE:2
1
Total size:2

******************************************** Iteration:0
['start', 'WITHIN_INDEX']
[WARNING] Using non segmented

******************************************** Iteration:0
['start', 'WITHIN_INDEX']
[WARNING] Using non segmented
Generating model with 3 states
Finding CE
CE found :
[[2, 3], [3, 2], [3, 3]]
Running CBMC...............
Final Generated model
[[1, 1, 3], [3, 2, 2], [3, 3, 2]]

Time taken: 0.3221313953399658

------------- Verifying: ----------------------------

All behaviors present
Final Generated model
[[1, 1, 3], [3, 2, 2], [3, 3, 2]]
Number of states: 3
Number of transitions: 3
NFA states:  3

Time taken: 0.47003698348999023

Converting NFA to DFA..........
states =  3
Length of trace: 5

All behaviors present
States not visited
[]
Final Generated model
[[1, 1, 2], [2, 2, 3], [2, 3, 3]]
[[1, 'start', 2], [2, '(!(8 > rt_input.msg_length) and (1 > rt_input.message_5) and !(241 >= rt_input.message_7) and (241 == rt_input.message_0)): OUT_OF_INDEX', 3], [2, 'WITHIN_INDEX', 3]]
Number of states: 3
Number of transitions: 3
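For readers unfamiliar with the printed format: each labelled triple above appears to read as [source-state, event, target-state] (my interpretation of the output, not an official specification). A minimal sketch that replays the two input traces through this model:

```python
# Hedged sketch: interpret each triple as [source, event, target] and
# replay a trace through the model. The transition list is copied from
# the output above; the guard event is spelled out once as GUARD.
GUARD = ("(!(8 > rt_input.msg_length) and (1 > rt_input.message_5) and "
         "!(241 >= rt_input.message_7) and (241 == rt_input.message_0)): OUT_OF_INDEX")

transitions = [[1, "start", 2], [2, GUARD, 3], [2, "WITHIN_INDEX", 3]]

def replay(trace, transitions, start_state=1):
    """Follow the trace through the model; return the visited states."""
    delta = {(src, ev): dst for src, ev, dst in transitions}
    state, visited = start_state, [start_state]
    for event in trace:
        state = delta[(state, event)]
        visited.append(state)
    return visited

# Both input traces end in the same state 3, which is why WITHIN_INDEX
# and the OUT_OF_INDEX guard are not given separate states in this model.
print(replay(["start", GUARD], transitions))           # [1, 2, 3]
print(replay(["start", "WITHIN_INDEX"], transitions))  # [1, 2, 3]
```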

The model was attached as an image (trace_events_incr_same_dfa).

I expected it to be separated into 'WITHIN_INDEX' and 'OUT_OF_INDEX' states. If this is intended, how can I separate those states in the model?

natasha-jeppu commented 1 year ago

This is intended. Note that the automata construction algorithm generates models from positive traces only, and the input traces are the only information available to it. The algorithm interprets each line in trace_events.txt as a boolean event and attempts to capture the sequence of boolean events as an automaton. In the case here, there is no sequence in trace_events.txt that contradicts the generated model. If you instead had a trace like this,

start
(!(8 > rt_input.msg_length) and (1 > rt_input.message_5) and (241 == rt_input.message_0) and !(241 > rt_input.message_7)): OUT_OF_INDEX
OUT_OF_INDEX
start
WITHIN_INDEX
start

then you can expect to see a separation: in that trace, the event WITHIN_INDEX is never followed by OUT_OF_INDEX, whereas the event (!(8 > rt_input.msg_length) and (1 > rt_input.message_5) and (241 == rt_input.message_0) and !(241 > rt_input.message_7)): OUT_OF_INDEX is. This contradicts routing both events into the same state, as the model above does, so you can expect the algorithm to produce separate states for them.
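The contradiction can be made concrete with a small sketch (my illustration of the idea, not Trace2Model's actual state-merging algorithm): collect, for each event, the set of events observed immediately after it. Two events can safely share a target state only while their observed successors do not conflict:

```python
def successor_sets(traces):
    """Map each event to the set of events observed right after it."""
    succ = {}
    for trace in traces:
        for cur, nxt in zip(trace, trace[1:]):
            succ.setdefault(cur, set()).add(nxt)
    return succ

GUARD = "(...): OUT_OF_INDEX"  # the long guard event, abbreviated here

# Each "start" line begins a new trace, as in trace_events.txt.
original = [["start", GUARD], ["start", "WITHIN_INDEX"]]
extended = [["start", GUARD, "OUT_OF_INDEX"], ["start", "WITHIN_INDEX"]]

# Original traces: neither event has any observed successor, so nothing
# contradicts merging their target states into one.
print(successor_sets(original).get(GUARD, set()))           # set()
print(successor_sets(original).get("WITHIN_INDEX", set()))  # set()

# Extended trace: GUARD is followed by OUT_OF_INDEX but WITHIN_INDEX never
# is, so merging their targets would wrongly allow OUT_OF_INDEX after
# WITHIN_INDEX.
print(successor_sets(extended).get(GUARD, set()))           # {'OUT_OF_INDEX'}
print(successor_sets(extended).get("WITHIN_INDEX", set()))  # set()
```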

In general, a longer trace sequence helps capture behaviour better, since longer sequences carry more information about the sequential behaviour of your system.