Let us focus on the 13th field, `duration`. I just want to confirm: _is it used in the data analysis itself (e.g., the empirical entropy calculation), rather than only to specify the range of data to be selected?_ I hope the answer is NO. If so, there is no essential bug in `Data.py`, and we could either add a comment there telling readers that `duration` is just a name that says nothing about the actual flow duration, or rename it to, for instance, `flow_size_pkts`.
I have tested the effect of the latter option, and my conclusion is that the answer to the above question is indeed _NO_. (The crucial field seems to be `start_time`; detection still works after deleting `end_time`, `duration`, or both.)
To keep the previous configuration scripts working, we need to replace `duration` with `flow_size_pkts` in them.
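For the rename, a whole-word replacement is probably the safest approach, so that unrelated identifiers that merely contain `duration` are not touched. The following is only a minimal sketch: the helper `rename_field` and the sample configuration text are hypothetical and are not taken from the repository, and the real configuration scripts may need a different treatment.

```python
import re


def rename_field(text: str, old: str = "duration", new: str = "flow_size_pkts") -> str:
    """Replace whole-word occurrences of `old` with `new` in `text`.

    Hypothetical helper for illustration; it is not part of Data.py.
    """
    # \b is a word boundary, and "_" counts as a word character, so names
    # such as `flow_duration` or `duration_ms` are left unchanged.
    pattern = re.compile(rf"\b{re.escape(old)}\b")
    # A function is used as the replacement so `new` is inserted literally.
    return pattern.sub(lambda _match: new, text)


if __name__ == "__main__":
    # Made-up configuration snippet, only to show the behaviour.
    sample = "fields = ['start_time', 'end_time', 'duration']\nmax_flow_duration = 10\n"
    print(rename_field(sample))
```

Applied to each old configuration script, this would rewrite only the standalone field name; it is worth reviewing the resulting diff before committing, since a plain-text replacement cannot tell a field name from the same word inside a comment.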