The 95th percentile when tracing 400+ EVCs with INT (roughly 12k flows in total across all switches) was around 6 secs. This isn't an immediate major issue, but if we end up using sdntrace_cp in bulk for the consistency check, atomically holding a lock for that long wouldn't be great. It'll depend on if and how it runs, but if it needs to be atomically thread safe for all EVCs being checked, it can lead to overall slowness.
Current evident bottlenecks (let me register them here for future discussions):
1) The vast majority of the time is spent on `sdntrace_cp`. The lookup algorithm time complexity is roughly: `O(switches) * O(switch[table_size]) * O(switch[goto_tables])`
The biggest one, which is linear, is usually `O(switch[table_size])`. If we didn't have to support matching with bitmasks, we could leverage a flow `match_id` index and perform the lookup in `O(1)` per switch, but we'll still need to keep supporting bitmasks since `mef_eline` uses them with vlan ranges and other special tags. Unless it also had an endpoint to not trace anything with bitmasks involved.
2) `flow_manager` `stored_flows` took 1.7 secs, and it looks like it was mostly latency in the API and not in the DB. It would still be worth looking into whether the response was too large and json wasn't fast enough to serialize. In the future, `sdntrace_cp` querying `flow_manager` directly in its `FlowController` should help too.
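
To make the trade-off in 1) more concrete, here is a minimal sketch of a hybrid lookup: exact-match flows go into a dict (O(1) per switch), and only the bitmask flows are scanned linearly. This is not the actual `sdntrace_cp` code. `Flow`, `FlowTable`, and `lookup` are hypothetical names, and the match is simplified to `in_port` plus a 12-bit VLAN value/mask, whereas real flows match on more fields.

```python
from dataclasses import dataclass
from typing import Optional

FULL_VLAN_MASK = 0xFFF  # assumption: a full 12-bit mask means "exact VLAN match"


@dataclass(frozen=True)
class Flow:
    """Hypothetical, simplified flow: in_port plus a VLAN value/mask."""
    in_port: int
    vlan: int
    vlan_mask: int = FULL_VLAN_MASK
    priority: int = 0

    @property
    def is_exact(self) -> bool:
        return self.vlan_mask == FULL_VLAN_MASK

    def matches(self, in_port: int, vlan: int) -> bool:
        same_port = self.in_port == in_port
        return same_port and (vlan & self.vlan_mask) == (self.vlan & self.vlan_mask)


class FlowTable:
    """Per-switch table: dict index for exact flows, sorted list for masked ones."""

    def __init__(self, flows):
        self.exact = {}   # (in_port, vlan) -> highest-priority exact flow
        self.masked = []  # bitmask flows, highest priority first
        for flow in flows:
            if flow.is_exact:
                key = (flow.in_port, flow.vlan)
                current = self.exact.get(key)
                if current is None or flow.priority > current.priority:
                    self.exact[key] = flow
            else:
                self.masked.append(flow)
        self.masked.sort(key=lambda f: f.priority, reverse=True)

    def lookup(self, in_port: int, vlan: int) -> Optional[Flow]:
        # O(1) candidate from the exact-match index.
        best = self.exact.get((in_port, vlan))
        # Linear scan over masked flows only; stop once they can no longer
        # outrank the exact candidate (ties go to the exact flow here).
        for flow in self.masked:
            if best is not None and flow.priority <= best.priority:
                break
            if flow.matches(in_port, vlan):
                return flow
        return best
```

With this shape, the linear part only grows with the number of bitmask flows (e.g. the vlan range ones from `mef_eline`), so a switch with mostly exact matches would stay close to O(1) per lookup. Whether that is worth the extra index maintenance is still open for discussion.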