telstra / open-kilda

OpenKilda is an open-source OpenFlow controller initially designed for use in a global network with high control-plane latency and a heavy emphasis on latency-centric data path optimisation.
Apache License 2.0

Add kafka chunking for flow validation process. #5718

Closed by IvanChupin 3 months ago

IvanChupin commented 3 months ago

We have a problem validating some flows. Validation can fail with errors like the following:

Could not validate flow: Timeout for waiting response on command CommandMessage{timestamp=1717430534700, correlation_id=47fde2e0-94fc-4d48-9f8d-67ab1d55f517 : 91bf3693-79da-49d5-a951-858cf4fa020a : akupko@mirantis.com_4d9034df-6449-4bb3-bfe5-f97c66c4213e, cookie=null, destination=null, payload=DumpRulesForFlowHsRequest(switchId=****)}",
  "error-description": "Error in SpeakerWorker"
}

The logs also show the following error:

org.apache.kafka.common.errors.RecordTooLargeException: The message is 1297950 bytes when serialized which is larger than the maximum request size you have configured with the max.request.size configuration.

IvanChupin commented 3 months ago

While investigating the problem, we found that during flow validation the Floodlight modules try to send entities containing the rules that are too large to the Kilda floodlight-topology via the Kafka topic. As a result, the Kafka client logs errors about messages that are too large. To address this issue, we agreed to split these entities into smaller chunks before sending them via the Kafka topic.
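A minimal sketch of the chunking idea, assuming the response is a list of rule entries that can be split independently and reassembled by the consumer. The names ChunkingSketch, Chunk, split and Reassembler are hypothetical and are not OpenKilda classes. Each chunk carries the correlation id plus its index and the total chunk count, so the receiving side can tell when every part has arrived, even if the parts are delivered out of order. For simplicity this sketch bounds chunks by item count. A real implementation would more likely bound them by serialized size so that each Kafka record stays below max.request.size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ChunkingSketch {
    // Hypothetical chunk envelope (not an OpenKilda type): one slice of a larger response.
    record Chunk<T>(String correlationId, int index, int total, List<T> items) {}

    // Splits items into chunks of at most maxItemsPerChunk entries.
    // An empty input still yields one (empty) chunk so the receiver can complete.
    static <T> List<Chunk<T>> split(String correlationId, List<T> items, int maxItemsPerChunk) {
        if (maxItemsPerChunk <= 0) {
            throw new IllegalArgumentException("maxItemsPerChunk must be positive");
        }
        int total = Math.max(1, (items.size() + maxItemsPerChunk - 1) / maxItemsPerChunk);
        List<Chunk<T>> chunks = new ArrayList<>(total);
        for (int i = 0; i < total; i++) {
            int from = i * maxItemsPerChunk;
            int to = Math.min(from + maxItemsPerChunk, items.size());
            chunks.add(new Chunk<>(correlationId, i, total, new ArrayList<>(items.subList(from, to))));
        }
        return chunks;
    }

    // Collects the chunks of one response; chunks may arrive in any order.
    static final class Reassembler<T> {
        private final Map<Integer, List<T>> received = new TreeMap<>(); // keyed and ordered by chunk index
        private int expected = -1;

        // Returns the full list once every chunk has been received, otherwise null.
        List<T> accept(Chunk<T> chunk) {
            expected = chunk.total();
            received.put(chunk.index(), chunk.items());
            if (received.size() < expected) {
                return null;
            }
            List<T> all = new ArrayList<>();
            for (List<T> part : received.values()) {
                all.addAll(part);
            }
            return all;
        }
    }

    public static void main(String[] args) {
        List<String> rules = new ArrayList<>();
        for (int i = 0; i < 7; i++) {
            rules.add("rule-" + i);
        }
        List<Chunk<String>> chunks = split("corr-1", rules, 3);
        System.out.println(chunks.size()); // prints 3

        // Deliver the chunks in reverse order to show that reassembly uses the chunk index.
        Reassembler<String> reassembler = new Reassembler<>();
        List<String> result = null;
        for (int i = chunks.size() - 1; i >= 0; i--) {
            result = reassembler.accept(chunks.get(i));
        }
        System.out.println(result.equals(rules)); // prints true
    }
}
```

Carrying an explicit total in every chunk lets the consumer finish as soon as the last missing part arrives, without a separate "end" marker message.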