telstra / open-kilda

OpenKilda is an open-source OpenFlow controller initially designed for use in a global network with high control-plane latency and a heavy emphasis on latency-centric data path optimisation.
Apache License 2.0

Timestamp of packets on Novi (Latency measurement) #580

Closed pzakatov closed 5 years ago

pzakatov commented 6 years ago

UseCase

As a user, I want to see the ISL round-trip latency measured from a single switch and stored in the DB and OTSDB.

Logic

  1. Send a discovery packet as a packet_out to switch A.
  2. Switch A writes a timestamp t1 into the discovery packet and sends it to switch B.
  3. Switch B receives the discovery packet and does two things: it sends the packet to the controller, and it modifies the packet (UDP port) and sends it back to switch A (via group tables).
  4. Switch A receives its own packet, adds a timestamp t2 to it, and sends it to the controller.
  5. The controller uses the discovery packet from switch B for ISL discovery.
  6. The controller uses the discovery packet from switch A to measure round-trip latency (t2 - t1).
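The last step reduces to a subtraction of two timestamps taken on the same switch clock. A minimal sketch, assuming each timestamp arrives as a (seconds, nanoseconds) pair; that split and the function names are illustrative and not taken from the issue:

```python
NANOS_PER_SECOND = 1_000_000_000


def to_nanos(seconds: int, nanoseconds: int) -> int:
    """Collapse a (seconds, nanoseconds) pair into a single nanosecond count."""
    return seconds * NANOS_PER_SECOND + nanoseconds


def round_trip_latency_ns(t1_ns: int, t2_ns: int) -> int:
    """Return t2 - t1, where both stamps were written by switch A.

    t1 is written when the discovery packet leaves switch A,
    t2 when the looped-back copy arrives at switch A again.
    """
    if t2_ns < t1_ns:
        # Both stamps come from one clock, so a negative value means bad input.
        raise ValueError("t2 is earlier than t1")
    return t2_ns - t1_ns


if __name__ == "__main__":
    t1 = to_nanos(10, 999_999_000)
    t2 = to_nanos(11, 1_000)
    print(round_trip_latency_ns(t1, t2))  # 2000
```

Because t1 and t2 come from the same switch, no clock synchronisation between switch A and switch B is needed for this value.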

Details

Accurately measuring latency within an OpenFlow network is challenging because the OpenFlow protocol itself has no standardised way to measure latency. A common technique is to send a packet that embeds the time at which the controller sent it. When the receiving switch sends the packet back to the controller, the controller knows the time of receipt and can therefore calculate the latency.

That mechanism has a weakness: the latency between the Controller and the switch is not easy to account for. A common practice is to add the average Controller-to-switch latency to T0 (the time the PACKET_OUT was sent by the Controller) and subtract it from T1 (the time the resulting PACKET_IN was received by the Controller). In a large network with high latencies and jitter, those numbers can quickly become meaningless. Couple that with the processing time of the PACKET_OUT/PACKET_IN messages within the switch, and the measured latency is not very accurate.
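To make the weakness concrete, here is a small sketch of the controller-side estimate described above. The function and parameter names are hypothetical, and the numbers are invented purely for illustration:

```python
def controller_side_latency_ms(
    t0_ms: float,
    t1_ms: float,
    avg_ctrl_to_src_ms: float,
    avg_dst_to_ctrl_ms: float,
) -> float:
    """Estimate link latency from controller timestamps only.

    t0_ms: time the controller sent the PACKET_OUT.
    t1_ms: time the controller received the packet back.
    The two averages are the assumed control-channel latencies.
    """
    adjusted_send = t0_ms + avg_ctrl_to_src_ms      # estimated departure from source switch
    adjusted_receive = t1_ms - avg_dst_to_ctrl_ms   # estimated arrival at destination switch
    return adjusted_receive - adjusted_send


if __name__ == "__main__":
    # Invented scenario: the link really takes 5 ms, and the control channel
    # actually took 110 ms out and 115 ms back, so the controller sees
    # T1 = 0 + 110 + 5 + 115 = 230 ms.
    estimate = controller_side_latency_ms(
        t0_ms=0.0, t1_ms=230.0, avg_ctrl_to_src_ms=100.0, avg_dst_to_ctrl_ms=120.0
    )
    print(estimate)  # 10.0, double the real 5 ms
```

In this made-up scenario a few milliseconds of control-plane jitter already doubles the reported link latency, which is the effect the paragraph above warns about.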

To combat that issue, Noviflow has added an extension that allows the times at which a packet is received and transmitted by a switch to be appended to the packet. That timestamp is based on a software mechanism, and its accuracy is likely to be +/- 1-2 ms.

Noviflow has also added an extension which allows the switch to append an extremely accurate hardware-based timestamp to all outgoing packets that match a specific rule. However, because the timestamp is appended after the end of the Ethernet frame, the result is not a valid Ethernet packet, and some equipment will drop it as malformed.

Within OpenKilda, the ISL discovery mechanism sends an LLDP packet encapsulated in a UDP packet (to get past any intermediate carriers that might be intercepting LLDP packets). One of the TLVs in the LLDP packet carries the timestamp at which the PACKET_OUT message is created by the controller (plus the latency between controller and switch). Two additional TLVs have been added to the LLDP packet to carry the sending switch timestamp and the receiving switch timestamp, TLV 0x04 and 0x05 respectively.
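As an illustration of what such a TLV looks like on the wire, the sketch below builds an LLDP organizationally specific TLV (type 127, with a 7-bit type and 9-bit length header as defined by IEEE 802.1AB) carrying an 8-byte zeroed timestamp placeholder for the switch to overwrite. The OUI value is a placeholder, and treating 0x04/0x05 as the TLV subtype is an assumption; neither is confirmed by the issue:

```python
import struct

LLDP_TLV_ORG_SPECIFIC = 127          # IEEE 802.1AB organizationally specific TLV
PLACEHOLDER_OUI = bytes(3)           # hypothetical; the real OUI is not given here
SUBTYPE_SWITCH_TX = 0x04             # assumed to be the "Timestamp type" field
SUBTYPE_SWITCH_RX = 0x05
TIMESTAMP_LENGTH = 8                 # 64-bit field, as drawn in the packet layout


def build_org_specific_tlv(oui: bytes, subtype: int, info: bytes) -> bytes:
    """Pack a type-127 TLV: 7-bit type, 9-bit length, OUI, subtype, info."""
    if len(oui) != 3:
        raise ValueError("OUI must be exactly 3 bytes")
    value = oui + bytes([subtype]) + info
    if len(value) > 0x1FF:
        raise ValueError("TLV value does not fit in the 9-bit length field")
    header = (LLDP_TLV_ORG_SPECIFIC << 9) | len(value)
    return struct.pack("!H", header) + value


def timestamp_placeholder_tlv(subtype: int) -> bytes:
    """Build a TLV whose timestamp bytes are zero until a switch fills them in."""
    return build_org_specific_tlv(PLACEHOLDER_OUI, subtype, bytes(TIMESTAMP_LENGTH))


if __name__ == "__main__":
    print(timestamp_placeholder_tlv(SUBTYPE_SWITCH_TX).hex())
```

Each placeholder TLV is 14 bytes long: 2 bytes of header, 3 of OUI, 1 of subtype and 8 of timestamp.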

On the PACKET_OUT message, an action will be created to copy the timestamp into the correct position in the PACKET_OUT payload (TLV 0x04). On the receiving switch, the matching rule will send the packet back to the originating switch, where an action is added to copy the receiving timestamp into TLV 0x05 before sending it to the controller.

When the controller receives a discovery packet, it checks whether TLV 0x04 and/or 0x05 is set. If so, those values are used to calculate the latency; otherwise the original method based on the Controller timestamps is used.
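The fallback rule can be sketched as below. This version assumes a timestamp counts as "set" when it is present and non-zero, and that both switch timestamps are needed before they replace the controller-based value; the text above leaves both points open, so treat them as illustrative choices:

```python
from typing import Optional


def select_latency_ns(
    switch_tx_ns: Optional[int],
    switch_rx_ns: Optional[int],
    controller_estimate_ns: int,
) -> int:
    """Prefer switch-written timestamps; fall back to the controller estimate.

    switch_tx_ns: value read from TLV 0x04 (None or 0 if the switch wrote nothing).
    switch_rx_ns: value read from TLV 0x05 (None or 0 if the switch wrote nothing).
    controller_estimate_ns: latency derived from controller timestamps.
    """
    both_set = bool(switch_tx_ns) and bool(switch_rx_ns)
    if both_set and switch_rx_ns >= switch_tx_ns:
        return switch_rx_ns - switch_tx_ns
    return controller_estimate_ns


if __name__ == "__main__":
    print(select_latency_ns(1_000, 4_500, 9_000_000))  # 3500
    print(select_latency_ns(0, 4_500, 9_000_000))      # 9000000
```

Switches without the timestamp extension leave the placeholders untouched, so they take the second branch and keep the existing behaviour.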

Verification Packet structure

 0                   1                   2                   3                   4              
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                    Destination Mac Address                                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                       Source Mac Address                                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           EtherType           |Version|  IHL  |Type of Service|          Total Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Identification        |Flags|     Fragment Offset     |  Time to Live |    Protocol   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        Header Checksum        |                       Source IP Address                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     Destination IP Address                    |          Source Port          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        Destination Port       |             Length            |            Checksum           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|ChasisID Type|      length     |                           Chassis Id                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                               | Port Id Type|      length     |               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|              Port             |   TTL Type  |      length     |              TTL              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Optional Type|      length     |       Organizationally Unique Identifier      | Timestamp type|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                          Timestamp T0                                         |
+                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               |Optional Type|      length     |Organizationally Unique Identi.|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               | Timestamp type|                          Timestamp T1                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |Optional Type|      length     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Organizationally Unique Identifier      | SwitchId type |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
|                                          Datapath ID                                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Optional Type|      length     |       Organizationally Unique Identifier      | Timestamp type|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                      Floodlight Timestamp                                     |
+                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               |Optional Type|      length     |Organizationally Unique Identi.|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               |  Ordinal type |                            Ordinal                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Optional Type|      length     |       Organizationally Unique Identifier      |   Sign type   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                                                               |
+                                                                                               +
|                                             Token                                             |
+                                                                     +-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                                     |       End of LLDP       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     |
+-+-+-+
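For readers who want to walk the LLDP portion of this layout, the following sketch iterates over the TLVs and collects the organizationally specific ones by subtype. It assumes the input starts at the first LLDP TLV (that is, the Ethernet, IP and UDP headers have already been stripped), and the helper names are hypothetical:

```python
import struct
from typing import Dict, Iterator, Tuple

LLDP_TLV_END = 0
LLDP_TLV_ORG_SPECIFIC = 127


def iter_tlvs(lldp: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (type, value) pairs until the End of LLDPDU TLV or end of data."""
    offset = 0
    while offset + 2 <= len(lldp):
        (header,) = struct.unpack_from("!H", lldp, offset)
        tlv_type = header >> 9       # upper 7 bits
        length = header & 0x1FF      # lower 9 bits
        offset += 2
        if tlv_type == LLDP_TLV_END:
            return
        value = lldp[offset:offset + length]
        if len(value) < length:
            raise ValueError("truncated TLV")
        offset += length
        yield tlv_type, value


def org_specific_by_subtype(lldp: bytes) -> Dict[int, bytes]:
    """Map subtype -> info bytes for every type-127 TLV (the OUI is skipped)."""
    fields: Dict[int, bytes] = {}
    for tlv_type, value in iter_tlvs(lldp):
        if tlv_type == LLDP_TLV_ORG_SPECIFIC and len(value) >= 4:
            fields[value[3]] = value[4:]
    return fields


if __name__ == "__main__":
    chassis = struct.pack("!H", (1 << 9) | 7) + bytes([4]) + bytes(6)
    stamp = struct.pack("!H", (127 << 9) | 12) + bytes(3) + bytes([0x04]) + (1234).to_bytes(8, "big")
    print(org_specific_by_subtype(chassis + stamp + b"\x00\x00"))
```

Keying on the subtype alone is a simplification; code handling TLVs from more than one organisation would key on the (OUI, subtype) pair.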

SEQUENCE DIAGRAM COMING SOON... WATCH THIS SPACE :)

The new OTSDB metric should be isl.rtt, in nanoseconds.
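As a rough illustration, a data point for that metric could be formatted for OpenTSDB's line-based `put` command as shown below. The tag names and switch identifiers are invented for the example; the issue does not specify which tags the metric carries:

```python
from typing import Mapping


def opentsdb_put_line(
    metric: str, timestamp_s: int, value: int, tags: Mapping[str, str]
) -> str:
    """Format one data point as 'put <metric> <timestamp> <value> <tag=value>...'."""
    if not tags:
        # OpenTSDB requires at least one tag per data point.
        raise ValueError("at least one tag is required")
    tag_text = " ".join(f"{key}={tags[key]}" for key in sorted(tags))
    return f"put {metric} {timestamp_s} {value} {tag_text}"


if __name__ == "__main__":
    line = opentsdb_put_line(
        "isl.rtt",
        1546300800,
        2000,  # round-trip time in nanoseconds
        {"src_switch": "sw-a", "dst_switch": "sw-b"},  # hypothetical tags
    )
    print(line)  # put isl.rtt 1546300800 2000 dst_switch=sw-b src_switch=sw-a
```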

jonvestal commented 6 years ago

Modifications to Loxigen to support the following have been completed and tested, and a PR has been submitted:

  1. RXTimestamps
  2. TXTimestamps
  3. Copy_Field

pzakatov commented 6 years ago

Related PR: #5