netfishers-onl / BLT

BLT The BGP Linkstate Topographer
24 stars 8 forks source link

BGP: keepalive interval bigger than hold time #4

Closed lukastribus closed 8 years ago

lukastribus commented 8 years ago

BGP Hold Time in BLT is 30 seconds, but keepalive message are only sent every 60 seconds. This causes the BGP session to keep going down.

As per RFC4271 section 4.4, the keepalive interval should be one third of the Hold Time interval.

If the keepalive interval is bigger than hold time, the BGP session cannot possibly be stable.

It would probably best to bump the hold time to 180 seconds.

AlainMoretti commented 8 years ago

Hi Lukas and thank you for your interest to BLT. Regarding this issue, if you configure your BGP router with keepalive/holdtime 90/180, as soon as it speaks with BLT a negociation occurs, and you end up with a 10/30 working peer.

Real life example :

RP/0/0/CPU0:iosxrv-1#sho bgp neighbor
Thu May 12 16:24:32.593 UTC

BGP neighbor is 192.168.255.2 ==> this is BLT
 Remote AS 1, local AS 1, internal link
 Remote router ID 0.0.0.1
  BGP state = Established, up for 01:13:35
  Last read 00:00:05, Last read before reset 01:14:10
  Hold time is 30, keepalive interval is 10 seconds
  Configured hold time: 180, keepalive: 90, min acceptable hold time: 3

Could you elaborate the situation ending up to a dead peer ? I think I don't get it...

lukastribus commented 8 years ago

Hi Alain,

thank you for this project, I hope you don't mind me submitting a bunch of issues :)

BGP keeps going down to my XR box, with default BGP timers - or 90/180 as you proposed, because BLT doesn't send the second keepalive in time:

RP/0/RSP0/CPU0:PE-9#show run router bgp 65535 neighbor 10.7.250.6
Fri May 13 11:44:33.798 CEST
router bgp 65535
 neighbor 10.7.250.6
  remote-as 65535
  timers 90 180
  update-source Loopback0
  session-open-mode passive-only
  address-family link-state link-state
  !
 !
!

RP/0/RSP0/CPU0:PE-9#show bgp neighbor 10.7.250.6 | inc "old t"
Fri May 13 11:44:45.055 CEST
  Hold time is 30, keepalive interval is 10 seconds
  Configured hold time: 180, keepalive: 90, min acceptable hold time: 3
  Last reset 00:00:46, due to BGP Notification sent: hold time expired
  Error Code: hold time expired
RP/0/RSP0/CPU0:PE-9#
RP/0/RSP0/CPU0:PE-9#debug bgp keepalive 10.7.250.6 in
Fri May 13 11:46:07.030 CEST
RP/0/RSP0/CPU0:PE-9#
RP/0/RSP0/CPU0:May 13 11:46:21.176 : bgp[1058]: [default-iord]: KEEPALIVE received from 10.7.250.6
RP/0/RSP0/CPU0:May 13 11:46:21.176 : bgp[1058]: %ROUTING-BGP-5-ADJCHANGE : neighbor 10.7.250.6 Up (VRF: default) (AS: 65535)
RP/0/RSP0/CPU0:May 13 11:46:51.177 : bgp[1058]: %ROUTING-BGP-5-ADJCHANGE : neighbor 10.7.250.6 Down - BGP Notification sent, hold time expired (VRF: default) (AS: 65535)
RP/0/RSP0/CPU0:May 13 11:47:44.409 : bgp[1058]: [default-iord]: KEEPALIVE received from 10.7.250.6
RP/0/RSP0/CPU0:May 13 11:47:44.409 : bgp[1058]: %ROUTING-BGP-5-ADJCHANGE : neighbor 10.7.250.6 Up (VRF: default) (AS: 65535)
RP/0/RSP0/CPU0:May 13 11:48:14.410 : bgp[1058]: %ROUTING-BGP-5-ADJCHANGE : neighbor 10.7.250.6 Down - BGP Notification sent, hold time expired (VRF: default) (AS: 65535)
RP/0/RSP0/CPU0:May 13 11:48:45.249 : bgp[1058]: [default-iord]: KEEPALIVE received from 10.7.250.6
RP/0/RSP0/CPU0:May 13 11:48:45.249 : bgp[1058]: %ROUTING-BGP-5-ADJCHANGE : neighbor 10.7.250.6 Up (VRF: default) (AS: 65535)
RP/0/RSP0/CPU0:May 13 11:49:01.258 : bgp[1058]: [default-iord]: KEEPALIVE received from 10.7.250.6
RP/0/RSP0/CPU0:May 13 11:49:31.259 : bgp[1058]: %ROUTING-BGP-5-ADJCHANGE : neighbor 10.7.250.6 Down - BGP Notification sent, hold time expired (VRF: default) (AS: 65535)
RP/0/RSP0/CPU0:May 13 11:50:05.152 : bgp[1058]: [default-iord]: KEEPALIVE received from 10.7.250.6
RP/0/RSP0/CPU0:May 13 11:50:05.152 : bgp[1058]: %ROUTING-BGP-5-ADJCHANGE : neighbor 10.7.250.6 Up (VRF: default) (AS: 65535)
RP/0/RSP0/CPU0:May 13 11:50:35.153 : bgp[1058]: %ROUTING-BGP-5-ADJCHANGE : neighbor 10.7.250.6 Down - BGP Notification sent, hold time expired (VRF: default) (AS: 65535)

Some notes: BGP holdtime is negotiated, BGP keepalive interval is NOT. Every BGP implementation has to adjust BGP keepalive interval to be less than the holdtime and the RFC suggestion is for it to be a third of the holdtime.

The BLT behavior is that the keepalive interval is bigger than the holdtime (in this case the second keepalive is not transmitted), which is why the BGP session keeps dropping.

I can confirm this by tcpdumping on the VM where BLT runs.

Not sure about the difference between your setup and mine.

AlainMoretti commented 8 years ago

Strange because actual hold time is the minimum between HoldTime proposed by both peers.

/**
* get the negotiated hold time. This is the minimum of the locally configured hold time and the
* hold time value received from the remote peer in the initial open packet. 
 * It is assured that the negotiated hold time cannot be less than 3 seconds as specified by RFC4271. 
 *  
 * @return
 */
int getNegotiatedHoldTime() {
    int negotiatedHoldTime = Math.min(peerConfiguration.getHoldTime(), peerProposedHoldTime);
    if(negotiatedHoldTime < 3)
        negotiatedHoldTime = 3;
    return negotiatedHoldTime;
}

And keepalive becomes 1/3 of the negotiated HoldTime, whic complies to the RFC :

/**
 * get the keepalive interval which is 1/3 of the negotiated hold time. It is assured that the interval
 * cannot be less than 1 second as specified by RFC4271
 * 
 * @return
 */
private int getSendKeepaliveTime() {
    return Math.max(getNegotiatedHoldTime() / 3, 1);
}

Those excerpts come from BLT code, which was borrowed from an implementation written by Rainer Bienek. I work with an XRrouter just like you, and you can see in my previous post that, even though I have a local config at 90/180, I end up with 10/30, and BLT does send a keepalive every ten seconds :

22:26:50,476 INFO  [QuartzScheduler_Worker-2] BGPv4Codec: writing packet onl.netfishers.blt.bgp.netty.protocol.KeepalivePacket@60676411[type=4]
2016-05-13 22:26:50,508 INFO  [New I/O worker #1] BGPv4Codec: received packet onl.netfishers.blt.bgp.netty.protocol.KeepalivePacket@b0c846a[type=4]
22:27:00,478 INFO  [QuartzScheduler_Worker-5] BGPv4Codec: writing packet onl.netfishers.blt.bgp.netty.protocol.KeepalivePacket@526febe8[type=4]
2016-05-13 22:27:00,528 INFO  [New I/O worker #1] BGPv4Codec: received packet onl.netfishers.blt.bgp.netty.protocol.KeepalivePacket@a3a83ae[type=4]
22:27:10,481 INFO  [QuartzScheduler_Worker-9] BGPv4Codec: writing packet onl.netfishers.blt.bgp.netty.protocol.KeepalivePacket@1e60c053[type=4]
2016-05-13 22:27:10,546 INFO  [New I/O worker #1] BGPv4Codec: received packet onl.netfishers.blt.bgp.netty.protocol.KeepalivePacket@d00a20d[type=4]

and session remains very stable :

RP/0/0/CPU0:xrv1#sho bgp link-state link-state summary
Fri May 13 21:35:37.960 UTC
BGP router identifier 10.0.0.29, local AS number 1
BGP generic scan interval 60 secs
Non-stop routing is enabled
BGP table state: Active
Table ID: 0x0   RD version: 54
BGP main routing table version 54
BGP NSR Initial initsync version 54 (Reached)
BGP NSR/ISSU Sync-Group versions 0/0
BGP scan interval 60 secs

BGP is operating in STANDALONE mode.

Process       RcvTblVer   bRIB/RIB   LabelVer  ImportVer  SendTblVer  StandbyVer
Speaker              54         54         54         54          54           0

Neighbor        Spk    AS MsgRcvd MsgSent   TblVer  InQ OutQ  Up/Down  St/PfxRcd
192.168.2.126     0     1      68      89       54    0    0 00:11:08          0

RP/0/0/CPU0:xrv1#

Haven't you changed something in the code ? or, more probably, haven't you got errors causing your peer to go up and down ? You can follow this by increasing the level of verbosity in 'blt.conf', turn it to 'INFO' might be sufficient.

Let me know.

lukastribus commented 8 years ago

This is vanilla blt-0.4.4 release, I did not change anything in the code.

From the logfile I can clearly see that blt does not send keepalive messages in time, for example, in the following case with 10/30 (10 seconds keepalive interval, 30 seconds holdtime), blt sends the first keepalive correctly at 12:15:57, the second keepalive very late at 12:16:24 (its 27 seconds after the first keepalive) and the third keepalive is not within 30 seconds, which is the reason why the ASR9k tears down the session:

RP/0/RSP0/CPU0:May 17 12:15:57.254 : bgp[1058]: %ROUTING-BGP-5-ADJCHANGE : neighbor 10.7.250.6 Up (VRF: default) (AS: 65535)
RP/0/RSP0/CPU0:BZ-PE-9#
RP/0/RSP0/CPU0:BZ-PE-9#show bgp neighbor 10.7.250.6
Tue May 17 12:16:09.707 CEST

BGP neighbor is 10.7.250.6
 Remote AS 65535, local AS 65535, internal link
 Remote router ID 0.0.0.1
  BGP state = Established, up for 00:00:12
  NSR State: None
  TCP open mode: passive only

  Last read 00:00:12, Last read before reset 3d19h
  Hold time is 30, keepalive interval is 10 seconds
  Configured hold time: 180, keepalive: 60, min acceptable hold time: 3
  Last write 00:00:07, attempted 7113, written 7113
  Second last write 00:00:07, attempted 15765, written 8652
  Last write before reset 3d19h, attempted 19, written 19
  Second last write before reset 3d19h, attempted 19, written 19
  Last write pulse rcvd  May 17 12:16:02.314 last full May 17 12:16:02.271 pulse count 25276
  Last write pulse rcvd before reset 3d19h
  Socket not armed for io, armed for read, armed for write
  Last write thread event before reset 3d19h, second last 3d19h
  Last KA expiry before reset 3d19h, second last 3d19h
  Last KA error before reset 00:00:00, KA not sent 00:00:00
  Last KA start before reset 3d19h, second last 3d19h
  Precedence: internet
  Non-stop routing is enabled
  Multi-protocol capability received
  Neighbor capabilities:
    Route refresh: advertised (old + new)
    4-byte AS: advertised
    Address family Link-state Link-state: advertised and received
  Received 4275 messages, 0 notifications, 0 in queue
  Sent 301547 messages, 1245 notifications, 0 in queue
  Minimum time between advertisement runs is 0 secs
  Inbound message logging enabled, 3 messages buffered
  Outbound message logging enabled, 3 messages buffered

 For Address Family: Link-state Link-state
  BGP neighbor version 516
  Update group: 0.1 Filter-group: 0.1  No Refresh request being processed
  Route refresh request: received 0, sent 0
  0 accepted prefixes, 0 are bestpaths
  Cumulative no. of prefixes denied: 0.
  Prefix advertised 458, suppressed 0, withdrawn 0
  Maximum prefixes allowed 131072
  Threshold for warning message 75%, restart interval 0 min
  AIGP is enabled
  An EoR was not received during read-only mode
  Last ack version 0, Last synced ack version 0
  Outstanding version objects: current 1, max 2
  Additional-paths operation: None
  Send Multicast Attributes

  Connections established 1255; dropped 1254
  Local host: 10.7.1.28, Local port: 179, IF Handle: 0x00000000
  Foreign host: 10.7.250.6, Foreign port: 41323
  Last reset 3d19h, due to Peer closing down the session
  Peer reset reason: Remote closed the session (Connection timed out)
  Time since last notification sent to neighbor: 3d19h
  Error Code: hold time expired
  Notification data sent:
    None
RP/0/RSP0/CPU0:BZ-PE-9#
RP/0/RSP0/CPU0:May 17 12:16:55.174 : bgp[1058]: %ROUTING-BGP-5-ADJCHANGE : neighbor 10.7.250.6 Down - BGP Notification sent, hold time expired (VRF: default) (AS: 65535)
RP/0/RSP0/CPU0:BZ-PE-9#
root@igpmon01:/var/log# grep -E "writing.+KeepalivePacket" blt.log
12:15:57.242 [New I/O worker #1] INFO  o.n.b.bgp.netty.handlers.BGPv4Codec - writing packet onl.netfishers.blt.bgp.netty.protocol.KeepalivePacket@7a7d8351[type=4]
2016-05-17 12:15:57,242 INFO  [New I/O worker #1] BGPv4Codec: writing packet onl.netfishers.blt.bgp.netty.protocol.KeepalivePacket@7a7d8351[type=4]
12:16:24.698 [QuartzScheduler_Worker-8] INFO  o.n.b.bgp.netty.handlers.BGPv4Codec - writing packet onl.netfishers.blt.bgp.netty.protocol.KeepalivePacket@2e10ecd3[type=4]
2016-05-17 12:16:24,698 INFO  [QuartzScheduler_Worker-8] BGPv4Codec: writing packet onl.netfishers.blt.bgp.netty.protocol.KeepalivePacket@2e10ecd3[type=4]
root@igpmon01:/var/log#

I think blt is busy doing some other things and the keepalive handling is blocked or delayed for that reason. In the log I see a lot of activity like this:

12:16:06.054 [QuartzScheduler_Worker-5] INFO  o.n.blt.snmp.SnmpPollingTask - Task 'SNMP Polling task for router Router BLABLA-IPE-2 (router-id 0100.0700.8011), creation time 1463480164979.' cancelled, because router Router BLABLA-IPE-2 (router-id 0100.0700.8011) is lost.
2016-05-17 12:16:06,054 INFO  [QuartzScheduler_Worker-5] SnmpPollingTask: Task 'SNMP Polling task for router Router BLABLA-IPE-2 (router-id 0100.0700.8011), creation time 1463480164979.' cancelled, because router Router BLABLA-IPE-2 (router-id 0100.0700.8011) is lost.
[...]
2016-05-17 12:16:06,155 INFO  [New I/O worker #1] Task: Scheduling the task '' in 1000ms.
12:16:06.155 [New I/O worker #1] INFO  onl.netfishers.blt.tasks.Task - Scheduling the task '' in 1000ms.
2016-05-17 12:16:06,155 INFO  [New I/O worker #1] Task: Scheduling the task '' in 1000ms.
12:16:06.155 [New I/O worker #1] INFO  onl.netfishers.blt.tasks.Task - Scheduling the task '' in 1000ms.
2016-05-17 12:16:06,155 INFO  [New I/O worker #1] Task: Scheduling the task '' in 1000ms.
12:16:06.155 [New I/O worker #1] INFO  onl.netfishers.blt.tasks.Task - Scheduling the task '' in 1000ms.
2016-05-17 12:16:06,155 INFO  [New I/O worker #1] Task: Scheduling the task '' in 1000ms.
[...]
12:16:08.761 [QuartzScheduler_Worker-4] INFO  o.n.blt.snmp.SnmpPollingTask - Task 'SNMP Polling task for router Router BLABLA-P-1 (router-id 0100.0700.2001), creation time 1463480165384.' cancelled because a similar task has run meanwhile.
2016-05-17 12:16:08,761 INFO  [QuartzScheduler_Worker-4] SnmpPollingTask: Task 'SNMP Polling task for router Router BLABLA-P-1 (router-id 0100.0700.2001), creation time 1463480165384.' cancelled because a similar task has run meanwhile.

I would like to send you the full, non-redacted log file privately, do you mind if I send it to your work address (I assume amoretti at the company at 170 West Tasman Drive)?

This is a production network with about 50 nodes and an average number of links such a network has. I assume you have similar or bigger deployments with blt?

AlainMoretti commented 8 years ago

Hi Lukas, thank you for the investigation. We do have deployments with huge amount of messages (nodes+links+prefixes), and 50 nodes should work.

Yes you can send me a private msg. A complete logfile at INFO level could help us to tackle this issue.

One more thing : It seems you have some routers in 'lost' state. Could you please clean your setup and try again:

AlainMoretti commented 8 years ago

Close this issue since the unexpected behaviour is due to heavy SNMP polling. I am opening another issue to follow this problem.