lolepezy / rpki-prover

Yet another RPKI validator
BSD 3-Clause "New" or "Revised" License
12 stars 7 forks source link

200K+ ROAs are withdrawn on Cisco IOS-XR Router using RPKI Prover #217

Closed rencarlo014 closed 1 week ago

rencarlo014 commented 4 weeks ago

Hello, @lolepezy and everyone. Good day!

I would like to seek for your assistance regarding on the behavior I am encountering on Cisco IOS-XR router using the RPKI Prover. While checking, I've observed that ROAs are being withdrawn. Lots of ROAs. As we are using RPKI state as best path for BGP, the traffic is being rerouted to less preferred paths. You may refer below for the details and necessary configurations from RPKI Prover and Cisco IOS-XR.

Cisco IOS-XR Router:

!
RP/0/RSP0/CPU0:XR1#show bgp rpki server summary
Fri Aug 16 07:54:40.462 UTC

Hostname/Address        Transport       State           Time            ROAs (IPv4/IPv6)
x.x.x.x                 TCP:3323        ESTAB           2d20h           485866/116028
!
RP/0/RSP0/CPU0:XR1#show bgp rpki server summary
Fri Aug 16 08:18:03.288 UTC

Hostname/Address        Transport       State           Time            ROAs (IPv4/IPv6)
x.x.x.x                 TCP:3323        ESTAB           2d21h           246149/57684
!
RP/0/RSP0/CPU0:XR1#show bgp rpki server summary
Fri Aug 16 08:35:42.381 UTC

Hostname/Address        Transport       State           Time            ROAs (IPv4/IPv6)
x.x.x.x                 TCP:3323        ESTAB           2d21h           485865/116024
!
RP/0/RSP0/CPU0:XR1#show bgp rpki server x.x.x.x
Fri Aug 16 08:37:55.105 UTC

RPKI Cache-Server x.x.x.x
  Transport: TCP port 3323
  Connect state: ESTAB
  Conn attempts: 1
  Total byte RX: 223668056
  Total byte TX: 3392
RPKI-RTR protocol information
  Serial number: 273
  Cache nonce: 0xC1C9
  Protocol state: DATA_END
  Refresh  time: 3600 seconds
  Response time: 600 seconds
  Purge time: 60 seconds
  Protocol exchange
    ROAs announced: 4238597 IPv4         1055396 IPv6
    ROAs withdrawn: 3752732 IPv4         939372 IPv6
    Error Reports :      0 sent       0 rcvd
!
RP/0/RSP0/CPU0:XR1# show logging | i 180.232.41.29
Fri Aug 16 08:38:25.369 UTC
RP/0/RSP0/CPU0:Aug 13 10:58:32 UTC: bgp[1066]: %ROUTING-BGP-5-RPKI_ADJCHANGE : x.x.x.x UP
!
RP/0/RSP0/CPU0:XR1#show run router bgp 9658 rpki server x.x.x.x
Fri Aug 16 08:39:10.982 UTC
router bgp AS
 rpki server x.x.x.x
  transport tcp port 3323
  refresh-time 3600
  response-time 600
!
RP/0/RSP0/CPU0:XR1#show tcp brief  | i 3323
Fri Aug 16 08:45:43.609 UTC
   PCB     VRF-ID     Recv-Q Send-Q  Local Address   Foreign Address  State
0x4a440e6c 0x60000000      0      0  y.y.y.y:54878   x.x.x.x:3323     ESTAB
!
RP/0/RSP0/CPU0:XR1#show tcp detail pcb 0x4a440e6c location 0/RSP0/CPU0
Fri Aug 16 08:47:37.333 UTC

==============================================================
Connection state is ESTAB, I/O status: 0, socket status: 0
Established at Tue Aug 13 10:58:11 2024

PCB 0x4a440e6c, SO 0x4a440c4c, TCPCB 0x4a441064, vrfid 0x60000000,
Pak Prio: Normal, TOS: 0, TTL: 255, Hash index: 405
Local host: y.y.y.y, Local port: 54878 (Local App PID: 1020125)
Foreign host: x.x.x.x, Foreign port: 3323
(Local App PID/instance/SPL_APP_ID: 1020125/0/0)

Current send queue size in bytes: 0 (max 16384)
Current receive queue size in bytes: 0 (max 16384)  mis-ordered: 0 bytes
Current receive queue size in packets: 0 (max 60)

Timer          Starts    Wakeups         Next(msec)
Retrans           286          1                0
SendWnd             0          0                0
TimeWait            0          0                0
AckHold         80667        166                0
KeepAlive           1          0                0
PmtuAger            0          0                0
GiveUp              0          0                0
Throttle            0          0                0

   iss: 3859488509  snduna: 3859491914  sndnxt: 3859491914
sndmax: 3859491914  sndwnd: 64256       sndcwnd: 3720
   irs: 2837691013  rcvnxt: 3061359414  rcvwnd: 15704   rcvadv: 3061375118

SRTT: 5 ms,  RTTO: 300 ms,  RTV: 30 ms,  KRTT: 0 ms
minRTT: 0 ms,  maxRTT: 42 ms

ACK hold time: 200 ms, Keepalive time: 0 sec, SYN waittime: 30 sec
Giveup time: 0 ms, Retransmission retries: 0, Retransmit forever: FALSE
Connect retries remaining: 0, connect retry interval: 0 secs

State flags: none
Feature flags: Win Scale, Nagle
Request flags: Win Scale

Datagrams (in bytes): MSS 1240, peer MSS 1460, min MSS 1240, max MSS 1240

Window scales: rcv 0, snd 7, request rcv 0, request snd 7
Timestamp option: recent 0, recent age 0, last ACK sent 0
Sack blocks {start, end}: none
Sack holes {start, end, dups, rxmit}: none

Socket options: SO_NBIO
Socket states: SS_ISCONNECTED, SS_PRIV
Socket receive buffer states: SB_SEL, SB_DEL_WAKEUP
Socket send buffer states: SB_DEL_WAKEUP
Socket receive buffer: Low/High watermark 1/16384
Socket send buffer   : Low/High watermark 2048/16384, Notify threshold 0
Socket misc info     : Rcv data size (sb_cc) 0, so_qlen 0,
                       so_q0len 0, so_qlimit 0, so_error 0
                       so_auto_rearm 1

PDU information:
 #PDU's in buffer: 0
FIB Lookup Cache:  IFH: 0x220  PD ctx: size: 16  data: 0x0 0x0 0x0 0x5592a328
  Num Labels: 0  Label Stack:
Num of peers with authentication info: 0
!

RPKI Prover:

!
[root@RPKIPROVER002 ~]# sudo systemctl status rpki-prover.service
● rpki-prover.service - RPKI Prover
     Loaded: loaded (/etc/systemd/system/rpki-prover.service; enabled; preset: disabled)
     Active: active (running) since Tue 2024-08-13 10:15:35 PST; 2 days ago
   Main PID: 1073969 (rpki-prover)
      Tasks: 29 (limit: 100365)
     Memory: 5.0G
        CPU: 3h 53min 13.416s
     CGroup: /system.slice/rpki-prover.service
             ├─1073969 /usr/local/bin/rpki-prover --with-rtr --rtr-address 0.0.0.0 --rtr-port 3323 --revalidation-interval 900 --rpki-root-directory /etc/rpki-prover
             ├─1672847 /usr/local/bin/rpki-prover --worker version:1723767526298191358:rsync-fetch:rsync://rpki-rsync.mnihyc.com/repo/mnihyc-rpki/1 +RTS -I0 -N2 -A20m -AL64m ->
             ├─1672856 rsync --update --times --timeout=900 --contimeout=900 --max-size=33554432 --min-size=300 --recursive --delete --copy-links rsync://rpki-rsync.mnihyc.com>
             └─1672890 rsync --update --times --timeout=900 --contimeout=900 --max-size=33554432 --min-size=300 --recursive --delete --copy-links rsync://rpki-rsync.mnihyc.com>

Aug 16 08:33:05 RPKIPROVER002 rpki-prover[1073969]: Info    [pid 1675446]     2024-08-16 00:33:05.294Z  Fetched https://rpki-rrdp.us-east-2.amazonaws.com/rrdp/bd48a1>
Aug 16 08:33:05 RPKIPROVER002 rpki-prover[1073969]: Info    [pid 1676329]     2024-08-16 00:33:05.690Z  https://repo-rpki.idnic.net/rrdp/notification.xml: downloadin>
Aug 16 08:33:06 RPKIPROVER002 rpki-prover[1073969]: Info    [pid 1676336]     2024-08-16 00:33:06.322Z  https://rpki-rrdp.us-east-2.amazonaws.com/rrdp/08c2f264-23f9->
Aug 16 08:33:06 RPKIPROVER002 rpki-prover[1073969]: Info    [pid 1675446]     2024-08-16 00:33:06.377Z  Fetched https://rpki-rrdp.us-east-2.amazonaws.com/rrdp/08c2f2>
Aug 16 08:33:06 RPKIPROVER002 rpki-prover[1073969]: Info    [pid 1675446]     2024-08-16 00:33:06.511Z  Fetched https://repo-rpki.idnic.net/rrdp/notification.xml, to>
Aug 16 08:33:06 RPKIPROVER002 rpki-prover[1073969]: Info    [pid 1675446]     2024-08-16 00:33:06.972Z  Validated TA 'apnic', got 133987 VRPs, took 130820ms
Aug 16 08:33:09 RPKIPROVER002 rpki-prover[1073969]: Info    [pid 1675959]     2024-08-16 00:33:09.336Z  Finished rsynching rsync://rpki-repo.as207960.net/repo/rpki_c>
Aug 16 08:33:09 RPKIPROVER002 rpki-prover[1073969]: Info    [pid 1675446]     2024-08-16 00:33:09.345Z  Fetched rsync://rpki-repo.as207960.net/repo/rpki_ca_ecb34b046>
Aug 16 08:33:09 RPKIPROVER002 rpki-prover[1073969]: Info    [pid 1675446]     2024-08-16 00:33:09.699Z  Validated TA 'ripe', got 262817 VRPs, took 133545ms
Aug 16 08:33:11 RPKIPROVER002 rpki-prover[1073969]: Info    [pid 1073969]     2024-08-16 00:33:10.214Z  Validated all TAs, got 601889 VRPs (probably not unique), 601>
!
[root@RPKIPROVER002 ~]# cat /etc/systemd/s
sleep.conf   system/      system.conf
[root@RPKIPROVER002 ~]# sudo systemctl status rpki-prover.service
● rpki-prover.service - RPKI Prover
     Loaded: loaded (/etc/systemd/system/rpki-prover.service; enabled; preset: disabled)
     Active: active (running) since Tue 2024-08-13 10:15:35 PST; 2 days ago
   Main PID: 1073969 (rpki-prover)
      Tasks: 29 (limit: 100365)
     Memory: 5.0G
        CPU: 3h 53min 13.416s
     CGroup: /system.slice/rpki-prover.service
             ├─1073969 /usr/local/bin/rpki-prover --with-rtr --rtr-address 0.0.0.0 --rtr-port 3323 --revalidation-interval 900 --rpki-root-directory /etc/rpki-prover
             ├─1672847 /usr/local/bin/rpki-prover --worker version:1723767526298191358:rsync-fetch:rsync://rpki-rsync.mnihyc.com/repo/mnihyc-rpki/1 +RTS -I0 -N2 -A20m -AL64m ->
             ├─1672856 rsync --update --times --timeout=900 --contimeout=900 --max-size=33554432 --min-size=300 --recursive --delete --copy-links rsync://rpki-rsync.mnihyc.com>
             └─1672890 rsync --update --times --timeout=900 --contimeout=900 --max-size=33554432 --min-size=300 --recursive --delete --copy-links rsync://rpki-rsync.mnihyc.com>

Aug 16 08:33:05 RPKIPROVER002 rpki-prover[1073969]: Info    [pid 1675446]     2024-08-16 00:33:05.294Z  Fetched https://rpki-rrdp.us-east-2.amazonaws.com/rrdp/bd48a1>
Aug 16 08:33:05 RPKIPROVER002 rpki-prover[1073969]: Info    [pid 1676329]     2024-08-16 00:33:05.690Z  https://repo-rpki.idnic.net/rrdp/notification.xml: downloadin>
Aug 16 08:33:06 RPKIPROVER002 rpki-prover[1073969]: Info    [pid 1676336]     2024-08-16 00:33:06.322Z  https://rpki-rrdp.us-east-2.amazonaws.com/rrdp/08c2f264-23f9->
Aug 16 08:33:06 RPKIPROVER002 rpki-prover[1073969]: Info    [pid 1675446]     2024-08-16 00:33:06.377Z  Fetched https://rpki-rrdp.us-east-2.amazonaws.com/rrdp/08c2f2>
Aug 16 08:33:06 RPKIPROVER002 rpki-prover[1073969]: Info    [pid 1675446]     2024-08-16 00:33:06.511Z  Fetched https://repo-rpki.idnic.net/rrdp/notification.xml, to>
Aug 16 08:33:06 RPKIPROVER002 rpki-prover[1073969]: Info    [pid 1675446]     2024-08-16 00:33:06.972Z  Validated TA 'apnic', got 133987 VRPs, took 130820ms
Aug 16 08:33:09 RPKIPROVER002 rpki-prover[1073969]: Info    [pid 1675959]     2024-08-16 00:33:09.336Z  Finished rsynching rsync://rpki-repo.as207960.net/repo/rpki_c>
Aug 16 08:33:09 RPKIPROVER002 rpki-prover[1073969]: Info    [pid 1675446]     2024-08-16 00:33:09.345Z  Fetched rsync://rpki-repo.as207960.net/repo/rpki_ca_ecb34b046>
Aug 16 08:33:09 RPKIPROVER002 rpki-prover[1073969]: Info    [pid 1675446]     2024-08-16 00:33:09.699Z  Validated TA 'ripe', got 262817 VRPs, took 133545ms
Aug 16 08:33:11 RPKIPROVER002 rpki-prover[1073969]: Info    [pid 1073969]     2024-08-16 00:33:10.214Z  Validated all TAs, got 601889 VRPs (probably not unique), 601>
lines 1-23/23 (END)
!
[root@RPKIPROVER002 ~]# cat /etc/systemd/system/rpki-prover.service
[Unit]
Description=RPKI Prover
Wants=network-online.target
After=network-online.target

[Service]
User=root
Group=root
Type=simple
ExecStart=/usr/local/bin/rpki-prover --with-rtr --rtr-address 0.0.0.0 --rtr-port 3323 --revalidation-interval 900 --rpki-root-directory /etc/rpki-prover

[Install]
WantedBy=multi-user.target
[root@RPKIPROVER002 ~]# netstat -an | grep 3323
Proto Recv-Q Send-Q Local Address     Foreign Address         State
tcp        0      0 x.x.x.x:3323      y.y.y.y:54878     ESTABLISHED
!
[root@RPKIPROVER002 ~]# netstat -tulpn | grep 3323
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:3323            0.0.0.0:*               LISTEN      1073969/rpki-prover
!

Also, attached herewith is the output of journalctl -u rpki-prover.service --since "2024-08-16 07:00:00": 08162024_Output of Journalctl for RPKIPROVER002.txt

Thanks, Ren

lolepezy commented 3 weeks ago

Hello,

From the log you've provided I can see the following lines

Aug 16 07:48:28 RPKIPROVER002 rpki-prover[1073969]: Info    [pid 1073969]     2024-08-15 23:48:26.460Z  Validated all TAs, got 601894 VRPs 
...
Aug 16 08:02:18 RPKIPROVER002 rpki-prover[1073969]: Info    [pid 1073969]     2024-08-16 00:02:18.205Z  Validated all TAs, got 303833 VRPs
...
Aug 16 08:18:47 RPKIPROVER002 rpki-prover[1073969]: Info    [pid 1073969]     2024-08-16 00:18:46.298Z  Validated all TAs, got 601894 VRPs

303833 is a very low number compared to normal 600K and that is what seem have caused the loss of VRPs in the router.

If I dig more I see two lines

Aug 16 08:01:21 RPKIPROVER002 rpki-prover[1073969]: Info    [pid 1670666]     2024-08-16 00:01:21.049Z  Validated TA 'ripe', got 0 VRPs, took 25034ms
...
Aug 16 08:01:21 RPKIPROVER002 rpki-prover[1073969]: Info    [pid 1670666]     2024-08-16 00:01:21.059Z  Validated TA 'lacnic', got 0 VRPs, took 25044ms

Which means at least 2 trust anchor (TAs) out of 5 didn't return any VRPs. Looking further shows earlier lines

Aug 16 08:01:01 RPKIPROVER002 rpki-prover[1073969]: Error   [pid 1670666]     2024-08-16 00:01:01.023Z  Failed to fetch https://rpki.afrinic.net/repository/AfriNIC.c>
Aug 16 08:01:01 RPKIPROVER002 rpki-prover[1073969]: Error   [pid 1670666]     2024-08-16 00:01:01.027Z  Failed to fetch https://rpki.ripe.net/ta/ripe-ncc-ta.cer: Rrd>
Aug 16 08:01:01 RPKIPROVER002 rpki-prover[1073969]: Error   [pid 1670666]     2024-08-16 00:01:01.028Z  Failed to fetch https://rrdp.lacnic.net/ta/rta-lacnic-rpki.ce>

That mean trust anchor certificates for AfriNIC, LACNIC and RIPE couldn't be fetched at that iteration. So apparently there was some temporary loss of connectivity, resulting in loss of TA certificates, in turn, resulting in loss of ROAs.

Strictly speaking, an RPKI validator should verify TA certificates every iteration, but from the practical point of view it looks quite problematic like it happened in your case. I will fix this behaviour in the next release, by allowing some kind of TA certificate local caching.

lolepezy commented 1 week ago

The release https://github.com/lolepezy/rpki-prover/releases/tag/v0.9.6

fixes the issue with TA certificate caching, so the issue shouldn't not appear even in case of intermittent connectivity loss. I'll close the issue as resolved.