osrg / gobgp

BGP implemented in the Go Programming Language
https://osrg.github.io/gobgp/
Apache License 2.0

Dealing with VRFs #1648

Open amanshaikh75 opened 6 years ago

amanshaikh75 commented 6 years ago

It seems to me that the way GoBGP handles VRFs is not correct. Let me illustrate this by describing two work-flows:

Handling receipt of a BGP update message

Addition of a new VRF through configuration

iwaseyusuke commented 6 years ago

@amanshaikh75 Hi, let me first make sure I understand. The key point of your suggestion is that GoBGP should calculate the best path per VRF and then install that path into Zebra with the VRF context, right? I think it makes little difference whether the best path selection algorithm is applied in the global table or in the VRF table context: the RD of each VRF should be unique across all VRF tables, so <RD>:<Prefix> is unique per VRF, and the best path is therefore effectively selected per VRF.

With the following topology

              +------------------------+
        IPv4  | r3                     |
+----+  Uni   | +------+    +--------+ |
| r1 |----------| VRF1 |--->| Global | |
+----+        | +------+    |        | |  VPNv4  +----+
              |             |        |-----------| r4 |
        IPv4  |             |        | |         +----+
+----+  Uni   | +------+    |        | |
| r2 |----------| VRF2 |--->|        | |
+----+        | +------+    +--------+ |
              |               ZAPI |   |
              |                    V   |
              | +--------------------+ |
              | | Zebra              | |
              | +--------------------+ |
              +------------------------+

When r1 and r2 advertise the same prefix 192.168.1.0/24:

r1> gobgp global rib -a ipv4 add 192.168.1.0/24
r1> gobgp global rib -a ipv4
   Network              Next Hop             AS_PATH              Age        Attrs
*> 192.168.1.0/24       0.0.0.0                                   00:00:00   [{Origin: ?}]

r2> gobgp global rib -a ipv4 add 192.168.1.0/24
r2> gobgp global rib -a ipv4
   Network              Next Hop             AS_PATH              Age        Attrs
*> 192.168.1.0/24       0.0.0.0                                   00:00:00   [{Origin: ?}]

GoBGP imports each path into its VRF separately and also imports it, with the RD prepended, into the global table.

r3> gobgp vrf 1 rib -a ipv4
   Network              Next Hop             AS_PATH              Age        Attrs
   192.168.1.0/24       10.0.0.1             65001                00:00:00   [{Origin: ?}]

r3> gobgp vrf 2 rib -a ipv4
   Network              Next Hop             AS_PATH              Age        Attrs
   192.168.1.0/24       10.0.0.2             65002                00:00:00   [{Origin: ?}]

r3> gobgp global rib -a vpnv4
   Network                  Labels     Next Hop             AS_PATH              Age        Attrs
*> 65000:100:192.168.1.0/24 [0]        10.0.0.1             65001                00:00:00   [{Origin: ?} {Extcomms: [65000:100]}]
*> 65000:200:192.168.1.0/24 [0]        10.0.0.2             65002                00:00:00   [{Origin: ?} {Extcomms: [65000:200]}]

Then GoBGP (zclient.go) installs VPN routes into Zebra using the paths in the global table that carry VRF IDs:

https://github.com/osrg/gobgp/blob/d31262de7d91c81ff979b39950d2d859666dfa3f/server/zclient.go#L241-L249

https://github.com/osrg/gobgp/blob/d31262de7d91c81ff979b39950d2d859666dfa3f/server/zclient.go#L507-L510

Am I misunderstanding?

amanshaikh75 commented 6 years ago

@iwaseyusuke

Hi,

You're right that in the case above, there is no need to calculate the BGP best path for each VRF separately, since the two VRFs use different RDs, as they should. However, consider a case where r4 sends the following VPNv4 route to r3:

65000:300:192.168.1.0/24 [0]        10.0.0.1             65001                00:00:00   [{Origin: ?} {Extcomms: [65000:100]}]

When this route arrives, GoBGP at r3 will calculate best paths in the global RIB and will choose all three routes as best, since each of the three routes has a distinct RD.

r3> gobgp global rib -a vpnv4
   Network                  Labels     Next Hop             AS_PATH              Age        Attrs
*> 65000:100:192.168.1.0/24 [0]        10.0.0.1             65001                00:00:00   [{Origin: ?} {Extcomms: [65000:100]}]
*> 65000:200:192.168.1.0/24 [0]        10.0.0.2             65002                00:00:00   [{Origin: ?} {Extcomms: [65000:200]}]
*> 65000:300:192.168.1.0/24 [0]        10.0.0.1             65001                00:00:00   [{Origin: ?} {Extcomms: [65000:100]}]

Since the first and the third routes have the same route targets, both of them will be imported into VRF 1, and will be installed as best routes into Zebra. In reality, the first route is better than the third route since it is learned over an eBGP session while the other route is learned over an iBGP session (assuming r3-r4 session is an iBGP one). Do you agree?
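The concern above can be sketched in Go. This is a minimal, hypothetical model and not GoBGP's actual data structures: `Path`, `bestPerKey`, `bestInVRF`, and `examplePaths` are illustrative names, and the decision process is reduced to the single "eBGP over iBGP" step, assuming the r3-r4 session is iBGP.

```go
package main

import "fmt"

// Path is a simplified, hypothetical VPNv4 path (not GoBGP's table.Path).
type Path struct {
	RD     string   // route distinguisher, e.g. "65000:100"
	Prefix string   // e.g. "192.168.1.0/24"
	RTs    []string // route targets carried as extended communities
	IBGP   bool     // true if learned over an iBGP session
}

// better models a single step of the BGP decision process:
// an eBGP-learned path beats an iBGP-learned one.
func better(a, b Path) bool { return !a.IBGP && b.IBGP }

// bestPerKey mimics selection in a global table keyed by "<RD>:<Prefix>".
func bestPerKey(paths []Path) map[string]Path {
	best := map[string]Path{}
	for _, p := range paths {
		key := p.RD + ":" + p.Prefix
		if cur, ok := best[key]; !ok || better(p, cur) {
			best[key] = p
		}
	}
	return best
}

// hasRT reports whether the path carries the given route target.
func hasRT(p Path, rt string) bool {
	for _, t := range p.RTs {
		if t == rt {
			return true
		}
	}
	return false
}

// bestInVRF selects one path per prefix among the paths carrying the
// VRF's import route target, ignoring the RD.
func bestInVRF(importRT string, paths []Path) map[string]Path {
	best := map[string]Path{}
	for _, p := range paths {
		if !hasRT(p, importRT) {
			continue
		}
		if cur, ok := best[p.Prefix]; !ok || better(p, cur) {
			best[p.Prefix] = p
		}
	}
	return best
}

// examplePaths returns the three routes from the discussion above.
func examplePaths() []Path {
	const pfx = "192.168.1.0/24"
	return []Path{
		{RD: "65000:100", Prefix: pfx, RTs: []string{"65000:100"}},             // from r1, eBGP
		{RD: "65000:200", Prefix: pfx, RTs: []string{"65000:200"}},             // from r2, eBGP
		{RD: "65000:300", Prefix: pfx, RTs: []string{"65000:100"}, IBGP: true}, // from r4, iBGP (assumed)
	}
}

func main() {
	paths := examplePaths()
	// Keyed by RD:prefix, every route is alone under its key, so all are "best".
	fmt.Println("best entries in global table:", len(bestPerKey(paths)))
	// Selecting inside VRF 1 compares the first and third routes directly.
	for prefix, p := range bestInVRF("65000:100", paths) {
		fmt.Printf("VRF 1 best for %s: RD %s (iBGP=%v)\n", prefix, p.RD, p.IBGP)
	}
}
```

In this model the global table marks all three routes as best, while selection inside VRF 1 keeps only the eBGP-learned route with RD 65000:100.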

Speaking more generally, the following cases will create problems with GoBGP's current way of handling VRF routes:

iwaseyusuke commented 6 years ago

@amanshaikh75 Hi,

I don't know whether different RD and RT values are used in real (production) MPLS VPN services, and I think using the same value is better for maintainability.

But different RD and RT values are not prohibited, so I tried the following with Cisco routers.

+----+                        +----+
| R1 |---------(iBGP)---------| R2 |
+----+                        +----+
 - Vrf1                        - Vrf3
   RD 65000:100                  RD 65000:300
   RT 65000:100                  RT 65000:100  <--- the same RT as Vrf1
   * 192.168.1.0/24              * 192.168.1.0/24  <--- the same prefix exists on R1

 - Vrf2
   RD 65000:200
   RT 65000:200
   * 192.168.3.0/24

In the above situation, the route from R2, "65000:300:192.168.1.0/24", was imported as follows.

R1#show ip bgp all
For address family: IPv4 Unicast

For address family: VPNv4 Unicast

BGP table version is 4, local router ID is 10.0.0.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 65000:100 (default for vrf Vrf1)
 * i 192.168.1.0      10.0.0.2                 0    100      0 ?
 *>                   0.0.0.0                  0         32768 ?
Route Distinguisher: 65000:200 (default for vrf Vrf2)
 *>  192.168.3.0      0.0.0.0                  0         32768 ?
Route Distinguisher: 65000:300
 *>i 192.168.1.0      10.0.0.2                 0    100      0 ?

For address family: IPv4 Multicast

For address family: MVPNv4 Unicast

It seems that "65000:300:192.168.1.0/24" was copied and translated to "65000:100:192.168.1.0/24", and both paths were imported into the global table.

Also, as you said, the best path seems to be selected per VRF (a locally connected path is selected in "Vrf1"). Is this the behavior you mean?

amanshaikh75 commented 6 years ago

Hi @iwaseyusuke

Yes, this is exactly what I am referring to. I don't know how common this is, but it's not precluded.

Another way this situation arises is when routes are shared between different VPNs (or VRFs).

iwaseyusuke commented 6 years ago

@amanshaikh75 Hi,

Does this PR (https://github.com/osrg/gobgp/pull/1656) address this issue? This patch clones a VPN path whose RD differs from that of a matching VRF and overwrites the RD on the cloned path with the RD of each matching VRF.
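The clone-and-rewrite idea described above could look roughly like the following Go sketch. It is based only on the description in this comment, not on the code in the PR; `Path`, `VRF`, `imports`, and `cloneForVRFs` are hypothetical names, and "matching" is assumed to mean that the path carries one of the VRF's import route targets.

```go
package main

import "fmt"

// Path is a simplified, hypothetical VPNv4 path.
type Path struct {
	RD     string
	Prefix string
	RTs    []string
}

// VRF is a simplified, hypothetical local VRF definition.
type VRF struct {
	Name      string
	RD        string
	ImportRTs []string
}

// imports reports whether the path carries any of the VRF's import RTs.
func imports(v VRF, p Path) bool {
	for _, want := range v.ImportRTs {
		for _, have := range p.RTs {
			if want == have {
				return true
			}
		}
	}
	return false
}

// cloneForVRFs returns one copy of p for every VRF that imports it and
// whose RD differs from the path's RD, with the RD rewritten to the VRF's.
func cloneForVRFs(p Path, vrfs []VRF) []Path {
	var clones []Path
	for _, v := range vrfs {
		if v.RD == p.RD || !imports(v, p) {
			continue
		}
		c := p
		c.RTs = append([]string(nil), p.RTs...) // copy so the clone is independent
		c.RD = v.RD
		clones = append(clones, c)
	}
	return clones
}

func main() {
	vrfs := []VRF{
		{Name: "Vrf1", RD: "65000:100", ImportRTs: []string{"65000:100"}},
		{Name: "Vrf2", RD: "65000:200", ImportRTs: []string{"65000:200"}},
	}
	received := Path{RD: "65000:300", Prefix: "192.168.1.0/24", RTs: []string{"65000:100"}}
	for _, c := range cloneForVRFs(received, vrfs) {
		fmt.Printf("clone: %s:%s\n", c.RD, c.Prefix)
	}
}
```

The original path keeps its RD; only the copies are rewritten, so the received route and its per-VRF clone can coexist in the same table.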

amanshaikh75 commented 6 years ago

@iwaseyusuke Hi

Your PR seems to be a step in the right direction. However, here are a couple of items to think about:

iwaseyusuke commented 6 years ago

@amanshaikh75 Sorry for the delay and thank you for reviewing.

  • When routes are cloned and imported into the VRF, keeping the RD is not really required since all routes are going to have the same RD. In fact, routes coming from the VRF should be imported as Labeled unicast routes (SAFI-4) in my opinion. https://tools.ietf.org/html/rfc8277#section-5 talks about comparing SAFI-4 with SAFI-1 routes.

I think routes from a VRF should be imported as MPLS-labeled VPN routes (SAFI=128). Sorry, I may not understand RFC 8277 well enough, but does Section 5 also apply to SAFI=128? Also, Section 5 mentions some implementations but does not say which approach is best.

  • Does your code perform best path calculation in the context of the VRF? If not, how does it decide which route to send to attached CE?

Yes, the cloned path should be selected as the best path in the specific VRF. The behavior (the output of the gobgp command) is almost equivalent to that of the Cisco router in my previous comment.

amanshaikh75 commented 6 years ago

I agree that section-5 of RFC 8277 does not advocate one particular approach. On the other hand, bringing routes into VRF as SAFI=128 still has the problem in that routes learned from CEs are usually SAFI=1 routes. So, now you face the problem of comparing SAFI=1 and SAFI=128 routes, right?

So, are you now maintaining a separate route table for each VRF?

iwaseyusuke commented 6 years ago

I agree that section-5 of RFC 8277 does not advocate one particular approach. On the other hand, bringing routes into VRF as SAFI=128 still has the problem in that routes learned from CEs are usually SAFI=1 routes. So, now you face the problem of comparing SAFI=1 and SAFI=128 routes, right?

You mean routes from PE (or P) routers cannot be compared with routes from CE routers because routes from PEs are SAFI=128 and routes from CEs are SAFI=1, right? GoBGP performs the best path calculation only on its global table, and routes from CEs are translated to VPN routes (SAFI=128) and then imported into the global table. In the VRF, routes from CEs are represented as SAFI=1, but in the global table they are SAFI=128. So routes from both PEs and CEs are SAFI=128 in the global table, and I think GoBGP can compare them there.

So, are you now maintaining a separate route table for each VRF?

No, GoBGP does not maintain a table per VRF; it maintains only the global table.

For example, take the following topology, where both CE1 and CE2 have the same prefix "192.168.1.0/24", the RD of GoBGP's VRF is 65000:100, and the RD of the PE's VRF is 65000:200:

                              +-----------------------+
                        IPv4  | GoBGP                 |
               +-----+  Uni   | +-----+    +--------+ |  VPNv4  +----+
192.168.1.0/24 | CE1 |----------| VRF |--->| Global |<----------| PE |... CE2 192.168.1.0/24
               +-----+        | +-----+    +--------+ |         +----+
                              +-----------------------+
                                 RD 65000:100                    RD 65000:200

GoBGP will receive the IPv4 prefix "192.168.1.0/24" from the CE and the VPNv4 prefix "65000:200:192.168.1.0/24" from the PE.

On the CE side, GoBGP translates the prefix "192.168.1.0/24" from the CE to the VPNv4 prefix "65000:100:192.168.1.0/24" and imports it into its global table. On the other hand, with my patch, GoBGP clones the VPNv4 path "65000:200:192.168.1.0/24" from the PE, rewrites the copy to the VPNv4 prefix "65000:100:192.168.1.0/24", and imports both into its global table. At this point, the global table holds three paths:

- "65000:100:192.168.1.0/24" (from CE) (in this example, this path is preferred over the one from PE)
- "65000:100:192.168.1.0/24" (copy of the path from PE)
- "65000:200:192.168.1.0/24" (from PE)

Then GoBGP applies the best path calculation, and the first and the last paths are selected, but the last one is not installed into Zebra because it does not have a VRF ID.
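The selection and installation steps in this example can be modeled with a short Go sketch. It is a simplified, hypothetical model rather than GoBGP's implementation: the `Path` fields, `selectBest`, and `installable` are made-up names, an empty `VRF` field stands in for "no VRF ID", and the CE path is assumed to win because it is eBGP-learned while the PE session is iBGP.

```go
package main

import (
	"fmt"
	"sort"
)

// Path is a simplified, hypothetical entry in the global VPNv4 table.
type Path struct {
	RD     string
	Prefix string
	Source string // where the path came from, for display only
	EBGP   bool   // assumed: the CE session is eBGP, the PE session is iBGP
	VRF    string // local VRF the path belongs to; "" means no VRF ID
}

func key(p Path) string { return p.RD + ":" + p.Prefix }

// selectBest keeps one path per "<RD>:<Prefix>" key, preferring eBGP,
// and returns the winners sorted by key.
func selectBest(paths []Path) []Path {
	best := map[string]Path{}
	for _, p := range paths {
		cur, ok := best[key(p)]
		if !ok || (p.EBGP && !cur.EBGP) {
			best[key(p)] = p
		}
	}
	keys := make([]string, 0, len(best))
	for k := range best {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	out := make([]Path, 0, len(keys))
	for _, k := range keys {
		out = append(out, best[k])
	}
	return out
}

// installable filters best paths down to those that carry a VRF ID,
// which are the only ones handed to Zebra in this model.
func installable(best []Path) []Path {
	var out []Path
	for _, p := range best {
		if p.VRF != "" {
			out = append(out, p)
		}
	}
	return out
}

// globalTable returns the three paths listed in the example above.
func globalTable() []Path {
	const pfx = "192.168.1.0/24"
	return []Path{
		{RD: "65000:100", Prefix: pfx, Source: "CE", EBGP: true, VRF: "vrf1"},
		{RD: "65000:100", Prefix: pfx, Source: "copy of PE path", VRF: "vrf1"},
		{RD: "65000:200", Prefix: pfx, Source: "PE"},
	}
}

func main() {
	best := selectBest(globalTable())
	for _, p := range best {
		fmt.Printf("best: %s (%s)\n", key(p), p.Source)
	}
	for _, p := range installable(best) {
		fmt.Printf("install into Zebra: %s via %s\n", p.Prefix, p.Source)
	}
}
```

Here two paths are selected as best, one per RD, and only the CE path reaches Zebra because the PE's original path carries no VRF ID.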

amanshaikh75 commented 6 years ago

I see. So essentially, you are creating a per-VRF table within the global table by cloning paths if necessary. All the VRF-specific paths will have VPN prefixes with the VRF's RD which will allow gobgp to run the decision process appropriately. I can't think of any obvious problems with this approach.

iwaseyusuke commented 6 years ago

@amanshaikh75 Thanks for your confirmation.

I think this approach is similar to the option to "regard SAFI-1 routes and SAFI-4 routes as completely independent".

amanshaikh75 commented 6 years ago

One more thing, @iwaseyusuke.

I have noticed that with the current implementation, when a new VRF and a neighbor within it are added to GoBGP, the daemon does not send existing VPNv4 routes that can be imported into the VRF to the neighbor during RIB synchronization. Is this something your PR fixes?
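As a rough illustration of the expected behavior, the initial synchronization for a newly added VRF could walk the existing VPNv4 table and pick out the routes whose route targets the VRF imports. The Go sketch below is hypothetical: `VPNPath` and `initialAdvertisements` are illustrative names, not GoBGP functions, and best path selection among the matches is deliberately left out.

```go
package main

import "fmt"

// VPNPath is a simplified, hypothetical VPNv4 path already in the global table.
type VPNPath struct {
	RD     string
	Prefix string
	RTs    []string
}

// initialAdvertisements returns the IPv4 prefixes that a neighbor in a newly
// added VRF would be expected to learn during RIB synchronization: every
// existing VPNv4 path carrying one of the VRF's import route targets.
// Each prefix is reported once, in table order.
func initialAdvertisements(importRTs []string, global []VPNPath) []string {
	want := map[string]bool{}
	for _, rt := range importRTs {
		want[rt] = true
	}
	seen := map[string]bool{}
	var out []string
	for _, p := range global {
		for _, rt := range p.RTs {
			if want[rt] && !seen[p.Prefix] {
				seen[p.Prefix] = true
				out = append(out, p.Prefix)
			}
		}
	}
	return out
}

func main() {
	global := []VPNPath{
		{RD: "65000:100", Prefix: "192.168.1.0/24", RTs: []string{"65000:100"}},
		{RD: "65000:300", Prefix: "192.168.1.0/24", RTs: []string{"65000:100"}},
		{RD: "65000:200", Prefix: "192.168.3.0/24", RTs: []string{"65000:200"}},
	}
	// A new VRF importing RT 65000:100 is added after these routes exist.
	fmt.Println(initialAdvertisements([]string{"65000:100"}, global))
}
```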

iwaseyusuke commented 6 years ago

@amanshaikh75 Thanks for your report! But sorry, I don't think this PR addresses that issue.

amanshaikh75 commented 6 years ago

Will you be willing to create a new PR to address this issue?

BTW with ZAPI version 5, I have been able to find a way to install VPN routes with double encapsulation at the ingress PE. See zapi_version_5 in my gobgp repository. With this update, I am now able to use GoBGP for CE-to-CE traffic in L3VPN scenario.

iwaseyusuke commented 6 years ago

Will you be willing to create a new PR to address this issue?

Yes, thanks! It is highly welcomed!

BTW with ZAPI version 5, I have been able to find a way to install VPN routes with double encapsulation at the ingress PE. See zapi_version_5 in my gobgp repository. With this update, I am now able to use GoBGP for CE-to-CE traffic in L3VPN scenario.

That sounds great! But supporting both v4 and v5 seems to make the code complex... Hmmm...

adisai123 commented 3 years ago

I have a scenario: the same route, 1.1.1.1/32, is published with red and blue VRF tags and arrives at GoBGP. The routes are inserted into different routing tables. Now I want to access both routes from the same VM. How can I do that? Please guide me.

adisai123 commented 3 years ago

From router r3 in the topology above, I want to access both 65000:100:192.168.1.0/24 and 65000:200:192.168.1.0/24. How can I do that? Please guide me.

lastorel commented 5 months ago

Even with only one VRF and a unique RD (for example 65000:100), the router must run the path selection algorithm in the VRF table when a prefix (for example 10.0.0.0/8) is received from CE1. At the moment of receipt, the VRF table may already contain 10.0.0.0/8 from another CE, CE2 (with local preference 90, not best), and the same prefix from the remote CE3 (via VPNv4).

In my case the new path will win because it is received over eBGP and its local preference has not been lowered. After that, it must be injected (cloned) into the global table in VPNv4 format. There, the global table performs a new, independent path selection in the VPNv4 address family (and the new route may not be best).

               +------------------------+
         IPv4  | PE                     |
+-----+  Uni   | +------+    +--------+ |
| CE1 |----------| VRF1 |--->| Global | |
+-----+        | |      |    |        | |  VPNv4  +----+    +-----+
               | |      |    |        |-----------| PE |----| CE3 |
         IPv4  | |      |    |        | |         +----+    +-----+
+-----+  Uni   | |      |    |        | |
| CE2 |----------|      |    |        | |
+-----+        | +------+    +--------+ |
               |                        |
               |                        |
               +------------------------+

After that, the old local CE2 can receive a BGP Update with the new path, because the new path was selected as best in the VRF.
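The two-stage flow described in this comment can be sketched in Go as follows. This is a hypothetical model, not GoBGP code: `Path`, `better`, `vrfBest`, and `toVPNv4` are illustrative names, the decision process is cut down to local preference followed by eBGP over iBGP, and the local preference of 100 for the CE1 and CE3 paths is an assumption.

```go
package main

import "fmt"

// Path is a simplified, hypothetical path in a VRF table.
type Path struct {
	Prefix    string
	From      string
	LocalPref uint32
	EBGP      bool
}

// better compares two paths using two steps of the BGP decision process:
// higher local preference first, then eBGP over iBGP.
func better(a, b Path) bool {
	if a.LocalPref != b.LocalPref {
		return a.LocalPref > b.LocalPref
	}
	return a.EBGP && !b.EBGP
}

// vrfBest runs stage one: selection inside the VRF table.
// ok is false when the table is empty.
func vrfBest(paths []Path) (best Path, ok bool) {
	for i, p := range paths {
		if i == 0 || better(p, best) {
			best = p
		}
	}
	return best, len(paths) > 0
}

// toVPNv4 builds the key under which the VRF winner would be injected into
// the global table, where stage two (VPNv4 selection) runs independently.
func toVPNv4(rd string, p Path) string { return rd + ":" + p.Prefix }

func main() {
	const pfx = "10.0.0.0/8"
	vrf1 := []Path{
		{Prefix: pfx, From: "CE2", LocalPref: 90, EBGP: true},
		{Prefix: pfx, From: "CE3 via remote PE", LocalPref: 100}, // assumed values
	}
	// A new path for the same prefix arrives from CE1 over eBGP.
	vrf1 = append(vrf1, Path{Prefix: pfx, From: "CE1", LocalPref: 100, EBGP: true})

	if best, ok := vrfBest(vrf1); ok {
		fmt.Println("VRF best:", best.From)
		fmt.Println("inject into global table as:", toVPNv4("65000:100", best))
	}
}
```

Because the winner is chosen inside the VRF first, the VRF's own neighbors (CE2 here) can be updated from that result regardless of how the cloned VPNv4 path later fares in the global table.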