Open Yi-Tseng opened 3 years ago
Following the RFC means having the final IP TTL decremented by the number of hops inside the fabric (e.g., ttl = ttl - 2 for packets going from one leaf to the other in a 2x2).
But considering that we use MPLS only inside the fabric (i.e., we don't peer with external MPLS routers) and that Trellis abstracts the whole fabric as one big IP router, do we need to follow the RFC? Or should we just make sure that the IP TTL is decremented by one independently of the number of hops inside the fabric? cc @charlesmcchan @pierventre
Shouldn't it be -3 instead of -2? Our case should be the same as figure 3-4 on this page
I believe SR does both COPY_OUT and COPY_IN at the first and last hop respectively.
Yes, it should be -3
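To make the -3 concrete, here is a minimal Python sketch of the TTL arithmetic on a leaf-spine-leaf path with penultimate hop popping and TTL copying. The `Packet` class and the three hop functions are hypothetical names used only for illustration; this is not the segmentrouting code, and expiry checks are left out to keep it short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    ip_ttl: int
    mpls_ttl: Optional[int] = None  # None means the packet carries no label


def ingress_leaf(pkt: Packet) -> Packet:
    # Route the IP packet (decrement IP TTL), push a label,
    # and copy the IP TTL out into the MPLS header.
    pkt.ip_ttl -= 1
    pkt.mpls_ttl = pkt.ip_ttl
    return pkt


def spine_php(pkt: Packet) -> Packet:
    # Penultimate hop: decrement the MPLS TTL, pop the label,
    # and copy the MPLS TTL back into the IP header.
    pkt.mpls_ttl -= 1
    pkt.ip_ttl = pkt.mpls_ttl
    pkt.mpls_ttl = None
    return pkt


def egress_leaf(pkt: Packet) -> Packet:
    # Plain IP forwarding at the destination leaf.
    pkt.ip_ttl -= 1
    return pkt


pkt = Packet(ip_ttl=64)
for hop in (ingress_leaf, spine_php, egress_leaf):
    pkt = hop(pkt)
print(pkt.ip_ttl)  # 61, i.e. 64 - 3
```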
...
I just realized that since we do penultimate hop popping (i.e., spine pops MPLS), without copying the TTL between MPLS and IP, we cannot prevent loops inside the fabric...
I still think that decrementing the IP TTL by the number of hops inside the fabric is wrong. IMO the fabric should behave like one big router between the access devices and the Internet. The fact that we use MPLS tunnels internally is an implementation detail, and the IP TTL should not be affected by the number of switches inside the fabric. Instead, the IP TTL should be frozen when inside the tunnel.
However, using penultimate hop popping doesn't leave us any other choice if we want protection against loops. We should change segmentrouting to support ultimate hop popping (i.e., dest leaf pops MPLS) to be able to detect tunnels inside the fabric.
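A rough sketch of what this would look like, assuming ultimate hop popping: the MPLS TTL starts from a fixed value and is decremented by every switch that processes the label, so a looping packet is eventually dropped, while the IP TTL is decremented exactly once. The function name, the `labeled_hops` parameter, and the default of 64 are assumptions made for illustration only.

```python
DEFAULT_MPLS_TTL = 64  # assumed starting value for the tunnel TTL


def forward_through_fabric(ip_ttl, labeled_hops, mpls_ttl=DEFAULT_MPLS_TTL):
    """Return the IP TTL at fabric egress, or None if the packet is dropped.

    labeled_hops counts the switches that receive the packet with a label
    (spines plus the destination leaf when it does the pop).
    """
    # The fabric behaves as one router: decrement the IP TTL once at ingress.
    ip_ttl -= 1
    if ip_ttl <= 0:
        return None

    # Inside the tunnel only the MPLS TTL changes; the IP TTL stays frozen.
    for _ in range(labeled_hops):
        mpls_ttl -= 1
        if mpls_ttl <= 0:
            return None  # loop protection: the label TTL ran out

    # The destination leaf pops the label without copying the TTL back.
    return ip_ttl


print(forward_through_fabric(64, labeled_hops=2))     # 63
print(forward_through_fabric(64, labeled_hops=1000))  # None (looping packet)
```

With this behavior the result is 63 for any loop-free path length, which is the "one big router" view of the fabric.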
What SR does today is completely legit as described in RFC3443 section 3.1.
I found it hard to justify the benefit of making such changes, taking the amount of work that needs to be done into account. We need a stronger reason to prioritize this.
I agree that we don't have strong reasons to do this change. I just wanted to voice my concern.
I will make the change to fix the TTL behavior such that we comply with RFC3443 section 3.1, but most importantly with segmentrouting flow objectives.
Currently, we set the MPLS TTL to a default value (64); however, we should copy the TTL from the IP header. We also need to copy the TTL back to the IP header when we pop the MPLS label.
The original RFC does not give much detail on how to handle the TTL for IP packets: https://tools.ietf.org/html/rfc3031#section-3.23
But there are some rules in RFC3032 (MPLS Label Stack Encoding): https://tools.ietf.org/html/rfc3032#section-2.4.3
Also, there are some explanations on these websites: https://www.ciscopress.com/articles/article.asp?p=680824&seqNum=4 http://wiki.kemot-net.com/mpls-ttl-behavior
They say that we need to set the TTL value to TTL-1 from the previous header (push/swap/pop).
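The TTL-1 rule for push/swap/pop could be sketched roughly as below. This is a simplified reading of RFC 3032 section 2.4.3, not the actual pipeline: a packet is modelled as a plain list of TTLs, and all function names are hypothetical.

```python
def outgoing_ttl(incoming_ttl):
    """TTL-1 of the header being processed; a packet whose TTL hits 0 is not forwarded."""
    out = incoming_ttl - 1
    if out <= 0:
        raise ValueError("TTL expired, packet must not be forwarded")
    return out


# A packet is modelled as a list of TTLs: index 0 is the IP header and
# the last element is the outermost header (top of the label stack).

def push(ttls):
    # The header that was on top and the new label both get the outgoing TTL.
    out = outgoing_ttl(ttls[-1])
    return ttls[:-1] + [out, out]


def swap(ttls):
    # The swapped label carries the outgoing TTL.
    if len(ttls) < 2:
        raise ValueError("no label to swap")
    out = outgoing_ttl(ttls[-1])
    return ttls[:-1] + [out]


def pop(ttls):
    # The header exposed by the pop receives the outgoing TTL.
    if len(ttls) < 2:
        raise ValueError("no label to pop")
    out = outgoing_ttl(ttls[-1])
    return ttls[:-2] + [out]


print(push([64]))             # [63, 63]
print(swap([63, 63]))         # [63, 62]
print(pop([63, 62]))          # [61]
```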