Open LionelJouin opened 2 years ago
I feel this is totally related to https://github.com/networkservicemesh/cmd-forwarder-vpp/issues/441
To resolve the issue we should rework each chain element from sdk-vpp and add a chec on change. If we have a change then we need to delete previosly created things. otherwise do as we do currently.
cc @edwarnicke
Right, it is related. Just in my use case, the NSC is intentionally changing the content of the IP context. From my tests, the ipadress and iprule chain elements are working fine for adding but not for removing.
@LionelJouin That likely is related to needing to update the chain elements that do that work.
@denis-tingaikin One other thing to watch carefully is that we are only doing the work on the return stroke of the call in those chain elements. If the NSE is OK allowing the NSC to update IP context... that's fine... but its not OK to act on NSC requests to update IPContext until the NSE has returned those updated (meaning it thinks they are valid).
@LionelJouin Does that make sense to you?
yes, it's clear to me.
@LionelJouin https://github.com/networkservicemesh/sdk/pull/1234 is merged. Could you test is it solves the problem?
I tried it, it works fine, It solved this issue: https://github.com/networkservicemesh/sdk/issues/1231
@LionelJouin and does it also solve your issue here with IP policy removal on connection update? Or is that still open?
no, it is not solved, this issue is still open (https://github.com/networkservicemesh/sdk-kernel/issues/426).
@edwarnicke @denis-tingaikin How the forwarder resiliency could be handled? Handling IP addresses would be easy since it is possible to get the IP addresses by checking the interface directly and then compare them to the ones in the IPContext to find which ones have to be added or removed. But I believe this would be more complex for the policy routes, how the forwarder could remember which rule/table belongs to which policy? (in case the forwarder crashes/restarts/...)
Hello @LionelJouin, Maybe I misunderstood you. Is the problem still present after this fix? - https://github.com/networkservicemesh/sdk-kernel/pull/437
Hi, yes, I think so. For instance, if the forwarder restarts, then the policies map will be emptied, and then the forwarder will loose track of which ip tables/rules it was managing previously.
Ah.. got it.
So, if forwarder restarts - client starts healing with the connection
, where we should have our rules. And I think we can restore them in forwarder.
At this line we can check tables
map, and if it is empty for a given connection Id
- try to find added table IDs using netlink.RuleList()
.
@LionelJouin One other thing to keep in mind is that if your NSE wants to trigger a change... it should probably use:
https://github.com/networkservicemesh/api/blob/main/pkg/api/networkservice/connection.proto#L36
Which is to say:
Basically... this way you don't have to wait around for a Refresh to happen at Expiration.
@LionelJouin As far as I can see, all main PRs have been merged. Can we close this issue?
There is still 2e2 tests to develop and other properties to support. We could keep this issue, otherwise I would open new ones for the e2e and properties
@LionelJouin Agree on tracking the other issues. I'd suggest opening new issues and referencing this one.
After some testing with this change: https://github.com/networkservicemesh/sdk/issues/1231. I found it was possible to add new IP addresses and policy routes by just requesting (updating) a new time without having to close the connection.
The issue I noticed is that it is not possible to remove any IP or policy route while they are removed from the IP context in the request.