networkservicemesh / sdk-kernel

Apache License 2.0
6 stars 17 forks source link

IP and policy removal not working on connection update #426

Open LionelJouin opened 2 years ago

LionelJouin commented 2 years ago

After some testing with this change: https://github.com/networkservicemesh/sdk/issues/1231. I found it was possible to add new IP addresses and policy routes by just requesting (updating) a new time without having to close the connection.

The issue I noticed is that it is not possible to remove any IP or policy route while they are removed from the IP context in the request.

denis-tingaikin commented 2 years ago

I feel this is totally related to https://github.com/networkservicemesh/cmd-forwarder-vpp/issues/441

To resolve the issue we should rework each chain element from sdk-vpp and add a chec on change. If we have a change then we need to delete previosly created things. otherwise do as we do currently.

denis-tingaikin commented 2 years ago

cc @edwarnicke

LionelJouin commented 2 years ago

Right, it is related. Just in my use case, the NSC is intentionally changing the content of the IP context. From my tests, the ipadress and iprule chain elements are working fine for adding but not for removing.

edwarnicke commented 2 years ago

@LionelJouin That likely is related to needing to update the chain elements that do that work.

@denis-tingaikin One other thing to watch carefully is that we are only doing the work on the return stroke of the call in those chain elements. If the NSE is OK allowing the NSC to update IP context... that's fine... but its not OK to act on NSC requests to update IPContext until the NSE has returned those updated (meaning it thinks they are valid).

@LionelJouin Does that make sense to you?

LionelJouin commented 2 years ago

yes, it's clear to me.

denis-tingaikin commented 2 years ago

@LionelJouin https://github.com/networkservicemesh/sdk/pull/1234 is merged. Could you test is it solves the problem?

LionelJouin commented 2 years ago

I tried it, it works fine, It solved this issue: https://github.com/networkservicemesh/sdk/issues/1231

edwarnicke commented 2 years ago

@LionelJouin and does it also solve your issue here with IP policy removal on connection update? Or is that still open?

LionelJouin commented 2 years ago

no, it is not solved, this issue is still open (https://github.com/networkservicemesh/sdk-kernel/issues/426).

LionelJouin commented 2 years ago

@edwarnicke @denis-tingaikin How the forwarder resiliency could be handled? Handling IP addresses would be easy since it is possible to get the IP addresses by checking the interface directly and then compare them to the ones in the IPContext to find which ones have to be added or removed. But I believe this would be more complex for the policy routes, how the forwarder could remember which rule/table belongs to which policy? (in case the forwarder crashes/restarts/...)

glazychev-art commented 2 years ago

Hello @LionelJouin, Maybe I misunderstood you. Is the problem still present after this fix? - https://github.com/networkservicemesh/sdk-kernel/pull/437

LionelJouin commented 2 years ago

Hi, yes, I think so. For instance, if the forwarder restarts, then the policies map will be emptied, and then the forwarder will loose track of which ip tables/rules it was managing previously.

glazychev-art commented 2 years ago

Ah.. got it. So, if forwarder restarts - client starts healing with the connection, where we should have our rules. And I think we can restore them in forwarder. At this line we can check tables map, and if it is empty for a given connection Id - try to find added table IDs using netlink.RuleList().

edwarnicke commented 2 years ago

@LionelJouin One other thing to keep in mind is that if your NSE wants to trigger a change... it should probably use:

https://github.com/networkservicemesh/api/blob/main/pkg/api/networkservice/connection.proto#L36

Which is to say:

  1. NSE sends a ConnectionEvent over Monitor towards the NSE with State set to REFRESH_REQUESTED
  2. This will cause the NSC to send a new Request
  3. The NSE can return a changed Connection.ConnectionContext

Basically... this way you don't have to wait around for a Refresh to happen at Expiration.

glazychev-art commented 2 years ago

@LionelJouin As far as I can see, all main PRs have been merged. Can we close this issue?

LionelJouin commented 2 years ago

There is still 2e2 tests to develop and other properties to support. We could keep this issue, otherwise I would open new ones for the e2e and properties

edwarnicke commented 2 years ago

@LionelJouin Agree on tracking the other issues. I'd suggest opening new issues and referencing this one.