sonic-net / DASH

Disaggregated APIs for SONiC Hosts
Apache License 2.0
83 stars 90 forks source link

BMv2 SAI failed to add a default route with DROP action but no next hop; track underlay route #496

Closed Pterosaur closed 4 months ago

Pterosaur commented 9 months ago

I'm using the BMv2 SAI to add a default route but got the following error in the SAI thrifit

e:- mutateTableEntry: GRPC ERROR[2]: ,`
type.googleapis.com/p4.v1.Error&Unexpected number of action parametersALL-sswitch-p4org
e:- mutateTableEntry: GRPC call Write::INSERT ERROR: table_id: 49279256 match { field_id: 1 lpm { value: "\000\000\000\000" prefix_len: 96 } } action { action { action_id: 32404057 params { param_id: 1 value: "\000\000" } } }

My code looks like

        route_entry = sai_thrift_route_entry_t(
            vr_id=None, destination=sai_ipprefix("0.0.0.0/0"))
        sai_thrift_create_route_entry(
            saithrift.client, route_entry, packet_action=SAI_PACKET_ACTION_DROP)

which is similar to SONiC SWSS:https://github.com/sonic-net/sonic-swss/blob/5f367ebb63dab68e73b74092fc083d3d1755b4b3/orchagent/routeorch.cpp#L91-L108 that adds a default route with an action drop.

I suspect the above error was caused by the unsupported of ignored parameter: next_hop_id

https://github.com/sonic-net/DASH/blob/e1309e9c1f20cd210fb9f7a13987f8262e76f372/dash-pipeline/bmv2/underlay.p4#L41-L49

chrispsommers commented 9 months ago

Hi Ze, The underlay support for DASH bmv2 is extremely limited and was mainly added to be able to select which of two ports in the bmv2 model one can route packets out of. Without it, packets would always come out the same port (hairpin). In fact, you can only set nexthop to 0 or 1 corresponding to the two bmv2 dataplane ports. We were under no illusion this was anything close to proper support of underlay functionality, but it was enough to select which port packets route out of for simple test cases. To add true underlay to DASH bmv2 is a large project which perhaps someone will undertake someday.

Here is a test case which applies values of 0 and 1 to the the nexthop id. https://github.com/sonic-net/DASH/blob/main/test/test-cases/scale/saic/vnet_route_setup_commands_bidirectional.json#L614

As a side-comment, the title of this issue doesn't seem to match what you are describing, i.e. failing to add a default route. It might help in tracking this issue to alter it. Thanks.

Pterosaur commented 9 months ago

BMv2 doesn't support the IPv6 default route "::/0" yet and got the following error:

Dec 27 11:29:50.768287 vlab-01 NOTICE swss#orchagent: :- RouteOrch: Create IPv4 default route with packet action drop
Dec 27 11:29:50.768287 vlab-01 INFO syncd#supervisord: syncd n:- mutateTableEntry: GRPC call Write::INSERT OK table_id: 49279256 match { field_id: 1 lpm { value: "\000\000\000\000" prefix_len: 96 } } action { action { action_id: 32404057 params { param_id: 1 value: "\000\000" } params { param_id: 2 value: "\000\000" } } }
Dec 27 11:29:50.768826 vlab-01 ERR syncd#syncd: e:- mutateTableEntry: GRPC ERROR[2]: , #010#002#032¡#001#012#037type.googleapis.com/p4.v1.Error#022~#010#003#022gInvalid reprsentation of 'don't care' LPM match, omit match field instead of using a prefix length of 0#032#021ALL-sswitch-p4org
Dec 27 11:29:50.768826 vlab-01 ERR syncd#syncd: e:- mutateTableEntry: GRPC call Write::INSERT ERROR: table_id: 49279256 match { field_id: 1 lpm { value: "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000" } } action { action { action_id: 32404057 params { param_id: 1 value: "\000\000" } params { param_id: 2 value: "\000\000" } } }
Dec 27 11:29:50.768826 vlab-01 ERR syncd#syncd: :- sendApiResponse: api SAI_COMMON_API_CREATE failed in syncd mode: SAI_STATUS_FAILURE
Dec 27 11:29:50.769298 vlab-01 ERR swss#orchagent: :- create: create status: SAI_STATUS_FAILURE
Dec 27 11:29:50.769298 vlab-01 ERR swss#orchagent: :- RouteOrch: Failed to create IPv6 default route with packet action drop
Dec 27 11:29:50.769587 vlab-01 INFO swss#supervisord: orchagent terminate called after throwing an instance of '
Dec 27 11:29:50.769755 vlab-01 INFO swss#supervisord: orchagent std::runtime_error
Dec 27 11:29:50.769933 vlab-01 INFO swss#supervisord: orchagent '
Dec 27 11:29:50.770236 vlab-01 INFO swss#supervisord: orchagent   what():
Dec 27 11:29:50.770354 vlab-01 INFO swss#supervisord: orchagent Failed to create IPv6 default route with packet action drop
Pterosaur commented 9 months ago

Hi Ze, The underlay support for DASH bmv2 is extremely limited and was mainly added to be able to select which of two ports in the bmv2 model one can route packets out of. Without it, packets would always come out the same port (hairpin). In fact, you can only set nexthop to 0 or 1 corresponding to the two bmv2 dataplane ports. We were under no illusion this was anything close to proper support of underlay functionality, but it was enough to select which port packets route out of for simple test cases. To add true underlay to DASH bmv2 is a large project which perhaps someone will undertake someday.

Here is a test case which applies values of 0 and 1 to the the nexthop id. https://github.com/sonic-net/DASH/blob/main/test/test-cases/scale/saic/vnet_route_setup_commands_bidirectional.json#L614

As a side-comment, the title of this issue doesn't seem to match what you are describing, i.e. failing to add a default route. It might help in tracking this issue to alter it. Thanks.

Hi Chris,

Thanks for your suggestion. I'm working on using the BMv2 as the dataplane of SONiC KVM. This issue is for tracking the problem I encountered with the current SONiC SWSS.

Pterosaur commented 9 months ago

Hi, @kcudnik could you please help to take a look.

chrispsommers commented 9 months ago

Thanks for your suggestion. I'm working on using the BMv2 as the dataplane of SONiC KVM. This issue is for tracking the problem I encountered with the current SONiC SWSS.

Hi @Pterosaur Thanks, understood. I think if you were to start tallying the gaps you will start creating a lot of issues and find you're barely chipping away at it. It might make more sense to do a more comprehensive analysis of what is needed, and if using the DASH P4 model as a starting point is even appropriate.

Adding full underlay to DASH BMv2 is a worthy task but a fairly large one. It has been a wish-list item since the inception of the DASH project. One might even say it is outside the scope of DASH itself (which focuses on overlay services definition and APIs), and deserves its own project to which DASH could be an enhancement. At one time there was a highly functional P4 model of SAI dataplane written in P4 but as far as I know it is very behind the current definition.

Also note that BMv2 uses the P4 V1 architectural model which lacks certain capabilities required to truly support DASH (statefulness such as add-on-miss, range-lists match type). Long term we are hoping newer back-ends like P4-DPDK and maybe even P4-TC will be available to take over from bmv2 and provide more capabilities, PNA compliance, much higher performance and longer-term community support.

KrisNey-MSFT commented 7 months ago

@Pterosaur - can we close this one, as it looks like we are trying to use a non-existent feature?

Pterosaur commented 7 months ago

@Pterosaur - can we close this one, as it looks like we are trying to use a non-existent feature?

@r12f is working on this issue. It looks like a feature that's compatible well with the existing SONiC system.

We expect a hairpin capability in DPU (as well as BMv2). But right now, some vendors can just support the U-turn behavior. So, I feel we should adopt these two behaviors for the current development stage and future requirements.

This issue is for tracking the underlay route with these two behaviors. I think we can close it once the Riff enhances the BMv2 implementation.

KrisNey-MSFT commented 7 months ago

Thanks Ze 😊

From: Ze Gan @.> Sent: Thursday, February 29, 2024 7:44 PM To: sonic-net/DASH @.> Cc: Kristina Moore @.>; Comment @.> Subject: Re: [sonic-net/DASH] BMv2 SAI failed to add a default route with DROP action but no next hop (Issue #496)

@Pterosaurhttps://github.com/Pterosaur - can we close this one, as it looks like we are trying to use a non-existent feature?

@r12fhttps://github.com/r12f is working on this issue. It looks like a feature that's compatible well with the existing SONiC system.

We expect a hairpin capability in DPU (as well as BMv2). But right now, some vendors can just support the U-turn behavior. So, I feel we should adopt these two behaviors for the current development stage and future requirements.

This issue is for tracking the underlay route with these two behaviors. I think we can close it once the Riff enhances the BMv2 implementation.

— Reply to this email directly, view it on GitHubhttps://github.com/sonic-net/DASH/issues/496#issuecomment-1972364984, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AFJSI6GJQAE6OK7PLNH2WVTYV7TNJAVCNFSM6AAAAABBDCMGWOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNZSGM3DIOJYGQ. You are receiving this because you commented.Message ID: @.**@.>>

kcudnik commented 7 months ago

Hi, @kcudnik could you please help to take a look.

From logs seems like response from grpc on bmv2 is returning failure, co clue why, syncd just logged that but why grpc returned that :( maybe try enable debug logging, is this reproducible?

Pterosaur commented 4 months ago

Fix by the PR: https://github.com/sonic-net/DASH/pull/547