sonic-net / DASH

Disaggregated APIs for SONiC Hosts
Apache License 2.0
84 stars 91 forks source link

sai_get_switch-attribute() returns 0 when requesting number_of_active_ports #235

Closed anton7811 closed 1 year ago

anton7811 commented 2 years ago

The issue was discovered when running tests from PR #231.

This issue blocks all SAI PTF tests run. self.portX objects cannot be initialized. Standard SAI PTF fails on SaiHelperBase.get_active_port_list() method. And adapted DASH PTF tests are skipped in SaiHelperSimplified.getattr because of self.port0 does not exist.

Log:

(Pdb++) sai_thrift_get_switch_attribute(self.client, number_of_active_ports=True)
{'SAI_SWITCH_ATTR_NUMBER_OF_ACTIVE_PORTS': 0, 'number_of_active_ports': 0}
aputriax commented 2 years ago

Here is the full list of required attributes:

Mandatory

sai_thrift_get_switch_attribute(self.client, number_of_active_ports=True)
sai_thrift_get_switch_attribute(self.client, port_list=sai_thrift_object_list_t(idlist=[], count=self.active_ports_no))

The following attributes are all related to the CPU port and its properties:

sai_thrift_get_switch_attribute(self.client, cpu_port=True)
sai_thrift_get_port_attribute(self.client, self.cpu_port_hdl, qos_number_of_queues=True)
q_list = sai_thrift_object_list_t(count=num_queues)
sai_thrift_get_port_attribute(self.client, self.cpu_port_hdl, qos_queue_list=q_list)
sai_thrift_get_queue_attribute(self.client, queue_id, port=True, index=True, parent_scheduler_node=True)

Stub is needed Also we need some stubs for the following:

sai_thrift_get_switch_attribute(self.client, default_vlan_id=True)
sai_thrift_get_switch_attribute(self.client, default_virtual_router_id=True)

Currently it returns 0, and we need just non zero values. This is not a DASH API, but SAI PTF expects just non-zero value in the base class. So to avoid "if"s for Switch/DASH/BMv2 it would be good to have stub.

chrispsommers commented 2 years ago

I am having second thoughts about the value of implementing all these callbacks for bmv2's libsai. It's trying to make the target library pretend it's something which it is not. I would rather see explicit treatment of this situation instead of using hidden side-effects of returning fake, non-zero values - that becomes a maintenance liability and is basically a "trick" which is "tribal knowledge." This can fade over time. It also burdens the target library.

Perhaps the framework should support the target, not the other way around. The only valuable attribute I see in the list above is number_of_active_ports = 2 and maybe qos_number_of_queues = 0. In fact I'd rather it return NOT_IMPLEMENTED for all but the bare minimum attributes. The test framework can then decide how to handle this gracefully (e.g. if there are no queues, then don't ask for queue attributes and don't try to use them). Is it possible to adjust the framework to more naturally accommodate extremely limited, pure-SW targets like bmv2? It could require a little more effort than if statements sprinkled in the code, perhaps another fixture approach would be appropriate. @anton7811 What do you think?

chrispsommers commented 2 years ago

Update: I submitted a PR https://github.com/PLVision/DASH/pull/37 to PLVision's fork of DASH to add the requested attributes minus CPU port-related ones since bmv2 lacks this capability. It will eventually be merged into Azure DASH when they are ready to do a PR for SAI-challenger enhancements for DASH.

KrisNey-MSFT commented 2 years ago

Should we close this one, based upon the comment above? Or leave open? LMK...

chrispsommers commented 2 years ago

Not yet, will address in near future.

aputriax commented 1 year ago

@chrispsommers , I'd like to verify this with PTF tests. I took this patch and see that it fails with the standard PTF init. Please see at SaiHelperBase.saveNumberOfAvaiableResources() With patch it returns None. And without patch it returns: {'SAI_SWITCH_ATTR_ECMP_MEMBERS': 0, 'ecmp_members': 0, 'SAI_SWITCH_ATTR_NUMBER_OF_ECMP_GROUPS': 0, 'number_of_ecmp_groups': 0, 'SAI_SWITCH_ATTR_AVAILABLE_IPV4_ROUTE_ENTRY': 0, 'available_ipv4_route_entry': 0, 'SAI_SWITCH_ATTR_AVAILABLE_IPV6_ROUTE_ENTRY': 0, 'available_ipv6_route_entry': 0, 'SAI_SWITCH_ATTR_AVAILABLE_IPV4_NEXTHOP_ENTRY': 0, 'available_ipv4_nexthop_entry': 0, 'SAI_SWITCH_ATTR_AVAILABLE_IPV6_NEXTHOP_ENTRY': 0, 'available_ipv6_nexthop_entry': 0, 'SAI_SWITCH_ATTR_AVAILABLE_IPV4_NEIGHBOR_ENTRY': 0, 'available_ipv4_neighbor_entry': 0, 'SAI_SWITCH_ATTR_AVAILABLE_IPV6_NEIGHBOR_ENTRY': 0, 'available_ipv6_neighbor_entry': 0, 'SAI_SWITCH_ATTR_AVAILABLE_NEXT_HOP_GROUP_ENTRY': 0, 'available_next_hop_group_entry': 0, 'SAI_SWITCH_ATTR_AVAILABLE_NEXT_HOP_GROUP_MEMBER_ENTRY': 0, 'available_next_hop_group_member_entry': 0, 'SAI_SWITCH_ATTR_AVAILABLE_FDB_ENTRY': 0, 'available_fdb_entry': 0, 'SAI_SWITCH_ATTR_AVAILABLE_IPMC_ENTRY': 0, 'available_ipmc_entry': 0, 'SAI_SWITCH_ATTR_AVAILABLE_SNAT_ENTRY': 0, 'available_snat_entry': 0, 'SAI_SWITCH_ATTR_AVAILABLE_DNAT_ENTRY': 0, 'available_dnat_entry': 0, 'SAI_SWITCH_ATTR_AVAILABLE_DOUBLE_NAT_ENTRY': 0, 'available_double_nat_entry': 0}

To check I applied this patch, did make clean and make all. Are those steps were enough?

chrispsommers commented 1 year ago

Hi Anton, The patch provides a few switch attributes but not all. I think the old code w/o patch succeeded calling SaiHelperBase.saveNumberOfAvaiableResources() "by luck" because the bmv2 libsai code ignored the switch attribute get request and returned SAI_STATUS_SUCCESS. By chance the PTF code interpreted some values as zero. For things like # ports it caused a test failure but for other things it's OK I guess.

Please try the following: Change these two lines to return SAI_STATUS_SUCCESS: https://github.com/Azure/DASH/blob/e166b3fe0b2220603c397a3af7545b3997bcd6f1/dash-pipeline/SAI/templates/utils.cpp.j2#L280 https://github.com/Azure/DASH/blob/e166b3fe0b2220603c397a3af7545b3997bcd6f1/dash-pipeline/SAI/templates/utils.cpp.j2#L283

If you test this revised patch with PTF and you're happy, we could make a PR with this change only and pull it out of https://github.com/Azure/DASH/pull/278 to simplify it.

aputriax commented 1 year ago

Changing return status for line 280 is enough. After this it goes successfully. The only thing which has to be resolved then - cpu port.

chrispsommers commented 1 year ago

Hi @anton7811, thanks for confirming. Regarding the remaining attributes;

sai_thrift_get_switch_attribute(self.client, cpu_port=True)
sai_thrift_get_port_attribute(self.client, self.cpu_port_hdl, qos_number_of_queues=True)

Can you do an experiment (hacking the PTF helper code) to see if returning a fake CPU port handle (e.g. 1) and number of queues = 0 will work? I can probably implement those in bmv2 libsai. I'd prefer to stop there and avoid faking some arbitrary queues and their attributes, it's more work and won't actually accomplish anything but get past some arbitrary PTF test framework code. At some point the PTF code could have some conditionals, instead of spend too much effort faking the libsai.

aputriax commented 1 year ago

Except non zero value for cpu_port_hdl we need to have not None value for the following call.

sai_thrift_get_port_attribute(self.client,
                                             self.cpu_port_hdl,
                                             qos_number_of_queues=True)

Any value, e.g. 0 is ok. Then setUp() passes.

yusefMS06 commented 1 year ago

@aputriax Has new pr been made to fix issue?

chrispsommers commented 1 year ago

@yusefMS06 I closed this, it was fixed in https://github.com/Azure/DASH/pull/281

KrisNey-MSFT commented 1 year ago

@chrispsommers also changing Status to 'Done' in GitHub project