Open kcudnik opened 1 year ago
@marian-pritsak can best explain this, it's a byproduct of some implementation choices, not a "bug."
further more if we grep actionId:
24 pi_p4_id_t actionId = 0;
45 actionId = 25655048; // SAI_DASH_ACL_GROUP_ACTION_SET_ACL_GROUP_ATTRS
48 action->set_action_id(actionId);
193 pi_p4_id_t actionId = 0;
283 actionId = 32161567;
289 actionId = 20706700;
295 actionId = 28146588;
301 actionId = 31424218;
314 action->set_action_id(actionId);
423 actionId = 32161567;
429 actionId = 20706700;
435 actionId = 28146588;
441 actionId = 31424218;
454 action->set_action_id(actionId);
563 actionId = 32161567;
569 actionId = 20706700;
575 actionId = 28146588;
581 actionId = 31424218;
594 action->set_action_id(actionId);
703 actionId = 18858683;
709 actionId = 24263137;
715 actionId = 29962337;
721 actionId = 26077229;
734 action->set_action_id(actionId);
843 actionId = 18858683;
849 actionId = 24263137;
855 actionId = 29962337;
861 actionId = 26077229;
874 action->set_action_id(actionId);
983 actionId = 18858683;
989 actionId = 24263137;
995 actionId = 29962337;
1001 actionId = 26077229;
1014 action->set_action_id(actionId);
we can see that those actions repeat for each object, i would like to have some explanation on this, i would really like to stick in creating single SAI object here
@marian-pritsak can best explain this, it's a byproduct of some implementation choices, not a "bug."
@marian-pritsak im not sure, i dont understand entire code yet, but in this create function multiple objects are created, and when you call remove, only 1 objects i removed, unless remove is somehow called multiple times outside SAI which make no sense here also i noticed that insert table and remove are into multi map, but i don;t know how this make sense ether, if each object have unique ID, so there should be no duplicated objects in the multimap
Any update on this ?
I remember this is by design. Because we have 5 stages for each ACL direction, we always manage the ACL by ACL group instead of STAGE. So, in the current design, we insert the ACL rules to every stage and use the group name as the table key to differentiate the stage. If we want to insert an ACL rule only into one stage, that will increase the complexity of hardware/software design.
@marian-pritsak Please correct me if anything is misunderstood.
one SAI api calls should not create 5 different SAI objects
Adding @prsunny and @r12f in case they have the bandwidth or knowledge
@marian-pritsak to review again
@marian-pritsak responded during the week, and plans to review, per 9/27 Community Call
Looks like a bug, need to find time to look for a solution. If anyone else in the Community could help or provide a pointer, please do :)
Using SAI to express the interface, the backend is the issue here.
@AmithGspn offered to take a look, assigning to him
hi @AmithGspn - just curious whether you and Meyappan had been able to check this out? TY!
ubuntu@ip-172-31-30-182:~/DASH_/dash-pipeline/SAI/lib$ egrep -rn "acl_stage|SAI_OBJECT_TYPE_DASH_ACL_RULE" saidashacl.cpp 211: // For stage inbound_acl_stage1 218: objId = dashSai->getNextObjectId((sai_object_type_t)SAI_OBJECT_TYPE_DASH_ACL_RULE); 222: DASH_LOG_ERROR("getNextObjectId failed for SAI_OBJECT_TYPE_DASH_ACL_RULE"); 233: auto md = sai_metadata_get_attr_metadata((sai_object_type_t)SAI_OBJECT_TYPE_DASH_ACL_RULE, attr_list[i].id); 342: auto md = sai_metadata_get_attr_metadata((sai_object_type_t)SAI_OBJECT_TYPE_DASH_ACL_RULE, attr_list[i].id); 363: // For stage inbound_acl_stage2 370: objId = dashSai->getNextObjectId((sai_object_type_t)SAI_OBJECT_TYPE_DASH_ACL_RULE); 374: DASH_LOG_ERROR("getNextObjectId failed for SAI_OBJECT_TYPE_DASH_ACL_RULE"); 385: auto md = sai_metadata_get_attr_metadata((sai_object_type_t)SAI_OBJECT_TYPE_DASH_ACL_RULE, attr_list[i].id); 494: auto md = sai_metadata_get_attr_metadata((sai_object_type_t)SAI_OBJECT_TYPE_DASH_ACL_RULE, attr_list[i].id); 515: // For stage inbound_acl_stage3 522: objId = dashSai->getNextObjectId((sai_object_type_t)SAI_OBJECT_TYPE_DASH_ACL_RULE); 526: DASH_LOG_ERROR("getNextObjectId failed for SAI_OBJECT_TYPE_DASH_ACL_RULE"); 537: auto md = sai_metadata_get_attr_metadata((sai_object_type_t)SAI_OBJECT_TYPE_DASH_ACL_RULE, attr_list[i].id); 646: auto md = sai_metadata_get_attr_metadata((sai_object_type_t)SAI_OBJECT_TYPE_DASH_ACL_RULE, attr_list[i].id); 667: // For stage outbound_acl_stage1 674: objId = dashSai->getNextObjectId((sai_object_type_t)SAI_OBJECT_TYPE_DASH_ACL_RULE); 678: DASH_LOG_ERROR("getNextObjectId failed for SAI_OBJECT_TYPE_DASH_ACL_RULE"); 689: auto md = sai_metadata_get_attr_metadata((sai_object_type_t)SAI_OBJECT_TYPE_DASH_ACL_RULE, attr_list[i].id); 798: auto md = sai_metadata_get_attr_metadata((sai_object_type_t)SAI_OBJECT_TYPE_DASH_ACL_RULE, attr_list[i].id); 819: // For stage outbound_acl_stage2 826: objId = dashSai->getNextObjectId((sai_object_type_t)SAI_OBJECT_TYPE_DASH_ACL_RULE); 830: DASH_LOG_ERROR("getNextObjectId failed for SAI_OBJECT_TYPE_DASH_ACL_RULE"); 841: auto md = sai_metadata_get_attr_metadata((sai_object_type_t)SAI_OBJECT_TYPE_DASH_ACL_RULE, attr_list[i].id); 950: auto md = sai_metadata_get_attr_metadata((sai_object_type_t)SAI_OBJECT_TYPE_DASH_ACL_RULE, attr_list[i].id); 971: // For stage outbound_acl_stage3 978: objId = dashSai->getNextObjectId((sai_object_type_t)SAI_OBJECT_TYPE_DASH_ACL_RULE); 982: DASH_LOG_ERROR("getNextObjectId failed for SAI_OBJECT_TYPE_DASH_ACL_RULE"); 993: auto md = sai_metadata_get_attr_metadata((sai_object_type_t)SAI_OBJECT_TYPE_DASH_ACL_RULE, attr_list[i].id); 1102: auto md = sai_metadata_get_attr_metadata((sai_object_type_t)SAI_OBJECT_TYPE_DASH_ACL_RULE, attr_list[i].id);
Explanation - Objects are created at stage levels. I see 3 inbound stages and 3 outbound stages. Total 6 stages. So, 6 objects are created. Guess this is as per design.
Ok maybe this is required by bmv2 but this is still single SAI API call which should return single OID when create, and since 6 are created internally this is not they way it should go, since which object should be returned ? Similar question arises when doing later set/get/remove on such objects where internally are 6 created
tagging @marian-pritsak to take a look please
Here is my comments... lets wait for @marian-pritsak comments to conclude..
Yes single API passing number of OIDs(groups/stages) to be created. In this case, it is asking 6 OIDs(group/stage) to be created.
_static sai_status_t dash_sai_create_dash_acl_rules( In sai_object_id_t switch_id, In uint32_t object_count, <<<< Application requesting number of objects/stages to be created internally In const uint32_t *attr_count, In const sai_attribute_t *attr_list, In sai_bulk_op_error_mode_t mode, Out sai_object_id_t object_id, Out sai_status_t *objectstatuses)
Base pointer to OID(OID[0] shall be returned to Application. Application can use the base OID + group id (OID[groupid]) to index the respective stage to set/get.
Here is my comments... lets wait for @marian-pritsak comments to conclude..
Yes single API passing number of OIDs(groups/stages) to be created. In this case, it is asking 6 OIDs(group/stage) to be created.
_static sai_status_t dash_sai_create_dash_acl_rules( In sai_object_id_t switch_id, In uint32_t object_count, <<<< Application requesting number of objects/stages to be created internally In const uint32_t *attr_count, In const sai_attribute_t *attr_list, In sai_bulk_op_error_mode_t mode, Out sai_object_id_t object_id, Out sai_status_t *objectstatuses)
Base pointer to OID(OID[0] shall be returned to Application. Application can use the base OID + group id (OID[groupid]) to index the respective stage to set/get.
but api im reffering to is singe create not bulk:
176 static sai_status_t dash_sai_create_dash_acl_rule(
177 _Out_ sai_object_id_t *dash_acl_rule_id,
178 _In_ sai_object_id_t switch_id,
179 _In_ uint32_t attr_count,
180 _In_ const sai_attribute_t *attr_list)
that single api inside is creating 6 oids internally
Agreed. Based on the logs/evidences, I believe trigger is bulk create and not single create. We need to check why bulk create is triggered ?
i dont think this is mistaken with bulk api, since number of object does not depend on any counter, since there is no counter in input params, this seems to me like limiatation of bmv2 or some other reason that im not aware, or maybe it was easier to implement ?
ping to @r12f and @marian-pritsak to take a look
also take a look at this part: (there are 5 or 6 of them in that function)
1136 if (false == dashSai->insertInTable(matchActionEntry, objId)) {
1137 goto ErrRet;
1138 }
so actually each of the generated objId is inserted into Table with matchActionEntry, and that table in dash SAI is actually used as
std::unordered_multimap<sai_object_id_t, std::shared_ptr<p4::v1::TableEntry> > m_tableEntryMap;
so multiple objects can be present at the same key, so i believe this was intended action to do this way, i still dont have enough knowledge to understand, whether this could be done in other way, but my intention suggest me that there should be only one objectId per SAI call generated, and if different internal indexes should be used for required bmv2 stages, then this should be handled in different manner
and there is only one line (and should be only one):
1141 *dash_acl_rule_id = objId;
which uses last object id as output to the create function
I have also looked at this issue, and I believe what Kamil said above is true. This looks to be intentional. And here is the detailed explanation in my understanding.
In short, this is caused by the combination result of 5 stages of ACLs and ACL group atomic swap requirement:
Because of these 2 things, if we want to implement ACL rules directly using P4, then all tables will have to hold all ACL groups with all ACL rules in order to make the ENI ACL group attribute update work as expected. However, because 5 ACL rules will be created, the id of the last one is used as return result.
You might find this loop doesn't apply for ACL group, this is because an extra mostly no-op table is created for generating the ACL group with only the the first stage involved. From what I am seeing, this extra table is almost used for generating SAI API only, but not really populating the ACL group id into metatdata.
@name("dash_acl_group|dash_acl")
table acl_group {
key = {
meta.stage1_dash_acl_group_id : exact @name("meta.stage1_dash_acl_group_id:dash_acl_group_id");
}
actions = {
set_acl_group_attrs();
}
}
I agree with Kamil, supposedly, only 1 entry should be inserted while only 1 id should be added. So the question is - can we associate a ACL group with a stage? E.g., adding an attribute in the ACL group to specify which stage it should go. Then we will know exactly which table it should go in the first place, hence only 1 entry will be created.
This might not only apply to BMv2, but also applies to real implementation. Otherwise when a new ACL group followed by ACL rules are added, we cannot really program the ASIC and have to save all the information in memory. The ASIC programming will has to be delayed until the ACL group ID on the ENI is changed, then all ACL rules can be sent to the right table. Considering the scale, this leads to unnecessary persistent memory usage (SAI implementation has to hold the ACL group in memory, even through it is programmed into ASIC). And I am not seeing use cases where 2 stages sharing the same ACL group.... (Why running the same ACL twice back to back?)
I will let @marian-pritsak and @kcudnik to take a look on my comment and see how reasonable it is. And then, we can move on from there.
Reference - Previous discussion on the same issue(https://www.youtube.com/watch?v=bxitnRrQlhY).
My understanding from the recordings: Fixed number of ACL table( 6 - 3 inbound and 3 outbound) are predefined for BMv2. All 6 tables have the same ACL rules only ACL group id differs. Meaning if you invoke "dash_sai_create_dash_acl_rule()", it will create the rule on all 6 ACL tables - this is applicable only for BMv2 and real implementation will have unique ACL tables and unique sai api. I guess this approach was taken in BMv2 for ease implementation.
@r12f This sounds good, but as i mentioned, i have little to nothing knowledge about bmv2, and im not able to determine whether this is correct bahaviour or not, just my assumptions hot it should go
When ACL group is created, can we say it is bound to a stage? @r12f @sharmasushant Call out in the HLD that we bind 1 group to 1 stage Group->Create Rules->ENI->Stage
Original SDN document https://github.com/sonic-net/DASH/blob/main/documentation/general/sdn-pipeline-basic-elements.md#acl-levels
SONiC-DASH HLD document to possibly be updated https://github.com/sonic-net/DASH/blob/main/documentation/general/dash-sonic-hld.md#325-acl
Objects stored in SAI need to have 1:1 mapping with what is provided via DASH proto. And SAI needs to give those objects as it is to sailib and vendor code. There should be no logic in SAI. It should just be a proxy to forward dash objects to vendor datapath implementation. So, forking one acl_group into multiple does not seem right to me. This needs to be fixed.
@r12f indicated this is not easy to fix, and is related to the leaf/spine design
Just checking back - @marian-pritsak and @r12f - so we are ok with the multiple entries for now? @devenjagasia and @kcudnik for visibility
Could you guys read through @r12f 's comment above in the thread please? Riff Comment
Dedicated meeting w/ @r12f and @kcudnik per Marian
Per @r12f - we plan to meet on this separately in December 2024
When sai implementation is generated, functions is added:
in generated file: dash-pipeline/SAI/lib/saidashacl.cpp
but internally i see 5 times, so 5 times object is created;
and then inserted:
but insertion is with the 5 different objects,
that generated code for that functions process in loop several attributes which are duplicated in each process line
can anyone explain my why this is happening ? this function should create only 1 single SAI objects, perhaps multiple bmv2 objects internally, but from SAI perspective only one, i see this as a bug, please explain