p4lang / behavioral-model

The reference P4 software switch
Apache License 2.0
532 stars 327 forks source link

simple_switch_grpc: Programming a shared action selector with one shot selectors causes a segmentation fault #1179

Closed fruffy closed 5 months ago

fruffy commented 1 year ago

I am guessing that programming a shared selector with one shots is likely not the right way. It could lead to mismatches. However, even just the attempt seems to cause a segmentation fault. Should this insertion be rejected or accepted?

The program in question is https://github.com/p4lang/p4c/blob/main/testdata/p4_16_samples/action_selector_shared-bmv2.p4.

terminate called after throwing an instance of 'std::out_of_range'
  what():  _Map_base::at
Aborted (core dumped)

testlog.log.txt action_selector_shared-bmv2.json.txt action_selector_shared-bmv2.p4info.txt action_selector_shared-bmv2.py.txt switchlog.txt

antoninbas commented 1 year ago

This is definitely a bug, but I doubt it is straightforward to fix. One shot programming is fine and definitely supported. The problem IMO comes from the shared selector. I feel like if you were to use member / group based programming, you would run into the same issue. If you look at the generated JSON file, you will see that your 2 actions have been duplicated by the compiler: you have 2 copies of NoAction and 2 copies of IngressI.drop. While this is not a problem in general, it is a problem with a shared action selector. When a member is created in an action profile / selector, the parameters include the action name and action arguments. The action name needs to be resolved to an action id and eventually an action object (implementation of an action). In order to do that, this function will be called: https://github.com/p4lang/behavioral-model/blob/ed36d8b45d70b736526a3c5b876409631fc5743e/src/bm_sim/P4Objects.cpp#L2533. The problem is that this resolution depends on the match tables using the action selector, because in P4 actions are a property of the table and not of the action selector. If the table using the action selector each have their own "copy" of the action (as is the case in your bmv2 JSON), it is not going to work, and you will see segfaults when lookup are performed in the tables. Ideally, you would not have a copy of the action per table in the generated JSON, but instead a single action entry.

The only way to solve this would be to refactor runtime APIs so that member creation doesn't rely on the action name.

fruffy commented 1 year ago

Thanks for looking into this! Is there potentially a way to craft a compiler pass that detects these shared tables and produces a structure that can work for the runtime API. Or is API refactoring easier?

antoninbas commented 1 year ago

It's unlikely we are going to make any significant change such as this one to P4Runtime. There may also be historical reasons why it is designed this way. Note that the P4Runtime spec has this mention:

If multiple table implementations share an extern instance, then the actions attributes of the tables must have an identical list of P4 actions.

And the P4Info generation actually de-duplicates actions. Realistically, this should be "fixed" in the compiler. I am not exactly sure why multiple copies are generated for each action in the first place.

fruffy commented 1 year ago

It's unlikely we are going to make any significant change such as this one to P4Runtime. There may also be historical reasons why it is designed this way. Note that the P4Runtime spec has this mention:

If multiple table implementations share an extern instance, then the actions attributes of the tables must have an identical list of P4 actions.

And the P4Info generation actually de-duplicates actions. Realistically, this should be "fixed" in the compiler. I am not exactly sure why multiple copies are generated for each action in the first place.

I can try to check this once I have some cycles.

github-actions[bot] commented 11 months ago

This issue is stale because it has been open 180 days with no activity. Remove stale label or comment, or this will be closed in 180 days