p4lang / project-ideas

Ideas for P4 Projects.
Apache License 2.0
6 stars 0 forks source link

Tuple Space Search implementation for BMv2 #10

Open jafingerhut opened 1 month ago

jafingerhut commented 1 month ago

Motivation: There are some people using BMv2 for executing a P4-based reference model for packet processing behavior for the DASH project. Some of them have experienced issues where tables with ternary keys cause the packet processing rate to go very low when the table has tens of thousands of entries, because BMv2 currently performs a linear search for every packet through all entries, in priority order.

Tuple Space Search is implemented in OVS, and for the common case when the number of unique masks in the set of all table entries of a ternary table is much smaller than the number of entries, it can provide very good matching performance. More details in this BMv2 issue:

As of 2024-Sep-16, Davide Scano indicated an interest in working on this enhancement, so please talk to him if you are interested.

Dscano commented 1 month ago

I haven't started yet, but I hope to begin working on it soon. However, if someone is willing to start in the meantime, I'd be more than happy to let them.

fruffy commented 1 month ago

There is also work like Efficuts which tries to address TSS scaling problems. TSS can also blow up, but I do not know whether that is a problem in practice: https://eprints.gla.ac.uk/202085/7/202085.pdf

But compared to TSS I do not know whether Efficuts has been implemented in software packet classification already.

fruffy commented 1 month ago

We are also building an open-source fuzzer, which can generate P4Runtime entries for P4 programs: https://github.com/fruffy/p4rtsmith

This fuzzer allows you to generate many random entries for a single P4 program. You can also tweak it to have different characteristcs. We can use this to torture-test BMv2 after this enhancement is implemented.

jafingerhut commented 1 month ago

Thanks for the link to the paper describing an attack on TSS algorithm. I have not heard of this attack before (but hope to find time to read the paper you linked), nor implemented TSS personally, but at least the very basic version of TSS, that simply does one hash table lookup per unique mask, should be linear time in the number of unique masks. The main attack against that is to install rule sets with all unique masks, so that TSS degrades to linear search through all entries. That should be no worse than the linear time search that BMv2 implements today for BMv2, and in the case I expect to be common, where many rules have the same mask, it should be significantly faster at table lookup than what we have today.

The main reason I recommended TSS is: (a) relatively straightforward to implement, compared to almost all other packet classification algorithms I am aware of that work on a general purpose CPU, and (b) adding and removing rules can be O(1) time, or close to it.

fruffy commented 1 month ago

Yes, I am not suggesting to not implement TSS, but there is some interesting follow-up work that could be done here. Curious to see the results!