Open RithvikChuppala opened 3 months ago
It really depends upon the production switch you have in mind.
For example, Tofino's hardware architecture is such that at a basic introductory level you can say it has the following performance model:
Now of course you can get more nuanced than that, by allowing P4 programs that explicitly recirculate packets, and have other operating points like this:
There are other hardware architectures where the performance will degrade more gradually than that, if you go "a little bit over" the budget of what can be done at X billion packets per second.
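To make the "gradual degradation" case concrete, here is a minimal sketch of a generic run-to-completion processor model. It does not describe any particular device: the function name and every number (core count, clock rate, line rate, cycles per packet) are hypothetical values chosen only for illustration.

```python
def run_to_completion_pps(cores, clock_hz, cycles_per_packet, line_rate_pps):
    """Packets per second for a hypothetical run-to-completion processor.

    Assumes every core spends cycles_per_packet cycles on each packet and
    that nothing else (memory, interconnect) is the bottleneck.
    """
    compute_bound_pps = cores * clock_hz / cycles_per_packet
    # The device can never forward faster than its ports deliver packets.
    return min(compute_bound_pps, line_rate_pps)


# Hypothetical device: 64 cores at 1 GHz, ports capped at 1 billion packets/s.
for cycles in (32, 64, 80, 128):
    pps = run_to_completion_pps(64, 1e9, cycles, 1e9)
    print(f"{cycles:4d} cycles/packet -> {pps / 1e9:.2f} Gpps")
```

In this toy model, going from 64 to 80 cycles per packet costs 20% of the packet rate rather than causing a sudden drop, which is the kind of behavior the paragraph above describes.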
Some will have caches between the packet processing core and DRAM, and then cache hit rates play a huge part in the throughput and latency.
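As a rough illustration of why the hit rate matters so much, the sketch below uses the standard weighted-average access time formula. The latencies (5 ns for a cache hit, 100 ns for a DRAM access) are assumed round numbers, not figures from any real device, and the function name is hypothetical.

```python
def avg_lookup_ns(hit_rate, cache_ns, dram_ns):
    """Average table-lookup latency for a given cache hit rate (0.0 to 1.0)."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_ns + (1.0 - hit_rate) * dram_ns


# Assumed latencies: 5 ns on a cache hit, 100 ns on a miss that goes to DRAM.
for hit_rate in (0.99, 0.90, 0.50):
    latency = avg_lookup_ns(hit_rate, cache_ns=5.0, dram_ns=100.0)
    print(f"hit rate {hit_rate:.0%}: about {latency:.1f} ns per lookup")
```

Under these assumed numbers, the average lookup time roughly doubles when the hit rate drops from 99% to 90%, so the same P4 program can perform very differently depending on the traffic mix.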
Sorry I can't give you a more specific answer, but if you dive at least a bit into two sufficiently different hardware architectures, you will start to see more of the reasons that "it depends" is the correct answer.
Thanks for the quick reply!
For my use case, I'm implementing packet processing functionality to perform tunneling (stripping tunnel headers, adding new egress headers, etc.). I aim to show that executing this packet processing functionality in a programmable switch improves throughput and latency metrics compared to the normal software-based approach.
However, since bmv2 isn't an accurate representation of performance, what proxy metric for ideal hardware performance do you think makes the most sense?
If you can, the truly best measure is to implement it and measure the relevant performance metrics on a real hardware device.
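If you do get access to a device, one possible way to reduce raw measurements to the two metrics mentioned in this thread is sketched below. It assumes you already have per-packet latencies (for example from a traffic generator that timestamps packets) and the length of the measurement window. The function name and input format are hypothetical.

```python
import statistics


def summarize(latencies_us, duration_s):
    """Summarize one test run.

    latencies_us: per-packet latencies in microseconds for packets that
                  were received during the measurement window.
    duration_s:   length of the measurement window in seconds.
    """
    if len(latencies_us) < 2 or duration_s <= 0:
        raise ValueError("need at least two samples and a positive duration")
    # quantiles(n=100) returns 99 cut points; index 98 is the 99th percentile.
    percentiles = statistics.quantiles(latencies_us, n=100)
    return {
        "throughput_pps": len(latencies_us) / duration_s,
        "median_us": statistics.median(latencies_us),
        "p99_us": percentiles[98],
    }
```

Running the same traffic through both the switch and the software implementation, and comparing the two summaries, gives a like-for-like comparison. Reporting a tail percentile alongside the median is a design choice here, since software data paths often differ most in the tail.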
If for some reason that is not possible, then the next best thing is to learn about some hardware device in enough detail that you can make a good educated guess at what the performance metrics would be.
I know that bmv2's performance is not production-grade and that there are a lot of hardware-dependent factors, but is there an approximate conversion factor or methodology for relating the performance of a packet program on bmv2's software switch to its performance on a production hardware switch? Something like clock cycles, CPU utilization, etc.?