baby hero test scale - Githubissues

mgheorghe commented 1 month ago

Test SAI Baby Hero. Baby Hero test scale is 1% of the Hero test scale. This is achieved by having only one prefix per ACL instead of 100 prefixes per ACL.

had to increase the BMv2 scale numbers to 8M and 4M from default 1K for mapping and routing tables

scale:

SAI_OBJECT_TYPE_VIP_ENTRY = 1
SAI_OBJECT_TYPE_DIRECTION_LOOKUP_ENTRY = 32
SAI_OBJECT_TYPE_VNET = 32
SAI_OBJECT_TYPE_DASH_ACL_GROUP = 320
SAI_OBJECT_TYPE_ENI = 32
SAI_OBJECT_TYPE_ENI_ETHER_ADDRESS_MAP_ENTRY = 32
SAI_OBJECT_TYPE_OUTBOUND_CA_TO_PA_ENTRY = 80.000 + 320
SAI_OBJECT_TYPE_OUTBOUND_ROUTING_ENTRY = 80.000 (SAI_OUTBOUND_ROUTING_ENTRY_ACTION_ROUTE_VNET) + 80.000 (SAI_OUTBOUND_ROUTING_ENTRY_ACTION_ROUTE_VNET_DIRECT)
SAI_OBJECT_TYPE_PA_VALIDATION_ENTRY = 32

run time:

on my setup
- configure BMv2: 15 min
- run traffic: 3 min
- clear config: 6 min
github action runtime 10 min for whole test

TO DO:

add back ACLs once BMv2 issue solved https://github.com/sonic-net/DASH/issues/575 https://github.com/sonic-net/DASH/issues/91
add traffic for SAI_OUTBOUND_ROUTING_ENTRY_ACTION_ROUTE_VNET_DIRECT ips
add traffic for all SAI_OBJECT_TYPE_DASH_ACL_GROUP (currently only first one for each ENI is exercised) - need to limit the pps to match BMv2 capability and also balance the test duration
use dpugen and remove the duplicate code https://github.com/sonic-net/DASH/issues/584-

chrispsommers commented 1 month ago

Is this supposed to run in CI? I see the runtime in CI is just a few minutes. Is there any penalty to scaling up bmv2 tables (ignoring a new test)?

mgheorghe commented 1 month ago

Is this supposed to run in CI? I see the runtime in CI is just a few minutes. Is there any penalty to scaling up bmv2 tables (ignoring a new test)?

here i see no test was run run sai-chalenger test 0 sec

right before submit in my branch https://github.com/mgheorghe/DASH/pull/6/checks it took 10 min for all the tests

r12f commented 1 month ago

hi Mircea, these tests are huge, so are they generated? I am very worried about how to maintain them when things are changed.

r12f commented 1 month ago

Hi Mircea, I really appreciate the effort of adding the tests. However, I think a more proper place of adding this test is actually in sonic-mgmt repo.

There are a few reasons:

The CI should more focus on the correctness not speed or performance.
It is hard to assert regression happened with perf test in CI. These tests usually take long time and very sensitive to many factors, e.g., are the build machine busy doing something else. The build VMs that azure devop or github action uses are running on a shared machine, so the data will not be stable.
Moving forward, we are building DPU image and SmartSwitch virtual DUTs. These images and topologies will be used for running the sonic-mgmt tests. These tests can be scheduled as nightly runs, and we have dedicated resources for running them. It will be a much better place than the CI build machine.
Another advantage of adding this into sonic-mgmt repo is that, these tests will be naturally useable for the real devices, not just the BMv2 or any virtual/software implementation.

To summarize, I would recommend:

Moving this test to sonic-mgmt repo, where we already have a list of DASH tests here: https://github.com/sonic-net/sonic-mgmt/tree/master/tests/dash. It will help us keep the engineering efficiency, producing data with better quality, naturally support any implementations including the real hardware and also align with the direction we are going towards.
If you don't mind the effort, I would even recommend adding the HERO tests there as well! We have this discussed internally as well, and we all think this will be an awesome thing to have!

mgheorghe commented 1 month ago

The CI should more focus on the correctness not speed or performance.

It is hard to assert regression happened with perf test in CI.

the test is focused on scale not performance, i have it set to 1 pps, goal was to exercise 32ENI's each with 10NSG's each with 1000 ACLs..... scale should be just a function of memory available.

Moving this test to sonic-mgmt repo, where we already have a list of DASH tests here: https://github.com/sonic-net/sonic-mgmt/tree/master/tests/dash

i'll contribute them there too but, the sonic-mgmt repo will get the dash config version while BMv2 gets the SAI version of the config.

the dash repo test has as objective establishing a behavior baseline i can point each vendor to.
the sonic-mgmt version will be focused on showing sonic proper functionality
i see some value in covering also SAI since DASH config is not covering 100% what SAI can do see https://github.com/sonic-net/DASH/issues/351 for example
any option to have sai-chaleneger support in sonic-mgmt ?
is the smart switch virtual DUT available ? asking because i see https://sonic-build.azurewebsites.net/api/sonic/artifacts?branchName=master&platform=vs&target=target/sonic-vs.img.gz but also the statement The DPU SONiC KVM image with dataplane will be released at the next stage what is the difference between the 2 ?

i see the image is also based on BMv2 so all the issues i encountered with BMv2 will be there too + some more from all the sonic layers. https://github.com/sonic-net/SONiC/blob/master/images/dash/bmv2-virtual-sonic.svg

my conclusion is that i would like to add tests in both repos, i'll keep this one here and add a dash one in sonic-mgmt (it will require more work since i would like to make a proposal for an OTG virtual testbed for sonic-mgmt, maybe even adding TCP traffic)

mgheorghe commented 1 month ago

hi Mircea, these tests are huge, so are they generated? I am very worried about how to maintain them when things are changed.

generated and that 100M config file was supposed to be generated on the fly in memory at runtime, but i ran into an issue with https://github.com/sonic-net/DASH/issues/584 so i had to dump the config here till that issue is resolved.

sonic-net / DASH

baby hero test scale #586

scale:

run time:

TO DO: