sonic-net / DASH

Disaggregated APIs for SONiC Hosts
Apache License 2.0
77 stars 88 forks source link

baby hero test scale #586

Open mgheorghe opened 1 month ago

mgheorghe commented 1 month ago

Test SAI Baby Hero. Baby Hero test scale is 1% of the Hero test scale. This is achieved by having only one prefix per ACL instead of 100 prefixes per ACL.

had to increase the BMv2 scale numbers to 8M and 4M from default 1K for mapping and routing tables

scale:

run time:

TO DO:

chrispsommers commented 1 month ago

Is this supposed to run in CI? I see the runtime in CI is just a few minutes. Is there any penalty to scaling up bmv2 tables (ignoring a new test)?

mgheorghe commented 1 month ago

Is this supposed to run in CI? I see the runtime in CI is just a few minutes. Is there any penalty to scaling up bmv2 tables (ignoring a new test)?

here i see no test was run run sai-chalenger test 0 sec

right before submit in my branch https://github.com/mgheorghe/DASH/pull/6/checks it took 10 min for all the tests

r12f commented 1 month ago

hi Mircea, these tests are huge, so are they generated? I am very worried about how to maintain them when things are changed.

r12f commented 1 month ago

Hi Mircea, I really appreciate the effort of adding the tests. However, I think a more proper place of adding this test is actually in sonic-mgmt repo.

There are a few reasons:

  1. The CI should more focus on the correctness not speed or performance.
  2. It is hard to assert regression happened with perf test in CI. These tests usually take long time and very sensitive to many factors, e.g., are the build machine busy doing something else. The build VMs that azure devop or github action uses are running on a shared machine, so the data will not be stable.
  3. Moving forward, we are building DPU image and SmartSwitch virtual DUTs. These images and topologies will be used for running the sonic-mgmt tests. These tests can be scheduled as nightly runs, and we have dedicated resources for running them. It will be a much better place than the CI build machine.
  4. Another advantage of adding this into sonic-mgmt repo is that, these tests will be naturally useable for the real devices, not just the BMv2 or any virtual/software implementation.

To summarize, I would recommend:

  1. Moving this test to sonic-mgmt repo, where we already have a list of DASH tests here: https://github.com/sonic-net/sonic-mgmt/tree/master/tests/dash. It will help us keep the engineering efficiency, producing data with better quality, naturally support any implementations including the real hardware and also align with the direction we are going towards.
  2. If you don't mind the effort, I would even recommend adding the HERO tests there as well! We have this discussed internally as well, and we all think this will be an awesome thing to have!
mgheorghe commented 1 month ago
  1. The CI should more focus on the correctness not speed or performance.
  2. It is hard to assert regression happened with perf test in CI.

the test is focused on scale not performance, i have it set to 1 pps, goal was to exercise 32ENI's each with 10NSG's each with 1000 ACLs..... scale should be just a function of memory available.

Moving this test to sonic-mgmt repo, where we already have a list of DASH tests here: https://github.com/sonic-net/sonic-mgmt/tree/master/tests/dash

i'll contribute them there too but, the sonic-mgmt repo will get the dash config version while BMv2 gets the SAI version of the config.

i see the image is also based on BMv2 so all the issues i encountered with BMv2 will be there too + some more from all the sonic layers. https://github.com/sonic-net/SONiC/blob/master/images/dash/bmv2-virtual-sonic.svg

my conclusion is that i would like to add tests in both repos, i'll keep this one here and add a dash one in sonic-mgmt (it will require more work since i would like to make a proposal for an OTG virtual testbed for sonic-mgmt, maybe even adding TCP traffic)

mgheorghe commented 1 month ago

hi Mircea, these tests are huge, so are they generated? I am very worried about how to maintain them when things are changed.

generated and that 100M config file was supposed to be generated on the fly in memory at runtime, but i ran into an issue with https://github.com/sonic-net/DASH/issues/584 so i had to dump the config here till that issue is resolved.