Open · gaylatea opened this issue 2 years ago
Agreed! Even better would be if these nightly runs lined up with the nightly releases so the numbers reflect the performance of the nightly release.
Right now soak tests are designed to be used as a regression test (between 2 specific commits, and not between separate runs). It currently has hyper-threading disabled, and only uses half of the physical cores of the machine to keep it consistent. If you want real-world numbers that you can share, you probably want to use a config that real customers would use.
Instead of running a comparative analysis of baseline/candidate branches, this can just run the baseline against the current master branch.
Like @fuchsnj alludes to in his comment, the soak analysis step does very much assume that you want to make a comparison between two SHAs, however you obtain them. I'm not totally sure what you mean here, but if I'm understanding correctly you'd like to run one side of a soak and get its descriptive statistics?
Agreed! Even better would be if these nightly runs lined up with the nightly releases so the numbers reflect the performance of the nightly release.
@jszwedko is also assuming our current two-SHA comparison model, so it's probably worth being clear on this point, unless I'm reading his comment wrong.
These numbers can then be dumped into wherever we keep results. Ideally, this would be a Datadog dashboard where we can display a Query value with throughput values. History is less important than ease of access for folks writing docs/sales pitches.
I would caution that the soak tests have no valid history. That is, the experimental setup is allowed to vary at any time in a way that breaks comparison with previously calculated results. So, it's a big no-no to store these and compare them up over time. But, if your storage allows for some concept of only the last data set being valid then this is okay.
Even better would be if these nightly runs lined up with the nightly releases so the numbers reflect the performance of the nightly release.
That's a brilliant idea, I love it.
If you want real-world numbers that you can share, you probably want to use a config that real customers would use.
Yeah, I think that's reasonable to do. My gut says that we can take our current deployment recommendation (c6g.4xlarge instances) and run these nightly tests on that, exactly as a customer would. Not sure if the x86_64 / ARM difference there means we should change that, though...
[...] if I'm understanding correctly you'd like to run one side of a soak and get its descriptive statistics?
Exactly correct.
I would caution that the soak tests have no valid history. That is, the experimental setup is allowed to vary at any time in a way that breaks comparison with previously calculated results. So, it's a big no-no to store these and compare them up over time. But, if your storage allows for some concept of only the last data set being valid then this is okay.
Yeah, that's part of why I picked the Query Value widget for that display. It only displays a single value, which we'll configure as the most recent data point to come in. That'll avoid the issue you've mentioned there.
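As a point of reference, here is a minimal sketch of what that widget definition could look like, written as YAML for readability (the Datadog dashboards API itself expects the JSON equivalent). The metric name, tag, and title are assumptions about whatever the nightly run would publish, not existing metrics:

```yaml
# Sketch of a Datadog Query Value widget definition. The metric
# `vector.soak.throughput` and the `branch:master` tag are placeholders
# for whatever series the nightly soak run actually emits.
definition:
  title: "Nightly soak throughput (bytes/sec)"
  type: query_value
  requests:
    - q: "avg:vector.soak.throughput{branch:master}"
      aggregator: last   # display only the most recent data point, per the comment above
  precision: 2
```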
Even better would be if these nightly runs lined up with the nightly releases so the numbers reflect the performance of the nightly release.
That's a brilliant idea, I love it.
If you want real-world numbers that you can share, you probably want to use a config that real customers would use.
Yeah, I think that's reasonable to do. My gut says that we can take our current deployment recommendation (c6g.4xlarge instances) and run these nightly tests on that, exactly as a customer would. Not sure if the x86_64 / ARM difference there means we should change that, though...
We can measure this and find out. #10454 will cover this.
[...] if I'm understanding correctly you'd like to run one side of a soak and get its descriptive statistics?
Exactly correct.
Excellent. We don't have the analysis for this today but the pattern exists and should be trivial.
I would caution that the soak tests have no valid history. That is, the experimental setup is allowed to vary at any time in a way that breaks comparison with previously calculated results. So, it's a big no-no to store these and compare them up over time. But, if your storage allows for some concept of only the last data set being valid then this is okay.
Yeah, that's part of why I picked the Query Value widget for that display. It only displays a single value, which we'll configure as the most recent data point to come in. That'll avoid the issue you've mentioned there.
Cool. Always worth repeating, since it's a non-obvious complication with this data set.
Thinking on it: since the goal is to produce descriptive statistics for a single SHA, the approach we'll take here is likely very similar to the one we'll take with #10606. Detection of an erratic soak is about the behavior of a single SHA and comparison between runs of a soak, not between SHAs.
We'd like to establish some notion of performance numbers that can be given as examples to customers. Right now, the soak test framework only runs when PRs are submitted, which makes collection of this data erratic and not necessarily up to date with the current deployed state of the system.
GitHub Actions can be commanded to run the soak test framework on a schedule, at midnight. Instead of running a comparative analysis of baseline/candidate branches, this can just run the baseline against the current master branch. These numbers can then be dumped into wherever we keep results. Ideally, this would be a Datadog dashboard where we can display a Query Value widget with throughput values. History is less important than ease of access for folks writing docs/sales pitches.
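A minimal sketch of the kind of workflow this describes, under stated assumptions: the cron trigger syntax and the Datadog v1 series endpoint are real, but the workflow path, runner labels, script path and flags, secret name, and payload file are all illustrative placeholders for whatever single-SHA entry point the soak framework grows for this.

```yaml
# .github/workflows/nightly-soak.yml (illustrative path)
name: nightly-soak-baseline
on:
  schedule:
    - cron: "0 0 * * *"   # midnight UTC, nightly

jobs:
  soak:
    runs-on: [self-hosted, soak]          # assumed dedicated soak runner
    steps:
      - uses: actions/checkout@v3
        with:
          ref: master
      - name: Run one side of the soak against master
        # `soaks/soak.sh` and these flags are placeholders for whatever
        # single-SHA entry point the soak framework exposes for this.
        run: ./soaks/soak.sh --variant baseline --sha "${GITHUB_SHA}"
      - name: Push throughput numbers to Datadog
        env:
          DD_API_KEY: ${{ secrets.DD_API_KEY }}   # assumed repo secret
        run: |
          # Submit the captured samples as a gauge series; the payload file
          # name is whatever the soak run wrote out, shown here for illustration.
          curl -sS -X POST "https://api.datadoghq.com/api/v1/series" \
            -H "Content-Type: application/json" \
            -H "DD-API-KEY: ${DD_API_KEY}" \
            -d @soak-throughput-series.json
```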