vacp2p / wakurtosis


How is confidence built that tests are done correctly? #101

Open corpetty opened 1 year ago

corpetty commented 1 year ago

I was thinking about how Wakurtosis can broadcast that the tests it is performing are correct, and where that "documentation" of sorts should live.

This comes from a variety of sources and I'm not sure that any of it can be discovered naturally from an outside source. This needs to change if we are to think about Wakurtosis as a platform for testing distributed systems. Some thoughts...

From a software development perspective, there needs to be a standard testing framework, like unit tests, to ensure that the code does what is expected of it. I'm not sure if any of this is in place; if so, let's add it to the README, and define a contribution process to ensure that code coverage is always up to par.
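For concreteness, here's the kind of test I mean, as a rough sketch; the `wls.messages` module and `compute_message_rate` function are hypothetical stand-ins, not actual Wakurtosis code:

```python
# Illustrative only: a pytest-style unit test for a hypothetical WLS helper.
from wls.messages import compute_message_rate  # hypothetical module/function


def test_message_rate_is_messages_per_second():
    # 100 messages sent over 20 seconds should yield 5 messages per second.
    assert compute_message_rate(num_messages=100, duration_s=20) == 5.0
```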

From an analysis perspective, this is somewhat specific to the study being done. Some initial help here would be publishing and referencing the work that has already been done testing the performance differences between machine/docker/kurtosis. Furthermore, we should state somewhere the assumptions being made around the network topology, what a measurement is, etc. Whatever is relevant should be written down in the repo so that others who use this software understand it more clearly.

Analysis papers can then reference it in the associated results publications, and they can be updated as they're improved over time.

Please let me know what's being done here, and your thoughts on a process moving forward to ensure that quality (both code and analysis results) stays at a high level and we can be confident that the work we're putting out is as accurate as possible.

AlbertoSoutullo commented 1 year ago

Hi, let me answer from my point of view, part by part:

> I was thinking about how Wakurtosis can broadcast that the tests it is performing are correct, and where that "documentation" of sorts should live. This comes from a variety of sources and I'm not sure that any of it can be discovered naturally from an outside source. This needs to change if we are to think about Wakurtosis as a platform for testing distributed systems. Some thoughts...

This is quite hard to answer. I guess it depends on whether we keep using Kurtosis in the future or not. Obviously, for this repository I think we should have everything well documented (workflow diagrams, how-to guides, and so on) where everything is explained. There is already a README in the main repo, and Gennet and WLS should also have more complete READMEs. I guess this would be the approach for now...

> From a software development perspective, there needs to be a standard testing framework, like unit tests, to ensure that the code does what is expected of it. I'm not sure if any of this is in place; if so, let's add it to the README, and define a contribution process to ensure that code coverage is always up to par.

Personally, I strongly agree with this. When it comes to Starlark/Kurtosis, there are already tests (which could still be improved). I have to say this is not documented in the README, so I will do it later. Another problem is that, AFAIK, there is no way to get code coverage for the Starlark part of Wakurtosis (maybe @bacv could say something about this?).

Then, I did some tests for WLS, and Ganesh is currently doing tests for Gennet. I think we should have some kind of CI in this repo, to ensure that we don't break anything every time changes are made; a sketch of what such a check could look like is below. Personally, I would also suggest that the analysis module (or any other functionality) live in its own Docker image, so it is more reusable.
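As a sketch of the kind of CI smoke test I have in mind, a pytest integration test that runs Gennet end to end and validates its output; the CLI flags and JSON shape here are assumptions for illustration, not Gennet's actual interface:

```python
# Hedged sketch of a CI smoke test; Gennet's real CLI and output format may differ.
import json
import subprocess


def test_gennet_emits_parseable_topology(tmp_path):
    out_file = tmp_path / "topology.json"
    # Flags below are illustrative placeholders for Gennet's actual options.
    subprocess.run(
        ["python", "gennet.py", "--num-nodes", "5", "--output", str(out_file)],
        check=True,
    )
    topology = json.loads(out_file.read_text())
    assert len(topology["nodes"]) == 5  # assumed schema: {"nodes": [...]}
```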

> From an analysis perspective, this is somewhat specific to the study being done. Some initial help here would be publishing and referencing the work that has already been done testing the performance differences between machine/docker/kurtosis. Furthermore, we should state somewhere the assumptions being made around the network topology, what a measurement is, etc. Whatever is relevant should be written down in the repo so that others who use this software understand it more clearly.

Ganesh has those results. Maybe this information should be explained and shown in the Wadoku repo, since you have both code and results in the same place? I think these results are important and should be well explained, because in the end we are justifying that Kurtosis adds almost no overhead.

> Analysis papers can then reference it in the associated results publications, and they can be updated as they're improved over time.

> Please let me know what's being done here, and your thoughts on a process moving forward to ensure that quality (both code and analysis results) stays at a high level and we can be confident that the work we're putting out is as accurate as possible.

I would always personally push towards documentation and testing so we can be (more confidently) sure that our results are what they are. This comes in different forms: READMEs and unit testing, but also comparing metrics from cAdvisor against docker stats and other sources to check that the results are consistent. This is another very important point.
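To illustrate that cross-check, a minimal sketch that pulls one container's memory figure from both cAdvisor's REST API and `docker stats`; the port, API version, and container name are assumptions for illustration:

```python
# Sketch of the cross-check described above: compare one container's memory
# usage as reported by cAdvisor's REST API and by `docker stats`. The port,
# API version, and container name are assumptions for illustration.
import subprocess

import requests

CONTAINER = "waku_0"  # hypothetical container name

# cAdvisor exposes per-container stats under /api/v1.3/docker/<name>;
# the response is keyed by the container's full path.
resp = requests.get(f"http://localhost:8080/api/v1.3/docker/{CONTAINER}")
resp.raise_for_status()
container_info = next(iter(resp.json().values()))
cadvisor_mem = container_info["stats"][-1]["memory"]["usage"]  # bytes, latest sample

# docker stats reports the same container's current memory usage.
result = subprocess.run(
    ["docker", "stats", "--no-stream", "--format", "{{.MemUsage}}", CONTAINER],
    capture_output=True, text=True, check=True,
)

print(f"cAdvisor:     {cadvisor_mem} bytes")
print(f"docker stats: {result.stdout.strip()}")
# The two figures should agree within a small tolerance; a large gap would
# point to a measurement problem in one of the sources.
```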

What do you think, @Daimakaimura @0xFugue?

Daimakaimura commented 1 year ago

Great points, both of you. We're definitely aware of the challenges in scaling with Kurtosis and Docker, and we've been working on a few things, in addition to what @AlbertoSoutullo has already mentioned, to make sure Wakurtosis stays reliable and transparent. A bit more generally:

So far, our main focus has been on functionality, but we hope to shift our focus to usability sooner rather than later.

For the software development side, we're implementing a testing framework across all modules. We'll make sure to add this info to the README. At the moment testing is limited.

Regarding the analysis perspective, we're planning to publish and reference some of the work already done while testing the performance differences between machine/docker/kurtosis. We'll also be documenting assumptions, network topology, and any other relevant info right in the repo for better clarity. Additionally, we'll include detailed documentation on how we're capturing data and performing the analysis, as well as explain the challenges, the solutions, limitations and future scaling strategies.

0xFugue commented 1 year ago

As @AlbertoSoutullo mentions, a basic set of unit tests for WLS and Wakurtosis exists. Gennet doesn't have unit tests yet, but I do run a battery of integration tests before merging to master, and that does help. I will add a set of unit tests to Gennet as well.

Code coverage is a hard thing to enforce. I suggest we do unit and integration tests first and then move towards code coverage.
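When we do get there, coverage.py can wrap the existing pytest suites without much ceremony; a minimal sketch, assuming the WLS tests live under `wls/tests` (the package name and path are placeholders for the actual layout):

```python
# Minimal coverage sketch using coverage.py's API; "wls" and the test path
# are placeholders for the actual package layout.
import coverage
import pytest

cov = coverage.Coverage(source=["wls"])
cov.start()

pytest.main(["wls/tests"])  # run the existing unit tests under measurement

cov.stop()
cov.save()
cov.report(show_missing=True)  # print per-file coverage with missing lines
```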

The repo does indeed need a fair amount of documentation and explanation, and a collation of existing results. But as @Daimakaimura recommends, it is probably best if we first have a fully functional system that gives out accurate, useful, actionable data. Currently we really need to sort out, in that order: 0) the sanity of the runs, 1) the sanity of the data collected, 2) the analysis, and 3) packaging it all in a portable way.

Once that's done, we can shape up the documentation: from capabilities to usage to design choices to results. I personally think the best thing to do is to add a technical report in a portable format that allows pictures (aka PDF!) and link it from the repo.