jlousada315 opened 4 years ago
It was mostly an experiment to test whether there was a correlation between system load patterns and failures. In the OpenStack CI system there were some failures we always suspected were related to system load or other environment patterns (like the virtual machines or underlying hardware the test jobs are running on). I had long suspected that a class of the failures we were seeing was caused by stressing the resources in the test VM too much. We started working on CIML to see if we could leverage ML to find a correlation between failure patterns and patterns in system load, but also, more generally, to see if ML could be used to find any other correlation between the large amount of data collected as part of a CI run (system logs, etc.) and test failures, which might provide a hint for debugging.
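For a concrete picture of that idea, here is a minimal sketch (not the actual CIML pipeline) of correlating per-run system-load summaries with test outcomes; the file layout, the runs.csv index and the CSV format are assumptions made up for illustration:

```python
# Illustrative sketch only: correlate system-load features with test outcomes.
# The file layout, runs.csv index and CSV format are assumptions, not CIML's.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def load_run(csv_path):
    """Collapse one run's dstat time series into a few summary features."""
    # Assumes the dstat banner rows were already stripped, leaving a plain CSV.
    df = pd.read_csv(csv_path)
    numeric = df.select_dtypes("number")
    # Simple aggregates: mean and max of each resource counter over the run.
    return pd.concat([numeric.mean().add_suffix("_mean"),
                      numeric.max().add_suffix("_max")])

# Hypothetical index: one row per CI run, with its dstat file and outcome.
index = pd.read_csv("runs.csv")  # columns: dstat_path, failed (0 or 1)
X = pd.DataFrame([load_run(p) for p in index["dstat_path"]]).fillna(0)
y = index["failed"]

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, X, y, cv=5, scoring="roc_auc"))
```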
Hello @mtreinish, thanks for the insight, it was very helpful context.
I have one question: would you consider making that approach broader and taking into account other factors that cause tests to fail? Like developer experience, lines of code added/deleted, using NLP to pinpoint what kind of commit was made, etc. Is there a way to extract more than dstat data from OpenStack?
@johnnylousas some more context: the OpenStack CI system runs (mostly) two test pipelines, called check and gate. When a change is proposed, check is executed. If tests pass, and reviewers approve the change, the change is tested again, this time in the gate pipeline, rebased on the latest master.
Since the code that is tested in gate has already been tested in check, failures in gate may be caused, like @mtreinish said, by system load or other environment patterns (like the virtual machines or underlying hardware the test jobs are running on).
We built our models on data from the gate pipeline, because failures in the check pipeline may be related to a completely different type of issue, like broken code in the change.
I suspect the correlation between failures in the gate pipeline and data like LOC added/deleted, the commit message and so on will be very low.
That said, the data is available if you want to use it. We already store metadata from the MySQL DB in the CIML rawdata, under $DATA_PATH/.metadata/<run-uuid>.json.gz:
{
    "status": 0,
    "artifact": "http://logs.openstack.org/15/555915/1/gate/tempest-full/4170076",
    "build_branch": "stable/queens",
    "build_change": "555915",
    "build_master": "ze01.openstack.org",
    "build_name": "tempest-full",
    "build_node": "ubuntu-xenial",
    "build_patchset": "1",
    "build_queue": "gate",
    "build_ref": "refs/changes/15/555915/1",
    "build_short_uuid": "4170076",
    "build_uuid": "41700765bc5145288865dcbe0a4bb7aa",
    "build_zuul_url": "N/A",
    "filename": "testrepository.subunit",
    "node_provider": "ovh-bhs1",
    "project": "openstack/nova",
    "voting": "1",
    "zuul_executor": "ze01.openstack.org"
}
This data is not used today (except for the artifact link), but you could use it. With the build ref you can obtain all the details you may want from Gerrit, like the commit message, the change content and more.
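For example, a rough sketch (not CIML code) of joining one of these metadata files with Gerrit via its REST API; the local path and the review.opendev.org base URL are assumptions:

```python
# Sketch: read one stored metadata file and pull the commit message from Gerrit.
# The local path and the Gerrit base URL below are assumptions for illustration.
import gzip
import json

import requests

GERRIT_URL = "https://review.opendev.org"  # assumption: current public Gerrit

def load_metadata(path):
    """Read one $DATA_PATH/.metadata/<run-uuid>.json.gz file."""
    with gzip.open(path, "rt") as f:
        return json.load(f)

def fetch_commit_message(metadata):
    """Fetch the commit message for the change/patchset recorded in the metadata."""
    change = metadata["build_change"]
    patchset = metadata["build_patchset"]
    resp = requests.get(f"{GERRIT_URL}/changes/{change}/revisions/{patchset}/commit")
    resp.raise_for_status()
    # Gerrit prefixes JSON responses with a ")]}'" line to prevent XSSI.
    return json.loads(resp.text.split("\n", 1)[1])["message"]

metadata = load_metadata("data/.metadata/41700765bc5145288865dcbe0a4bb7aa.json.gz")
if metadata["build_queue"] == "gate":  # keep only gate runs, as the models do
    print(fetch_commit_message(metadata))
```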
Something we thought might also be worth doing, but never got around to doing, is using NLP to process application logs from OpenStack. Since logs are rotated relatively quickly (because of space limitations), to do that you would need to start storing the relevant log files, like we do with dstat today, and build a larger dataset over time.
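As a rough illustration of the kind of preprocessing that could be applied to stored service logs (a sketch with scikit-learn; the directory layout and file names are made up, and nothing here is part of CIML):

```python
# Sketch only: turn stored OpenStack service logs into simple text features.
# The directory layout and file names are assumptions for illustration.
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer

# One document per run: the log text saved for that run (hypothetical layout).
log_files = sorted(Path("data/logs").glob("*/screen-n-cpu.txt"))
documents = [p.read_text(errors="ignore") for p in log_files]

# Word-level TF-IDF, dropping tokens that appear in fewer than two runs.
vectorizer = TfidfVectorizer(lowercase=True, min_df=2, max_features=5000)
features = vectorizer.fit_transform(documents)  # sparse matrix: runs x tokens
print(features.shape)
```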
Another way to obtain more data for training a model could be to use logs from different test jobs; all the devstack and tempest ones should be suitable for the purpose.
Hello!
Why did you choose to use dstat data? Why do you think statistics about your system resources are useful for predicting whether a test will fail or not? What is the intuition?
In other cases I have found the features are more like: lines of code added/deleted, similarity between tests (Hamming distance), pass/fail history, etc.