mtreinish / ciml

a machine learning pipeline for analyzing CI results.
Apache License 2.0

Dstat data #65

Open jlousada315 opened 4 years ago

jlousada315 commented 4 years ago

Hello!

Why did you choose to use dstat data? Why do you think statistics about your system's resources are useful for predicting whether a test will fail or not? What is the intuition?

In other cases I have found the features to be more like lines of code added/deleted, similarity between tests (Hamming distance), pass/fail history, etc.

mtreinish commented 4 years ago

It was mostly an experiment to test whether there was a correlation between system load patterns and failures. In the OpenStack CI system there were some failures we always suspected were related to system load or other environment patterns (like the virtual machines or underlying hardware the test jobs are running on). I had long suspected that a class of the failures we were seeing was caused by stressing the resources in the test VM too much. We started working on ciml to see if we could leverage ML to find a correlation between failure patterns and patterns in system load. But also, more generally, to see if ML could be used to find any other correlation between the large amount of data collected as part of a CI run (system logs, etc.) and test failures, which might provide a hint for debugging.
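For a rough idea of what that looks like in practice, here is a minimal sketch of turning a run's dstat time series into per-run features that a classifier could correlate with the pass/fail label. The CSV layout and the skiprows offset are assumptions on my part, not the actual ciml code:

```python
import pandas as pd


def load_dstat_run(csv_path):
    # dstat --output CSVs start with a few metadata rows before the column
    # headers; the skiprows offset here is an assumption and may need
    # adjusting for a particular dstat version.
    return pd.read_csv(csv_path, skiprows=6)


def summarize_run(df):
    # Collapse the per-second time series into per-run summary features
    # (mean and max of each numeric column) that a classifier can consume
    # alongside the run's pass/fail label.
    features = {}
    for col in df.select_dtypes("number").columns:
        features[f"{col}_mean"] = df[col].mean()
        features[f"{col}_max"] = df[col].max()
    return features
```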

jlousada315 commented 4 years ago

Hello @mtreinish, thanks for the insight, it was very helpful context.

I have one question: would you consider making that approach broader and taking into account other factors that cause tests to fail, like developer experience, lines of code added/deleted, or using NLP to pinpoint what kind of commit was made? Is there a way to extract more than dstat data from OpenStack?

afrittoli commented 4 years ago

@johnnylousas some more context: the OpenStack CI system runs two main test pipelines, called check and gate. When a change is proposed, check is executed. If the tests pass and reviewers approve the change, the change is tested again, this time in the gate pipeline, rebased on the latest master.

Since the code that is tested in gate has already been tested in check, failures in gate may be caused, like @mtreinish said, by system load or other environment patterns (like the virtual machines or underlying hardware the test jobs are running on).

We built our models on data from the gate pipeline, because failures in the check pipeline may be related to a completely different type of issue, like broken code in the change.

I suspect the correlation between failures in the gate pipeline and data like LOC added/deleted, the commit message and so on will be very low.

That said, the data is available if you want to use it. We already store metadata from the MySQL DB in the CIML rawdata, under $DATA_PATH/.metadata/<run-uuid>.json.gz:

{
  "status": 0,
  "artifact": "http://logs.openstack.org/15/555915/1/gate/tempest-full/4170076",
  "build_branch": "stable/queens",
  "build_change": "555915",
  "build_master": "ze01.openstack.org",
  "build_name": "tempest-full",
  "build_node": "ubuntu-xenial",
  "build_patchset": "1",
  "build_queue": "gate",
  "build_ref": "refs/changes/15/555915/1",
  "build_short_uuid": "4170076",
  "build_uuid": "41700765bc5145288865dcbe0a4bb7aa",
  "build_zuul_url": "N/A",
  "filename": "testrepository.subunit",
  "node_provider": "ovh-bhs1",
  "project": "openstack/nova",
  "voting": "1",
  "zuul_executor": "ze01.openstack.org"
}

This data is not used today (except for the artifact link), but you could use it. With the build ref you can obtain from Gerrit all the details you may want, like the commit message, the change content and more.
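To make that concrete, here is a small sketch of reading one of those metadata files and pulling the change details from Gerrit's REST API. The Gerrit host, the query options, and the rawdata path used below are assumptions for illustration, not something ciml does today:

```python
import gzip
import json

import requests


def load_run_metadata(path):
    # Read one of the $DATA_PATH/.metadata/<run-uuid>.json.gz files.
    with gzip.open(path, "rt") as f:
        return json.load(f)


def fetch_change(change_number, gerrit="https://review.opendev.org"):
    # Query Gerrit's REST API for the change. Gerrit prefixes JSON
    # responses with ")]}'" to guard against XSSI, so drop the first line
    # before parsing.
    url = f"{gerrit}/changes/{change_number}?o=CURRENT_REVISION&o=CURRENT_COMMIT"
    resp = requests.get(url)
    resp.raise_for_status()
    return json.loads(resp.text.split("\n", 1)[1])


# Illustrative path; the real location depends on $DATA_PATH.
meta = load_run_metadata("rawdata/.metadata/41700765bc5145288865dcbe0a4bb7aa.json.gz")
change = fetch_change(meta["build_change"])
commit_message = change["revisions"][change["current_revision"]]["commit"]["message"]
```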

Something we thought might also be worth doing, but never got around to, is using NLP to process application logs from OpenStack. Since logs are rotated relatively quickly (because of space limitations), to do that you would need to start storing relevant log files, like we do with dstat today, and build a larger dataset over time. Another way to obtain more data for training a model could be to use logs from different test jobs; all the devstack and tempest ones should be suitable for the purpose.
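A gather-time hook that saves one log file per run next to the existing raw data could look roughly like the sketch below. The log file name and the .logs directory layout are placeholders I made up for illustration, not part of ciml:

```python
import os

import requests


def archive_log(metadata, data_path, log_name="job-output.txt.gz"):
    # Fetch a log file from the run's artifact URL before the log server
    # rotates it away, and store it under the raw-data directory keyed by
    # build_uuid. "job-output.txt.gz" is a placeholder; substitute whichever
    # OpenStack service logs the model should learn from.
    url = f"{metadata['artifact']}/{log_name}"
    resp = requests.get(url)
    resp.raise_for_status()
    dest = os.path.join(data_path, ".logs", metadata["build_uuid"] + ".txt.gz")
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    with open(dest, "wb") as f:
        f.write(resp.content)
    return dest
```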