mtreinish / ciml

a machine learning pipeline for analyzing CI results.
Apache License 2.0

Improve dataset building #2

Closed by afrittoli 6 years ago

afrittoli commented 6 years ago

We only use db_trainer for dataset building, so this renames it and changes it to use a new gather_data module function that loops through runs, reusing a single DB session.
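A minimal sketch of the single-session loop, not the actual gather_data code. The names `gather_results`, `make_session` and `load_run` are hypothetical, and the only assumption about the session is that it has a `close()` method.

```python
from contextlib import closing


def gather_results(run_ids, make_session, load_run):
    """Load data for every run in run_ids through one shared session.

    make_session and load_run are hypothetical callables: make_session()
    returns an object with a close() method, and load_run(run_id, session)
    returns the data for one run, or None when there is none.
    """
    results = {}
    # Open the session once, outside the loop, and close it when done.
    with closing(make_session()) as session:
        for run_id in run_ids:
            data = load_run(run_id, session)
            if data is not None:
                results[run_id] = data
    return results
```

The point is only that session setup happens once per dataset build instead of once per run.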

Store on disk, under .unavailable, the IDs of runs for which no dstat data is available, either to download or in the cache, so that we don't check them every time with an expensive HTTP call.
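Roughly, the .unavailable bookkeeping could look like the sketch below. It assumes one run ID per line in a plain text file; the function names and the `fetch` callable (standing in for the cache lookup plus HTTP download) are hypothetical, not the names used in the patch.

```python
import os


def load_unavailable(path):
    """Return the set of run IDs recorded in the file at path."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}


def mark_unavailable(path, run_id):
    """Append run_id to the file so later builds can skip it."""
    with open(path, "a") as f:
        f.write(run_id + "\n")


def get_dstat(run_id, path, fetch):
    """Return dstat data for run_id, or None if it is known to be missing.

    fetch is a hypothetical callable that returns None when no data exists.
    """
    if run_id in load_unavailable(path):
        return None  # known to be missing: skip the expensive call
    data = fetch(run_id)
    if data is None:
        mark_unavailable(path, run_id)
    return data
```

Re-reading the file on every call keeps the sketch short; a real loop over many runs would more likely load the set once and keep it in memory.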

Handle CSV parsing issues: delete the cache file when they come from a local cache, since this usually means that the local cache is corrupt.
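A sketch of the parse-failure handling, under stated assumptions: it uses the standard-library csv module as a stand-in for whatever parser the project uses, treats an empty file as a parsing failure, and the `read_dstat_csv` name and `from_cache` flag are hypothetical.

```python
import csv
import os


def read_dstat_csv(path, from_cache):
    """Parse a CSV file; return its rows, or None if parsing fails.

    When the file came from the local cache (from_cache=True), a parsing
    failure is treated as cache corruption and the file is deleted.
    """
    try:
        with open(path, newline="") as f:
            # strict=True makes the reader raise csv.Error on malformed input.
            rows = list(csv.reader(f, strict=True))
        if not rows:
            raise csv.Error("empty file")
        return rows
    except (csv.Error, UnicodeDecodeError):
        if from_cache:
            os.remove(path)  # drop the corrupt entry so it gets re-fetched
        return None
```

Files that did not come from the cache are left alone, so only the local copy is ever removed.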

mtreinish commented 6 years ago

Sigh, I hate GitHub. Apparently the typo was removed in later patches anyway, so no big deal.