quintel / mechanical_turk

Automatic tester for etengine

Add spec checking dashboard values remain largely constant #157

Open michieldenhaan opened 3 years ago

michieldenhaan commented 3 years ago

For a couple of prominent scenarios, we should add a spec that checks whether the dashboard KPIs remain largely stable over time. Dashboard values are likely to change by a couple of percentage points due to model/data updates, which is fine. This spec is intended to catch large changes, which indicate that something may be off.
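As a rough illustration of "largely stable", the check could boil down to a relative-change threshold. This is a minimal sketch only: the function name `is_large_change` and the 5% default are hypothetical and not taken from Mechanical Turk.

```python
def is_large_change(previous: float, current: float, rel_tol: float = 0.05) -> bool:
    """Return True when `current` deviates from `previous` by more than `rel_tol`.

    `rel_tol` is a fraction of the previous value (0.05 means 5%); the default
    is an assumed placeholder, not an agreed threshold.
    """
    if previous == 0:
        # A relative change from zero is undefined; flag any non-zero value.
        return current != 0
    return abs(current - previous) / abs(previous) > rel_tol
```

With this default, a KPI moving from 100 to 103 would pass, while a move from 100 to 120 would be flagged.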

mabijkerk commented 6 months ago

The question is whether this should be put in Mechanical Turk. As discussed with @kaskranenburgQ, there are a few options:

  1. Create an MT test: this would probably require storing the results of x queries of y scenarios in the scenario_collection for a day, then checking the deltas the next day and returning a failure when softly_equal or roughly_equal is not met. This would run daily.
  2. Create a custom test: this might be part of the scenario tools or part of another spec. It can be used to compare one environment (branch/pro/beta) with another (branch/pro/beta). This would run manually whenever a modeller wants to merge a PR.
  3. Create a Semaphore check: similar to option 2, but as an integrated part of the Semaphore checks for PRs. Probably only relevant to add to ETEngine/ETSource. It should not be a blocking spec, as deltas may be expected in some cases. It compares the branch to be merged with the branch it is merged into (master). This would run whenever a modeller makes a PR.
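All three options share the same core step: take two sets of query results and report the values that moved too far. The sketch below shows that step in isolation, assuming results are held as nested dicts of scenario to query to value. The name `compare_snapshots`, the data layout, and the 5% tolerance are hypothetical; it does not use the existing `softly_equal` or `roughly_equal` helpers.

```python
import math


def compare_snapshots(baseline, candidate, rel_tol=0.05):
    """Return (scenario, query, old, new) tuples for values outside the tolerance.

    `baseline` and `candidate` map scenario name -> {query name -> value}.
    math.isclose measures the difference relative to the larger magnitude
    of the two values.
    """
    deltas = []
    for scenario, queries in baseline.items():
        for query, old in queries.items():
            new = candidate.get(scenario, {}).get(query)
            if new is None:
                # Query missing from the candidate; ignored in this sketch.
                continue
            if not math.isclose(old, new, rel_tol=rel_tol):
                deltas.append((scenario, query, old, new))
    return deltas


# Hypothetical scenario and query names, for illustration only.
baseline = {"scenario_1": {"kpi_a": 100.0, "kpi_b": 50.0}}
candidate = {"scenario_1": {"kpi_a": 102.0, "kpi_b": 80.0}}

for scenario, query, old, new in compare_snapshots(baseline, candidate):
    print(f"{scenario}/{query}: {old} -> {new}")
```

Only the source of the two snapshots would differ per option: yesterday's stored results versus today's (option 1), two environments (option 2), or the PR branch versus master (option 3). For option 3, the returned list could be reported as a warning rather than a failure, since the check should not block.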

We prefer option 3. We will discuss this in the team meeting.

mabijkerk commented 6 months ago

Following the discussion in the team meeting: option 3 seems like the best option for the long term, but it will take some effort to set up (it requires comparing two ETEngine & ETSource versions in your spec, so a new application within Semaphore is required). @kaskranenburgQ let's keep this idea in scope for future reference.

For now, option 2, where we build a ready-to-go tool on/within/using the scenario tools, seems the most feasible option. @kndehaan can pick this up when she has time. Could you open an issue for this on scenario-tools?

kndehaan commented 5 months ago

Issue opened on scenario-tools https://github.com/quintel/scenario-tools/issues/38