Fix daily average when data points are missing

robinlavallee commented 4 years ago

When data points are missing, the average is computed incorrectly. For example: https://www.streamcamel.com/?chartFrom=2020-01-12T08:37:05.348Z&chartTo=2020-01-16T20:13:20.538Z

At no time on Jan 13, 2020 did the number of viewers drop below 700k. However, the daily average for Jan 13 shows 384k viewers: https://www.streamcamel.com/?chartFrom=2020-01-01T06:00:00.000Z&chartTo=2020-06-13T21:08:06.178Z

This is likely because the code assumes the number of data points is always 144 (the number of 10-minute intervals in a day).

Fix the code so that it divides by the actual number of points found. Related to #55
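
To illustrate the difference, here is a minimal sketch of the two divisors. The function names and the sample values are hypothetical and are not taken from the actual rollup script; the day shown assumes only half of the expected samples were collected.

```python
SAMPLES_PER_DAY = 144  # 24 hours * 6 ten-minute intervals per hour


def daily_average_fixed_divisor(samples):
    """Suspected current behavior: always divide by 144."""
    return sum(samples) / SAMPLES_PER_DAY


def daily_average_by_count(samples):
    """Proposed behavior: divide by the number of samples actually found."""
    if not samples:
        return 0.0
    return sum(samples) / len(samples)


# Hypothetical day: sampling was down for 12 hours, so only 72 samples exist,
# each reporting 700,000 viewers.
samples = [700_000] * 72

print(daily_average_fixed_divisor(samples))  # 350000.0, roughly half the real level
print(daily_average_by_count(samples))       # 700000.0
```

With the fixed divisor, every missing sample is silently counted as zero viewers, which would explain a daily average well below the lowest value on the chart.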

robinlavallee commented 4 years ago

Ran the updated rollup script, which now divides by the number of samples. The average looks correct now.

robinlavallee commented 4 years ago

This is a catch-22. If our sampling is down, we should interpolate the missing data (i.e., inject data into the DB based on the previous week's trends). By dividing by the number of samples, we incorrectly calculate the average of unpopular games, because it gets skewed upward when samples are missing for part of the day.

For example, say a game is watched by 10 viewers at noon and by nobody for the rest of the day. Then https://github.com/robinlavallee/scratcher/commit/558858cbb7e1890c448c2a9a0a10cee399758814#diff-2576cd2623a4b5ea7fb96f8b5e9893d6 has broken the average, as it will report the game as having a "10 viewers daily average". That is incorrect: the game has only 0.0694 viewers on average (10 / 144) and a daily peak of 10 viewers.
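
The arithmetic for this example can be checked with a short sketch. It assumes the game produces exactly one stored sample for the day (the noon one) and that slots with no viewers produce no sample at all; the variable names are illustrative only.

```python
SAMPLES_PER_DAY = 144  # number of 10-minute slots in a day

# Assumed: the only stored sample for the day is the one taken at noon.
samples = [10]

average_by_count = sum(samples) / len(samples)      # divides by 1
average_by_slots = sum(samples) / SAMPLES_PER_DAY   # divides by 144
daily_peak = max(samples)

print(average_by_count)            # 10.0
print(round(average_by_slots, 4))  # 0.0694
print(daily_peak)                  # 10
```

Dividing by the sample count cannot tell "no sample because nobody was watching" apart from "no sample because the sampler was down", which is why neither divisor is right on its own.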

Suggested actions:

1. Revert https://github.com/robinlavallee/scratcher/commit/558858cbb7e1890c448c2a9a0a10cee399758814#diff-2576cd2623a4b5ea7fb96f8b5e9893d6
2. Insert fake data points into the streams_clean table for the downtime, using the trend or average from the previous week. See https://github.com/robinlavallee/torso/issues/55
3. Rerun the rollup.
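
A rough sketch of how these steps could fit together is shown below. It is not the actual rollup code: the slot-indexed dictionaries, the `downtime_slots` set, and both function names are assumptions made for illustration, and the real streams_clean schema is not shown in this issue. The idea is to fill only the slots where the sampler was known to be down, using last week's value for the same slot, and then divide by the full 144 slots again.

```python
SLOTS_PER_DAY = 144  # number of 10-minute slots in a day


def fill_downtime(today, last_week, downtime_slots):
    """Return a copy of `today` with downtime slots filled from `last_week`.

    today, last_week: dict mapping slot index (0..143) to viewer count;
        a slot that is absent means no viewers were recorded.
    downtime_slots: set of slot indexes where the sampler was down (assumed known).
    """
    filled = dict(today)
    for slot in downtime_slots:
        if slot not in filled:
            # Borrow last week's value for the same slot; 0 if there was none.
            filled[slot] = last_week.get(slot, 0)
    return filled


def daily_average(filled):
    """Average over all slots in the day; unfilled slots count as zero viewers."""
    return sum(filled.values()) / SLOTS_PER_DAY


# Popular game, sampler down for the second half of the day.
today = {slot: 700_000 for slot in range(72)}
last_week = {slot: 700_000 for slot in range(SLOTS_PER_DAY)}
filled = fill_downtime(today, last_week, set(range(72, SLOTS_PER_DAY)))
print(daily_average(filled))  # 700000.0

# Unpopular game with a single noon sample and a short outage at midnight.
quiet = fill_downtime({72: 10}, {72: 10}, {0, 1, 2})
print(round(daily_average(quiet), 4))  # 0.0694
```

Under these assumptions the popular game keeps its 700k average despite the outage, and the unpopular game is no longer reported as a 10-viewer daily average.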

streamcamel / torso

Fix daily average when data points are missing #72