Currently, the reports are produced using the whole inference dataset. For instance:
Time Period 1 -> Inference dataset 1 is added
All reports are produced for Inference dataset 1 based on the scheduler.
Time Period 2 -> Inference dataset 2 is added
All reports are produced for Inference dataset 1 + Inference dataset 2 based on the scheduler.
Time Period 3 -> Inference dataset 3 is added
All reports are produced for Inference dataset 1 + Inference dataset 2 + Inference dataset 3 based on the scheduler.
This can give a misleading picture of reality: every scheduled run reprocesses all historical data, so a report never reflects a new batch in isolation.
Suggested solution: The reports have to be calculated in inference batches (per granularity).
The steps are:
The user sets their own granularity (for dev purposes we will use daily granularity for now: the scheduler will run once a day).
The inferences used to produce the reports have to be grouped by timestamp.
We need a new boolean column (flag) on the inference rows (True/False). If a data point has been used in a report, it is flagged as True.
Each time the scheduler triggers the pipeline, we check for inference rows flagged False. Every report whose time period covers one of those rows' timestamps is recalculated using all inferences in that period, both True and False. The False rows are then updated to True.
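A minimal sketch of one scheduler tick implementing this flag-based recalculation. It uses an in-memory list in place of the real inference table, daily granularity, and a placeholder averaging metric; all names here are illustrative, not the actual pipeline API.

```python
from datetime import date

# Hypothetical in-memory inference store; the real implementation would be
# a database table with a boolean "used" column.
inferences = [
    {"timestamp": date(2024, 1, 1), "value": 0.9, "used": True},
    {"timestamp": date(2024, 1, 1), "value": 0.7, "used": False},  # late arrival
    {"timestamp": date(2024, 1, 2), "value": 0.5, "used": False},  # new batch
]

def run_scheduled_pipeline(rows):
    """One scheduler tick: recompute only the periods that have unused rows."""
    # 1. Collect the periods (daily granularity here) containing False rows.
    stale_periods = {r["timestamp"] for r in rows if not r["used"]}

    # 2. Recalculate each affected report from ALL rows in that period,
    #    True and False alike, so the report stays complete.
    reports = {}
    for period in stale_periods:
        batch = [r for r in rows if r["timestamp"] == period]
        reports[period] = sum(r["value"] for r in batch) / len(batch)

    # 3. Flag the consumed rows so the next tick skips untouched periods.
    for r in rows:
        if r["timestamp"] in stale_periods:
            r["used"] = True
    return reports

reports = run_scheduled_pipeline(inferences)
```

Note that a second tick with no new data recomputes nothing, since every row is now flagged True; only periods touched by new inferences are ever reprocessed.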