Handling missing values in calculating accuracy metrics

harish-seetamraju / yum-kfc

We'll use this to track entire project at a task level to assign work, prioritization and track progress. All business decisions, risks and issues should also be recorded here

0 stars 0 forks source link

Handling missing values in calculating accuracy metrics #14

Open harish-seetamraju opened 4 years ago

harish-seetamraju commented 4 years ago

All discussion and decisions related to handling one or more missing values among actuals, tiger and scafs forecasts will be recorded here

harish-seetamraju commented 4 years ago

@SwethaSridhar3396 @karthikvenkatak Pls update your numbers here, for each of the model run dates that we have.

harish-seetamraju commented 4 years ago

@SwethaSridhar3396 @karthikvenkatak This is the logic we'll use for exclusions. I went ahead and pulled the data for a few runs and updated the % contribution of each case, for item forecasts. The query is below. Feel free to double check the script.

select fct.chned_str_id , act.chned_str_id, fct.busidaydt,act.busidaydt, fct.prodcd, act.prodcd, fct.forecasted_value,act.actual_value select sum(abs(coalesce (fct.forecasted_value,0)- coalesce(act.actual_value,0)))/sum(coalesce(act.actual_value,0))100 from (select from stg_fct_items_daily_forecast where last_date_model='02-15-2020' and (busidaydt,chned_str_id) in (select distinct busidaydt,chned_str_id from stg_fct_items_daily_actual)) fct inner join (select * from stg_fct_items_daily_actual where (busidaydt,chned_str_id) in (select distinct busidaydt,chned_str_id from stg_fct_items_daily_forecast where last_date_model='02-15-2020')) act on fct.chned_str_id=act.chned_str_id and fct.busidaydt=act.busidaydt and fct.prodcd=act.prodcd

harish-seetamraju commented 4 years ago

karthikvenkatak commented 4 years ago

@harish-seetamraju @vinayTA , For cases where Actuals and TIger are present and SCAFS is absent: This is happening for stores that underwent realignment. We are handling this by taking the stores at rest id level and them joining it with chain store id. But scars data is not updated with the latest chain store id. FOr example, for model run on 1st march, we will not have scars data for all 37 stores that underwent realignment on 9th march. Even for models run post 9th I am seeing scars data missing for few of the stores that underwent realignment. Please advise how we should handle this.

harish-seetamraju commented 4 years ago

@karthikvenkatak @SwethaSridhar3396 Pls get the same numbers I calculated at parent prod code level. So that we can take a decision on inclusion vs. exclusion on those numbers. For the current dashboard, pls include everything, which means the errors will be higher

harish-seetamraju commented 4 years ago

@vinayTA @gauthammn For item forecasts, to avoid all corner cases, the store day prodcd combinations where either forecasts or actuals are missing are excluded from error calculation.

Similarly, for sales and transaction forecasts, rows where either tiger or scafs forecast is missing are excluded from error calculation.

Hope we all agree with this

karthikvenkatak commented 4 years ago

@harish-seetamraju @gauthammn For 15 min sales and transaction tables, I am imputing actuals as 0 when we do not have data for certain time buckets. For example, if tiger forecasts $30 at 17:15 (time bucket) and the corresponding row for actuals (at 17:15) is missing, then I am imputing actuals as 0. Please confirm if the logic is correct.

gauthammn commented 4 years ago

I think this logic is correct.