hdt-modeling / hdt-forecast

Data missing problem #30

Open ZeratuuLL opened 3 years ago

ZeratuuLL commented 3 years ago

In some test cases I noticed that there are missing values in the middle of the training data. Say we want to train the model on data from March 1st to Oct 31st; the data might be missing from Oct 1st to Oct 8th (roughly a week, for example). We should handle this, and there are two possible solutions I can think of:

  1. Make the model robust enough to detect and handle this issue itself
  2. Find a way to impute the number of cases on these missing days if the missing proportion is not larger than some threshold (e.g. 20%). I am thinking about fitting a quadratic or an exponential function.
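
A minimal sketch of option 2, assuming the training series is a NumPy array with `NaN` marking missing days. `impute_gap`, `max_missing_frac` and `kind` are hypothetical names, not part of the hdt-forecast code; the exponential variant is fitted as a straight line on the log scale.

```python
import numpy as np

def impute_gap(y, max_missing_frac=0.2, kind="quadratic"):
    """Fill NaN days by fitting a curve to the observed days (hypothetical helper).

    max_missing_frac mirrors the ~20% threshold; kind is "quadratic" or "exponential".
    """
    y = np.asarray(y, dtype=float)
    missing = np.isnan(y)
    if missing.mean() > max_missing_frac:
        raise ValueError("too much missing data to impute safely")
    t = np.arange(len(y))
    out = y.copy()
    if kind == "quadratic":
        # Least-squares fit of y = a*t^2 + b*t + c on observed days only
        coefs = np.polyfit(t[~missing], y[~missing], deg=2)
        out[missing] = np.polyval(coefs, t[missing])
    elif kind == "exponential":
        # Fit log(y) = slope*t + intercept, i.e. y = exp(intercept) * exp(slope*t)
        obs = y[~missing]
        if np.any(obs <= 0):
            raise ValueError("exponential fit requires strictly positive values")
        slope, intercept = np.polyfit(t[~missing], np.log(obs), deg=1)
        out[missing] = np.exp(intercept + slope * t[missing])
    else:
        raise ValueError(f"unknown kind: {kind!r}")
    return out
```

This fits over the whole training window for simplicity; fitting only a local window around the gap would probably follow the recent trend more closely, at the cost of a noisier fit.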
ZeratuuLL commented 3 years ago

This will also distort the estimated weekday effects. The code that smooths data for weekday effects takes it for granted that the daily data are continuous, with no gaps.
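One way to make the continuity assumption explicit, sketched with the standard library only: place date-keyed observations on a complete daily calendar so that missing days become explicit `None` entries instead of being silently skipped. `to_continuous_daily` and `weekday_means` are hypothetical helpers, and the simple per-weekday mean stands in for whatever smoothing the project actually uses.

```python
from datetime import date, timedelta

def to_continuous_daily(obs):
    """Place {date: value} observations on a gap-free daily calendar.

    Missing days become None, so position i and i + 7 always share a weekday.
    """
    start, end = min(obs), max(obs)
    n_days = (end - start).days + 1
    days = [start + timedelta(days=i) for i in range(n_days)]
    return days, [obs.get(d) for d in days]

def weekday_means(days, values):
    """Average value per weekday (0 = Monday), skipping missing days."""
    sums, counts = [0.0] * 7, [0] * 7
    for d, v in zip(days, values):
        if v is not None:
            sums[d.weekday()] += v
            counts[d.weekday()] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]
```

Because each value is tied to its calendar date, a gap such as Oct 5th to Oct 8th no longer shifts later observations onto the wrong weekday.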

ZeratuuLL commented 3 years ago

Case data does not suffer from this issue (at least as far as I've found). All known missing-data issues come from missing mobility or leading-indicator data, so we need imputation methods for the leading indicators.
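
Before choosing an imputation method, a per-column audit shows which inputs actually have gaps and how large they are. This is a hypothetical sketch assuming each input (cases, mobility, leading indicators) is a list of daily values on a shared calendar with `None` for missing days; `missing_runs`, `gap_report` and the column names are illustrative.

```python
def missing_runs(values):
    """Return inclusive (start, end) index pairs for each run of None values."""
    runs, start = [], None
    for i, v in enumerate(values):
        if v is None and start is None:
            start = i
        elif v is not None and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(values) - 1))
    return runs

def gap_report(columns):
    """Missing fraction and missing runs for each named daily series."""
    report = {}
    for name, values in columns.items():
        n_missing = sum(v is None for v in values)
        report[name] = {
            "missing_frac": n_missing / len(values),
            "runs": missing_runs(values),
        }
    return report
```

The missing fraction can then be compared against the ~20% threshold mentioned above, per indicator, before any imputation is attempted.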