Closed: ig248 closed this pull request 4 years ago.
@ali-tny Indeed - if I use it as before, the validation predict will only be able to compute lags against timesteps seen during fit. This is "safe", but it means all lag values will be NaN. Even worse, the same issue appears at training time when using short (or non-temporal) batches. This is really a hack in place of manually adding lag columns to the input data. Importantly, I am only using it to lag features, not to lag the target - though there might be uses for that too.
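To illustrate the problem described above: when a lag is computed purely within a short batch, there is no in-batch history to draw on, so the lagged column comes out entirely NaN. A minimal pandas sketch (illustrative data, not the timeserio API):

```python
import pandas as pd

# A short 3-row "batch", e.g. a validation slice of a temperature feature.
batch = pd.DataFrame(
    {"temperature": [14.2, 14.8, 15.1]},
    index=pd.date_range("2020-01-01", periods=3, freq="D"),
)

# A 3-step lag computed within this batch alone has no history to
# resolve against, so every lagged value is NaN.
lagged = batch["temperature"].shift(3)
print(lagged.isna().all())  # True
```

This is exactly why a featurizer fitted only on the batch at hand produces all-NaN lag features on short batches.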
Merging #18 into master will increase coverage by 0.02%. The diff coverage is 100.00%.
```
@@            Coverage Diff             @@
##           master      #18      +/-   ##
==========================================
+ Coverage   88.80%   88.82%   +0.02%
==========================================
  Files          34       34
  Lines        1662     1665       +3
==========================================
+ Hits         1476     1479       +3
  Misses        186      186
```
| Impacted Files | Coverage Δ |
|---|---|
| timeserio/preprocessing/datetime.py | 96.96% <100.00%> (+0.07%) :arrow_up: |
Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Powered by Codecov. Last update 20c0013...8138201.
FYI I added a test for the `refit=False` behaviour into your commit.
Before this change, a `LagFeaturizer` would always respect causality and avoid data leakage (good). This meant, however, that on-the-fly validation with "safe" features such as historic temperature lags could not be performed. After this change, the `LagFeaturizer` can be explicitly pre-fitted on all available data for the feature in question, and set to "frozen" using e.g. `set_params(refit=False)`. This advanced use case requires extra care, but the change is introduced without altering default behaviour.
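The pre-fit-and-freeze pattern described above can be sketched as follows. This is a minimal illustrative implementation, not timeserio's actual `LagFeaturizer` code; the class and column names here are hypothetical, and only the `set_params(refit=False)` idiom is taken from the change itself:

```python
import pandas as pd

class FrozenLagFeaturizer:
    """Illustrative lag featurizer that can be pre-fitted on full history
    and then frozen with refit=False, so short batches still get lags."""

    def __init__(self, column, lag, refit=True):
        self.column = column
        self.lag = lag
        self.refit = refit
        self.history_ = None

    def set_params(self, **params):
        # Mirrors the scikit-learn-style set_params used in the PR.
        for key, value in params.items():
            setattr(self, key, value)
        return self

    def fit(self, df):
        # With refit=False, keep the history captured by an earlier fit
        # instead of re-fitting on the (possibly short) batch.
        if self.refit or self.history_ is None:
            self.history_ = df[[self.column]].copy()
        return self

    def transform(self, df):
        # Resolve lags against the stored full history, so a short
        # validation batch still receives non-NaN lag values.
        lagged = self.history_[self.column].shift(self.lag)
        out = df.copy()
        out[f"{self.column}_lag{self.lag}"] = lagged.reindex(df.index)
        return out

# Pre-fit on all available data, then freeze before validation:
idx = pd.date_range("2020-01-01", periods=6, freq="D")
full = pd.DataFrame(
    {"temperature": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}, index=idx
)
featurizer = FrozenLagFeaturizer("temperature", lag=2).fit(full)
featurizer.set_params(refit=False)

# A short 2-row validation batch: lags resolve against the full history.
val_batch = full.iloc[4:]
result = featurizer.transform(val_batch)
print(result["temperature_lag2"].tolist())  # [3.0, 4.0]
```

Without the freeze (i.e. with `refit=True` and a fresh `fit` on the 2-row batch), both lag values would be NaN, which is the "safe" default behaviour the PR leaves unchanged.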