Implement SKLearn interface

jaredsnyder commented 2 months ago

Changes:

data pull via metric hub removed from all classes
start_date and end_date attributes removed from all classes; time filtering now happens in kpi_forecasting.py before data is passed to the model classes
New class, BaseEnsembleForecast, created to handle segmented models like the ones FunnelForecast used to implement
New class, ProphetAutotunerForecast, created to implement automated hyperparameter tuning
FunnelForecast recreated as a BaseEnsembleForecast that uses a ProphetAutotunerForecast as the base model
summarize and write_functions, along with all the functions called within them, moved outside of classes
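
For reviewers, here is a minimal sketch of the shape these changes move toward: a model with scikit-learn style `fit`/`predict`, an ensemble that fits one base model per segment, and date filtering done by the caller. It is not the code in this PR. `MeanForecast`, `SegmentedEnsemble`, `filter_dates`, and the row layout are hypothetical names, and a trivial mean model stands in for Prophet so the example runs with only the standard library.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def filter_dates(rows, start, end):
    """Keep rows whose 'ds' falls in [start, end]; done by the caller, not the model."""
    return [r for r in rows if start <= r["ds"] <= end]


class MeanForecast:
    """Hypothetical stand-in for a Prophet-based model: predicts the training mean."""

    def fit(self, rows):
        self.mean_ = mean(r["y"] for r in rows)
        return self  # scikit-learn convention: fit returns self

    def predict(self, dates):
        return [{"ds": d, "yhat": self.mean_} for d in dates]


class SegmentedEnsemble:
    """Hypothetical ensemble: fits one copy of the base model per segment value."""

    def __init__(self, model_factory, segment_key):
        self.model_factory = model_factory
        self.segment_key = segment_key

    def fit(self, rows):
        # Group the training rows by segment, then fit a fresh model on each group.
        groups = defaultdict(list)
        for r in rows:
            groups[r[self.segment_key]].append(r)
        self.models_ = {seg: self.model_factory().fit(g) for seg, g in groups.items()}
        return self

    def predict(self, dates):
        # Concatenate per-segment predictions, tagging each with its segment value.
        out = []
        for seg, model in self.models_.items():
            for p in model.predict(dates):
                out.append({self.segment_key: seg, **p})
        return out


rows = [
    {"ds": date(2024, 1, 1), "country": "US", "y": 10.0},
    {"ds": date(2024, 1, 2), "country": "US", "y": 14.0},
    {"ds": date(2024, 1, 1), "country": "DE", "y": 4.0},
    {"ds": date(2024, 1, 2), "country": "DE", "y": 6.0},
    {"ds": date(2024, 1, 3), "country": "DE", "y": 100.0},  # outside the training window
]

# Time filtering happens before the data reaches the model classes.
train = filter_dates(rows, date(2024, 1, 1), date(2024, 1, 2))
ensemble = SegmentedEnsemble(MeanForecast, segment_key="country").fit(train)
forecast = ensemble.predict([date(2024, 1, 3)])
```

Because the ensemble only relies on `fit` and `predict`, any base model exposing those two methods could be swapped in without touching the segmentation logic.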

Checklist for reviewer:

[ ] Commits should reference a bug or github issue, if relevant (if a bug is referenced, the pull request should include the bug number in the title)
[ ] Scan the PR and verify that no changes (particularly to .circleci/config.yml) will cause environment variables (particularly credentials) to be exposed in test logs
[ ] Ensure the container image will be using permissions granted to telemetry-airflow responsibly.

jaredsnyder commented 2 months ago

Here's a notebook to validate the PR. We're not getting an exact match on the search forecasts, but @m-d-bowerman and I have concluded that the models match and that the difference is due to how Prophet sets the seed: https://colab.research.google.com/drive/1dLeLUz_99ln9PC1AG-izZILj9-zIJHmJ#scrollTo=70upJ3eUTvkh

jaredsnyder commented 2 months ago

Note on the validation: https://docs.google.com/document/d/1kG75iCFHSxBYVz6EcaYhozOZ9KfK7ncKvB5YfOmaB6I/edit?usp=sharing

jaredsnyder commented 1 month ago

WRT code complexity: Yeah, that is the definite downside of trying to "promote" models with segments so they'd be easier to use. I can take another pass at documenting/commenting so it's easier to work with, and can brainstorm ways to clean it up. We could also meet to try to come up with something if you think that'd be useful.

jaredsnyder commented 1 month ago

Another thing I want to look into is using Darts (https://unit8co.github.io/darts/), which might eliminate a lot of the wrapper code around Prophet, and maybe some of the data-handling code too.

bochocki commented 1 month ago

Darts does look neat! I tried to evaluate it as part of the KPI model selection exercise that we used to decide on prophet, but at the time they didn't have M1 support and that was enough of a blocker for local development that I didn't explore it further.

mozilla / docker-etl

Implement SKLearn interface #272