feat: kpi forecasting add funnel_forecast unit tests

jaredsnyder commented 3 months ago

Adds unit tests for funnel_forecast and does some light refactoring

Checklist for reviewer:

[ ] Commits should reference a bug or github issue, if relevant (if a bug is referenced, the pull request should include the bug number in the title)
[ ] Scan the PR and verify that no changes (particularly to .circleci/config.yml) will cause environment variables (particularly credentials) to be exposed in test logs
[ ] Ensure the container image will be using permissions granted to telemetry-airflow responsibly.

jaredsnyder commented 3 months ago

Made a notebook to compare outputs. For most of the rows (where the optimization lands on the same parameters), the difference is within 1% but there are some significant outliers. Not sure if this is inevitable due to the stochastic nature of the prediction or not. Notably we didn't have this issue with prophet_forecast

https://colab.research.google.com/drive/1NxD5eVwZ0Vw3UhZtZoIev5Q44qEEjSTJ#scrollTo=CjguJpSKsv7k

jaredsnyder commented 3 months ago

The seed is set in both branches, so the results should be the same. Investigating

jaredsnyder commented 3 months ago

Validation NB is updated and all tests pass. Two changes were necessary:

A bug we fixed was switching the values in _add_regressor so 1 was assigned when true and 0 when False, while in main it is the other way around. This did change the values, usually by small amounts but occasionally by more. It may be worth revisiting whether or not this change is worth it. To check, I reverted this change to produce the _branch_0s tables
there was another bug involving using the aggregate function on strings. I updated the function and added some tests.

m-d-bowerman commented 3 months ago

Validation NB is updated and all tests pass. Two changes were necessary:

* A bug we fixed was switching the values in `_add_regressor` so 1 was assigned when true and 0 when False, while in main it is the other way around.  This did change the values, usually by small amounts but occasionally by more.  It may be worth revisiting whether or not this change is worth it.  To check, I reverted this change to produce the `_branch_0s` tables

* there was another bug involving using the aggregate function on strings.  I updated the function and added some tests.

@jaredsnyder I added a bit to the notebook, checking for differences between values, and all differences were 0. The lil bit is right before the components section; have a look, make sure I didn't goof up the merge or sets I used?

jaredsnyder commented 3 months ago

Validation NB is updated and all tests pass. Two changes were necessary:
* A bug we fixed was switching the values in `_add_regressor` so 1 was assigned when true and 0 when False, while in main it is the other way around.  This did change the values, usually by small amounts but occasionally by more.  It may be worth revisiting whether or not this change is worth it.  To check, I reverted this change to produce the `_branch_0s` tables

* there was another bug involving using the aggregate function on strings.  I updated the function and added some tests.
@jaredsnyder I added a bit to the notebook, checking for differences between values, and all differences were 0. The lil bit is right before the components section; have a look, make sure I didn't goof up the merge or sets I used?

@m-d-bowerman : that's because the tables you were looking at ran the data with that change reverted. I loaded the tables with the change and made some plots to compare the differences here https://colab.research.google.com/drive/1NxD5eVwZ0Vw3UhZtZoIev5Q44qEEjSTJ#scrollTo=Look_at_distribution_of_differences_when_changing_1_to_0

m-d-bowerman commented 3 months ago

Updates look good :shipit: 🦜

mozilla / docker-etl

feat: kpi forecasting add funnel_forecast unit tests #248