mozilla / docker-etl

Collection of dockerized ETL jobs managed by data engineering.
Mozilla Public License 2.0
19 stars 15 forks source link

feat: kpi forecasting add funnel_forecast unit tests #248

Closed jaredsnyder closed 3 months ago

jaredsnyder commented 3 months ago

Adds unit tests for funnel_forecast and does some light refactoring

Checklist for reviewer:

jaredsnyder commented 3 months ago

Made a notebook to compare outputs. For most of the rows (where the optimization lands on the same parameters), the difference is within 1% but there are some significant outliers. Not sure if this is inevitable due to the stochastic nature of the prediction or not. Notably we didn't have this issue with prophet_forecast

https://colab.research.google.com/drive/1NxD5eVwZ0Vw3UhZtZoIev5Q44qEEjSTJ#scrollTo=CjguJpSKsv7k

jaredsnyder commented 3 months ago

The seed is set in both branches, so the results should be the same. Investigating

jaredsnyder commented 3 months ago

Validation NB is updated and all tests pass. Two changes were necessary:

m-d-bowerman commented 3 months ago

Validation NB is updated and all tests pass. Two changes were necessary:

* A bug we fixed was switching the values in `_add_regressor` so 1 was assigned when true and 0 when False, while in main it is the other way around.  This did change the values, usually by small amounts but occasionally by more.  It may be worth revisiting whether or not this change is worth it.  To check, I reverted this change to produce the `_branch_0s` tables

* there was another bug involving using the aggregate function on strings.  I updated the function and added some tests.

@jaredsnyder I added a bit to the notebook, checking for differences between values, and all differences were 0. The lil bit is right before the components section; have a look, make sure I didn't goof up the merge or sets I used?

jaredsnyder commented 3 months ago

Validation NB is updated and all tests pass. Two changes were necessary:

* A bug we fixed was switching the values in `_add_regressor` so 1 was assigned when true and 0 when False, while in main it is the other way around.  This did change the values, usually by small amounts but occasionally by more.  It may be worth revisiting whether or not this change is worth it.  To check, I reverted this change to produce the `_branch_0s` tables

* there was another bug involving using the aggregate function on strings.  I updated the function and added some tests.

@jaredsnyder I added a bit to the notebook, checking for differences between values, and all differences were 0. The lil bit is right before the components section; have a look, make sure I didn't goof up the merge or sets I used?

@m-d-bowerman : that's because the tables you were looking at ran the data with that change reverted. I loaded the tables with the change and made some plots to compare the differences here https://colab.research.google.com/drive/1NxD5eVwZ0Vw3UhZtZoIev5Q44qEEjSTJ#scrollTo=Look_at_distribution_of_differences_when_changing_1_to_0

m-d-bowerman commented 3 months ago

Updates look good :shipit: 🦜