Closed jaredsnyder closed 1 month ago
Question for @m-d-bowerman on this one: I'm trying to clean up/document all the start dates. As part of that I added the predict_historical_dates
parameter so that we wouldn't have to overwite dates_to_predict
in the __post_init__
. I would like to use predict_historical_dates
to modify the start_date
attribute via _default_start_date
rather than overwriting dates_to_predict
. The only place in FunnelForecast I can see this causing problems is here:
What is the purpose of this line? Also what do you think of the general approach?
Question for @m-d-bowerman on this one: I'm trying to clean up/document all the start dates. As part of that I added the
predict_historical_dates
parameter so that we wouldn't have to overwitedates_to_predict
in the__post_init__
. I would like to usepredict_historical_dates
to modify thestart_date
attribute via_default_start_date
rather than overwritingdates_to_predict
. The only place in FunnelForecast I can see this causing problems is here:What is the purpose of this line? Also what do you think of the general approach?
In the FunnelForecast
class, we also capture the intra-time series predictions, along with the Prophet components used to make that prediction, done here:
https://github.com/mozilla/docker-etl/blob/2f4b0231eda133133daaa5134d20de40152ca6de/jobs/kpi-forecasting/kpi_forecasting/models/funnel_forecast.py#L400
That's the motivation behind overwriting dates_to_predict
, while retaining the start_date
param to trim down the forecast_df
to only dates in the future. I'll review the changes you made to see if we can remove the repeated application of that condition; I agree, it's incredibly clunky.
Question for @m-d-bowerman on this one: I'm trying to clean up/document all the start dates. As part of that I added the
predict_historical_dates
parameter so that we wouldn't have to overwitedates_to_predict
in the__post_init__
. I would like to usepredict_historical_dates
to modify thestart_date
attribute via_default_start_date
rather than overwritingdates_to_predict
. The only place in FunnelForecast I can see this causing problems is here: https://github.com/mozilla/docker-etl/blob/fd1435b74b2dee6edbdb55faafb5fc670b34c138/jobs/kpi-forecasting/kpi_forecasting/models/funnel_forecast.py#L494What is the purpose of this line? Also what do you think of the general approach?
In the
FunnelForecast
class, we also capture the intra-time series predictions, along with the Prophet components used to make that prediction, done here:That's the motivation behind overwriting
dates_to_predict
, while retaining thestart_date
param to trim down theforecast_df
to only dates in the future. I'll review the changes you made to see if we can remove the repeated application of that condition; I agree, it's incredibly clunky.
Given that it gets filtered to dates in the future using self.start_date
how is this different than just having dates_to_predict
start at self.start_date
?
Question for @m-d-bowerman on this one: I'm trying to clean up/document all the start dates. As part of that I added the
predict_historical_dates
parameter so that we wouldn't have to overwitedates_to_predict
in the__post_init__
. I would like to usepredict_historical_dates
to modify thestart_date
attribute via_default_start_date
rather than overwritingdates_to_predict
. The only place in FunnelForecast I can see this causing problems is here: https://github.com/mozilla/docker-etl/blob/fd1435b74b2dee6edbdb55faafb5fc670b34c138/jobs/kpi-forecasting/kpi_forecasting/models/funnel_forecast.py#L494What is the purpose of this line? Also what do you think of the general approach?
In the
FunnelForecast
class, we also capture the intra-time series predictions, along with the Prophet components used to make that prediction, done here: https://github.com/mozilla/docker-etl/blob/2f4b0231eda133133daaa5134d20de40152ca6de/jobs/kpi-forecasting/kpi_forecasting/models/funnel_forecast.py#L400That's the motivation behind overwriting
dates_to_predict
, while retaining thestart_date
param to trim down theforecast_df
to only dates in the future. I'll review the changes you made to see if we can remove the repeated application of that condition; I agree, it's incredibly clunky.Given that it gets filtered to dates in the future using
self.start_date
how is this different than just havingdates_to_predict
start atself.start_date
?
The forecast dates get filtered that way, yeah, but the components table doesn't -- it'll have a row for every date in the observed data and the forecast dates.
I'm going to leave dealing with start_date
for the next PR where I generalize the use of SegmentSettings objects. So this one is really for a full review
Created a notebook to validate that the outputs are the same. Ended up finding an error that was corrected in the latest commit, but now everything matches
https://colab.research.google.com/drive/1dLeLUz_99ln9PC1AG-izZILj9-zIJHmJ#scrollTo=oukMs3d3qiEJ
This PR modifies the config files, especially for the FunnelForecast jobs, to bring them more closely into alignment. This is a first step toward adding the ability to fit different "sub-models" to different partitions of the data (
metric_hub.segments
set in the FunnelForecast config) to the base class.DotMap was also removed because managing the conversion for dict to DotMap and back was beginning to be more trouble than it was worth as the config files got bigger and used in more complex ways.
Checklist for reviewer:
.circleci/config.yml
) will cause environment variables (particularly credentials) to be exposed in test logs