tensorflow / probability

Probabilistic reasoning and statistical analysis in TensorFlow
https://www.tensorflow.org/probability/
Apache License 2.0

Generate different forecast result at different run #770

Closed yug95 closed 4 years ago

yug95 commented 4 years ago

While running a TFP STS model, I noticed that each run generates a different forecast, which is problematic for inference.

import tensorflow_probability as tfp
trend = tfp.sts.LocalLinearTrend(observed_time_series=co2_by_month)
seasonal = tfp.sts.Seasonal(
    num_seasons=12, observed_time_series=co2_by_month)
model = tfp.sts.Sum([trend, seasonal], observed_time_series=co2_by_month)

co2_model = build_model(co2_by_month_training_data)
variational_posteriors = tfp.sts.build_factored_surrogate_posterior(
    model=co2_model)

num_variational_steps = 200 
num_variational_steps = int(num_variational_steps)

optimizer = tf.optimizers.Adam(learning_rate=.1)

q_samples_co2_ = variational_posteriors.sample(50)

co2_forecast_dist = tfp.sts.forecast(
    co2_model,
    observed_time_series=co2_by_month_training_data,
    parameter_samples=q_samples_co2_,
    num_steps_forecast=num_forecast_steps)

Here I have taken the mean value of the forecast. How can I set a seed, as we do for the Random Forest algorithm, or is there another way to fix this?

davmre commented 4 years ago

There are two sources of randomness here: first in the initialization of the variational optimization, and then in sampling parameters from the optimized variational posterior. You can pass a seed to both of those operations:

variational_posteriors = tfp.sts.build_factored_surrogate_posterior(
    model=co2_model, seed=seed)

...

q_samples_co2_ = variational_posteriors.sample(50, seed=seed)

To get deterministic results in eager mode (where each op is implicitly in its own 'graph' context with a random graph-level seed) you likely also need to call tf.random.set_seed(seed) at the top of your code.
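
A minimal sketch of the seeding behaviour described above, assuming TensorFlow 2.x in eager mode. It does not fit an STS model: `tf.random.normal` is only a stand-in for the two random steps (initialization and posterior sampling), and `seeded_draws` and the seed value are hypothetical names chosen for illustration.

```python
import tensorflow as tf

seed = 42  # illustrative value

# Same op-level seed, but the global seed is not reset in between:
# the second call continues the random sequence, so the draws differ.
first = tf.random.normal([3], seed=seed)
second = tf.random.normal([3], seed=seed)

def seeded_draws(global_seed, op_seed):
    """Reset the global seed, then run two seeded random ops."""
    tf.random.set_seed(global_seed)
    init = tf.random.normal([3], seed=op_seed)     # stand-in: variational initialization
    samples = tf.random.normal([5], seed=op_seed)  # stand-in: posterior sampling
    return init.numpy(), samples.numpy()

# Resetting the global seed before each call makes both steps repeatable.
init_a, samples_a = seeded_draws(seed, seed)
init_b, samples_b = seeded_draws(seed, seed)
print((init_a == init_b).all(), (samples_a == samples_b).all())
```

In the real code the same pattern would apply to `build_factored_surrogate_posterior(..., seed=seed)` and `variational_posteriors.sample(50, seed=seed)`, with `tf.random.set_seed(seed)` called first.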

yug95 commented 4 years ago

Thanks @davmre :) But somehow it didn't solve my problem. My setup:

  1. I have, let's say, 10 brands (A, B, C, D, ...) and each brand has to be forecast separately.
  2. Take any brand, say 'A'. I have to forecast each successive month after training on data up to the previous month: train up to the 4th month and forecast the 5th, train up to the 5th month and forecast the 6th, and so on. Even though I am passing a seed at the month level, I do not get the same result across runs.
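import datetime
from dateutil import relativedelta
import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow_probability as tfp

df.Date = pd.to_datetime(df.Date,format='%d-%m-%Y')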
#Brand_list = df.Key.unique()
Brand_list = ['A','B','C']
final = pd.DataFrame(columns=["Brand","Forecast_values","Actual_values"])
#iterate for all the brands...
for brand_name in Brand_list:
    print(brand_name)
    brand_df = df.loc[df.Key == brand_name]
    brand_df.set_index('Date',inplace=True)
    tmp = []
    brand_df = brand_df[:'2019-12-01']
    forecast = pd.DataFrame()
    Actuals = pd.DataFrame()
    #run for le cycle 3+9,4+8 etc....
    seed_value = 123
    if len(brand_df)>12:
        train_start = datetime.date(2019, 3, 1)
        train_till = datetime.date(2019, 12, 1)
        Actuals_end = datetime.date(2019, 12, 1)
        train_date = train_start
        while train_date < train_till:
            test_date = train_date + relativedelta.relativedelta(months=1)
            dependent_colume = 'Volume'
            x = brand_df.drop(columns=[dependent_colume,'Key'])
            y = brand_df[[dependent_colume]]
            train_x = x[:train_date]
            train_y = y[:train_date][[dependent_colume]]
            test_x = x[test_date:]
            test_y = y[test_date:][[dependent_colume]]
            train_date = train_date + relativedelta.relativedelta(months=1)
            try:
                my_list = []
                t = dict()
                # extracting the linear trend component from the series
                trend = tfp.sts.LocalLinearTrend(observed_time_series=np.array(train_y))
                # extracting the seasonal component from the series
                seasonal = tfp.sts.Seasonal(num_seasons=12, observed_time_series=np.array(train_y))

                my_list.append(trend)
                my_list.append(seasonal)

                for i in train_x.columns:
                    t[str(i)] = tfp.sts.LinearRegression(design_matrix=tf.concat([tf.reshape(np.array(train_x[i].astype(float)),(-1,1)),
                                                                        tf.reshape(np.array(test_x[i].astype(float)),(-1,1))],axis=-2),name=i)
                    my_list.append(t[str(i)])

                # create the structural model = trend + seasonal + regression terms
                model = tfp.sts.Sum(my_list, observed_time_series=np.array(train_y))
                # build the surrogate posterior used for variational inference
                variational_posteriors = tfp.sts.build_factored_surrogate_posterior(model=model,seed=seed_value)
                num_variational_steps = 200
                optimizer = tf.optimizers.Adam(learning_rate=.1)

                # draw samples from the surrogate posterior
                q_samples_co2_ = variational_posteriors.sample(50,seed=seed_value)

                seed_value = seed_value + 1
                print(seed_value)

                co2_forecast_dist = tfp.sts.forecast(model,observed_time_series=np.array(train_y),parameter_samples=q_samples_co2_,num_steps_forecast=1)

                forecast[str(brand_name)+str('_')+str(test_date.month)+str("_")+str(test_date.year)] = np.array(co2_forecast_dist.mean())[0]
                Actuals[str(brand_name)+str('_')+str(test_date.month)+str("_")+str(test_date.year)] = test_y[:test_date].values[0]
            except Exception as e:
                print("exception", e)
                continue

        if len(forecast) > 0 and len(Actuals) > 0:
            forecast=forecast.T.reset_index()
            forecast.columns=["Brand","Forecast_values"]
            Actuals=Actuals.T.reset_index()
            Actuals.columns=["Brand","Actual_values"]
            brand_wise_merge = forecast.merge(Actuals,on="Brand",how="left")
            final = pd.concat([final, brand_wise_merge], ignore_index=True)
        else:
            print("doesn't match with TFP")
    else:
        print("length mismatch")

You can refer to this sample file for the data: Brand_timeseries.xlsx
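
The rolling scheme described in this comment (train up to month n, then forecast month n+1) can be sketched on its own, without any model fitting. This is an illustrative sketch only: `add_month` and `rolling_origins` are hypothetical helpers, not part of the original code, and the dates mirror the `train_start` and `train_till` values in the loop.

```python
import datetime

def add_month(d):
    """Return the first day of the month after d."""
    year = d.year + d.month // 12
    month = d.month % 12 + 1
    return datetime.date(year, month, 1)

def rolling_origins(train_start, train_till):
    """Yield (train_end, forecast_month) pairs, advancing one month at a time."""
    train_end = train_start
    while train_end < train_till:
        forecast_month = add_month(train_end)
        yield train_end, forecast_month
        train_end = forecast_month

pairs = list(rolling_origins(datetime.date(2019, 3, 1),
                             datetime.date(2019, 12, 1)))
for train_end, forecast_month in pairs:
    print("train up to", train_end, "-> forecast", forecast_month)
```

Each pair corresponds to one model fit, so each pair is also one place where the random state has to be fixed if the forecast is to be repeatable.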

yug95 commented 4 years ago

@davmre, have you by any chance had time to look into this? Please let me know.

davmre commented 4 years ago

Did you try setting the graph-level seed (tf.random.set_seed(seed_value)) right before you create each STS model?

If you can reproduce the issue in a minimal example with a single time series, and get it running in a Colab notebook, I can try to dig further.
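
The suggestion to reset the graph-level seed right before each model is created can be sketched as follows, assuming TensorFlow 2.x in eager mode. `fit_stand_in` is a hypothetical placeholder for one build-and-sample iteration of the loop; it uses `tf.random.normal` instead of a real STS model so that the example stays small.

```python
import tensorflow as tf

def fit_stand_in(seed_value):
    """Hypothetical stand-in for one build/sample iteration of the loop."""
    # Reset the global seed right before the "model" is created, so this
    # iteration does not depend on how many random ops ran before it.
    tf.random.set_seed(seed_value)
    init = tf.random.normal([2], seed=seed_value)      # stand-in: variational init
    samples = tf.random.normal([50], seed=seed_value)  # stand-in: posterior samples
    return init.numpy().tolist(), samples.numpy().tolist()

# One result per (hypothetical) training month, with seeds 123, 124, 125.
run_a = [fit_stand_in(123 + i) for i in range(3)]
# A later run that visits only some months, in a different order.
run_b = {i: fit_stand_in(123 + i) for i in (2, 0)}

print(run_b[2] == run_a[2], run_b[0] == run_a[0])
```

Because the seed is reset inside each iteration, the result for a given month depends only on that month's seed and not on the iterations that ran before it.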

yug95 commented 4 years ago

Thanks @davmre, it's working now.