sky-uk / anticipy

A Python library for time series forecasting
BSD 3-Clause "New" or "Revised" License
82 stars 13 forks source link

Prediction intervals - block bootstrap #125

Closed capelastegui closed 5 years ago

capelastegui commented 5 years ago

Current implementation of prediction intervals uses simple bootstrapping by taking individual random samples. This can be improved by using block bootstrapping, as described here: https://otexts.com/fpp2/bootstrap.html

By default, 100 bootstrapped series are used, and the length of the blocks used for obtaining bootstrapped residuals is set to 24 for monthly data.

capelastegui commented 5 years ago

More references: https://www.monash.edu/business/econometrics-and-business-statistics/research/publications/ebs/wp11-14.pdf

capelastegui commented 5 years ago

The following code implements residual block bootstrapping:

def get_block_size(s_date):
    s_date=pd.Series(s_date)
    min_date_delta = s_date.diff().min()

    # Default: not a date or unknown freq
    l_min=4
    l_max=1000

    if isinstance(min_date_delta,pd.Timedelta):
        # Yearly or quarterly data
        if min_date_delta >= pd.Timedelta(90,'d'):
            l_min=4
            l_max=8
        # Monthly data
        elif min_date_delta >= pd.Timedelta(28,'d'):
            l_min=12
            l_max=24
        # Weekly data
        elif min_date_delta >= pd.Timedelta(7,'d'):
            l_min=4
            l_max=104
        # Daily data
        elif min_date_delta >= pd.Timedelta(7,'d'):
            l_min=7
            l_max=712

    l_initial = np.ceil(s_date.index.size*0.2)
    l = max(min(l_initial, l_max), l_min)
    return int(l)

def get_residuals_sample_by_block(s_residuals, series_length, block_size):
    # Number of sampling blocks per experiment: add 2 blocks around edges
    n_blocks = int(np.ceil(series_length/block_size)+2)
    # Index of short version of residuals series 
    #- excludes block_size samples at the end
    index_short = s_residuals.index[0:s_residuals.index.size-block_size]
    # Random samples of index_short - use to sample segments of s_residuals
    a_index_sample = pd.Series(index_short).sample(
            n_blocks * n, replace=True).values
    # Map index_short to segments from residuals series
    s_values_map = pd.Series([s_residuals[i:i+block_size].values for i in index_short])
    # Sample residuals series by block-sized segments
    s_sample_unfiltered = s_values_map.loc[a_index_sample]
    a_sample_unfiltered = s_sample_unfiltered.values.reshape(n,n_blocks)
    # For each experiment, start taking values at rnd(0,block_size),
    # To avoid always matching block start
    a_starting_index = pd.Series(np.arange(0,block_size)).sample(n, replace=True).values

    a_sample = np.array([
        np.concatenate(a_sample_unfiltered[i])
        [starting_index:starting_index+series_length]
        for i in range(0,n)
    ])
    a_sample = np.cumsum(a_sample, axis=1)
    return a_sample
capelastegui commented 5 years ago

After testing prediction intervals with the new method, we found that it doesn't really achieve our main goal - narrower prediction intervals. In fact, it can make prediction intervals wider (though probably more accurate) in some scenarios. We won't be adding this feature for now.