ssadjina / FatTailedTools

Various tools and helper functions for the analysis of fat-tailed data
BSD 3-Clause "New" or "Revised" License

Create dedicated subsampling routine to be used for all subsampling where needed #5

Open ssadjina opened 1 year ago

ssadjina commented 1 year ago

The subsample routine in alpha.fit_alpha_and_scale_linear_subsampling() was designed to account for two general kinds of uncertainty when estimating parameters:

  1. Uncertainty and (to some extent) bias in the available data set (using bootstrapping).
  2. Uncertainty with respect to which origin or time shift to use when calculating log returns over a period spanning several time units, such as 7-day returns based on 1-day data (sampling uniformly at random from all possible time shifts).

Handling these general kinds of uncertainty in a dedicated function would allow us to reuse it consistently in all other functions that need subsampling.
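To make the two sources of uncertainty concrete, here is a minimal sketch on synthetic daily prices. The helper log_returns_with_offset is hypothetical and only stands in for the package's own return calculation; it is not the FatTailedTools API.

```python
import numpy as np
import pandas as pd

def log_returns_with_offset(prices, period_days, offset):
    """Hypothetical helper: non-overlapping log returns over `period_days`,
    starting at row `offset` of a daily price series."""
    # Keep every `period_days`-th price, beginning at the chosen origin
    sampled = prices.iloc[offset::period_days]
    return np.log(sampled).diff().dropna()

# Synthetic daily prices (77 days), purely for illustration
rng = np.random.default_rng(0)
dates = pd.date_range("2020-01-01", periods=77, freq="D")
prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=77))), index=dates)

# Kind 2: each of the 7 possible origins yields a different set of 7-day returns
returns_by_offset = {k: log_returns_with_offset(prices, 7, k) for k in range(7)}

# Kind 1: a bootstrap resample (drawn with replacement) of one of those sets
boot = returns_by_offset[0].sample(frac=0.9, replace=True, random_state=0)
```

Each origin uses the same underlying prices but a different set of dates, so an estimate computed on only one of them depends on an arbitrary choice.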

ssadjina commented 1 year ago

The function would look something like this:

import numpy as np

# Assumes the package's 'returns' module (providing get_log_returns) is already imported

def subsample(data, func, n_subsamples, period_days, frac):

    # Set up storing results
    results = []

    for _ in range(n_subsamples):

        # Randomly select a time shift/origin, uniformly from 0 .. period_days - 1
        time_shift = np.random.randint(period_days)

        # Calculate the log returns over 'period_days' using the shift 'time_shift'
        series = returns.get_log_returns(data, periods='{}d'.format(period_days), offset=time_shift).dropna()

        # Use bootstrapping to include the uncertainty with respect to the data
        # (named 'sample' so that it does not shadow the function name)
        sample = series.sample(frac=frac, replace=True)

        # Perform desired calculation
        result = func(sample)

        # Store results
        results.append(result)

    return results

Here, a function 'func' is passed in and applied to each bootstrapped subsample to calculate a result. This could, for example, be a linear fit on the log-log survival function to estimate the tail exponent.
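As an illustration of what such a 'func' might look like, the following is a rough sketch of a least-squares fit to the log-log empirical survival function. The name fit_tail_exponent and the tail_fraction parameter are assumptions made for this example and do not reflect how the package implements its own fit.

```python
import numpy as np

def fit_tail_exponent(sample, tail_fraction=0.1):
    """Hypothetical 'func': estimate the tail exponent alpha as the negative
    slope of a straight line fitted to the log-log empirical survival function."""
    # Work with sorted, strictly positive absolute values
    x = np.sort(np.abs(np.asarray(sample, dtype=float)))
    x = x[x > 0]
    n = len(x)

    # Empirical survival function P(X >= x_i) for the sorted values
    survival = (n - np.arange(n)) / n

    # Restrict the fit to the largest observations (the tail)
    n_tail = max(int(n * tail_fraction), 2)
    log_x = np.log(x[-n_tail:])
    log_s = np.log(survival[-n_tail:])

    # For a power-law tail, log S(x) is roughly linear in log x with slope -alpha
    slope, _ = np.polyfit(log_x, log_s, 1)
    return -slope

# Sanity check on synthetic Pareto data with a known tail exponent of 3
rng = np.random.default_rng(42)
pareto_sample = rng.pareto(3.0, size=20000) + 1.0
alpha_hat = fit_tail_exponent(pareto_sample)
```

The choice of tail_fraction matters for the estimate, which is part of why repeating the fit over many subsamples is informative.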

ssadjina commented 1 year ago

Because it is not clear how this is best done, here are a few alternatives:

ssadjina commented 1 year ago

Current draft:

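def subsample(
        data,
        func,
        func_kws           = None,
        prep_func          = None,
        prep_func_kws      = None,
        n_subsamples       = 300,
        bootstrap_fraction = 0.9
):

    # Avoid mutable default arguments; fall back to empty keyword dicts
    func_kws      = {} if func_kws is None else func_kws
    prep_func_kws = {} if prep_func_kws is None else prep_func_kws

    # Set up storing results, keyed by subsample index
    results = {}

    # Subsample loop
    for i in range(n_subsamples):

        # Prepare the data before the subsampling. This runs in every iteration,
        # which allows a randomized prep_func (e.g. a random time shift) to be redrawn.
        if prep_func is not None:
            series = prep_func(data, **prep_func_kws)
        else:
            series = data

        # Use bootstrapping to include the uncertainty with respect to the data
        # (named 'sample' so that it does not shadow the function name)
        sample = series.dropna().sample(frac=bootstrap_fraction, replace=True)

        # Perform desired calculation
        result = func(sample, **func_kws)

        # Store results
        results[i] = result

    return results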
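A possible way to call this draft is sketched below. To keep the example runnable on its own, it restates the draft in compact form. The prep_func random_shift_log_returns is hypothetical and merely stands in for a log-return calculation with a random origin, and the standard deviation is used only as a simple placeholder for 'func'.

```python
import numpy as np
import pandas as pd

def subsample(data, func, func_kws=None, prep_func=None, prep_func_kws=None,
              n_subsamples=300, bootstrap_fraction=0.9):
    """Compact restatement of the draft so that this example is self-contained."""
    func_kws = func_kws or {}
    prep_func_kws = prep_func_kws or {}
    results = {}
    for i in range(n_subsamples):
        series = data if prep_func is None else prep_func(data, **prep_func_kws)
        sample = series.dropna().sample(frac=bootstrap_fraction, replace=True)
        results[i] = func(sample, **func_kws)
    return results

def random_shift_log_returns(prices, period_days=7):
    """Hypothetical prep_func: log returns over `period_days` from a random origin."""
    offset = np.random.randint(period_days)
    return np.log(prices.iloc[offset::period_days]).diff()

# Synthetic daily prices, purely for illustration
np.random.seed(0)
prices = pd.Series(100 * np.exp(np.cumsum(np.random.normal(0, 0.01, size=700))))

results = subsample(
    prices,
    func=pd.Series.std,
    func_kws={"ddof": 1},
    prep_func=random_shift_log_returns,
    prep_func_kws={"period_days": 7},
    n_subsamples=50,
)

# Collect the per-subsample estimates and summarize their spread
estimates = pd.Series(results)
summary = estimates.describe()
```

Because the results are keyed by subsample index, they convert directly into a pandas Series, which makes it easy to report a mean together with quantiles or a standard deviation across subsamples.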