ssadjina opened this issue 1 year ago (status: Open)
The function would look something like this:
```python
import numpy as np
# `returns` is the project's returns module (its import path is not shown here).


def subsample(data, func, n_subsamples, period_days, frac):
    # Collect one result per subsample
    results = []
    for _ in range(n_subsamples):
        # Randomly select a time shift (origin) in [0, period_days)
        time_shift = np.random.randint(period_days)
        # Log returns over `period_days` days, starting at offset `time_shift`
        series = returns.get_log_returns(data, periods='{}d'.format(period_days), offset=time_shift).dropna()
        # Bootstrap (sample with replacement) to include the uncertainty w.r.t. the data
        sample = series.sample(frac=frac, replace=True)
        # Perform the desired calculation
        result = func(sample)
        # Store the result
        results.append(result)
    return results
```
Here, a function `func` is passed in and applied to each subsample to compute a result. This could, for example, be a linear fit to the log-log survival function to estimate the tail exponent.
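As a hedged example of such a `func`, the sketch below fits a straight line to the log of the empirical survival function against the log of the data, using only the largest observations. The name `tail_exponent_loglog`, the `tail_fraction` cut-off and the survival estimate S = k/n for the k-th largest value are illustrative assumptions, not the project's `alpha` implementation.

```python
import numpy as np


def tail_exponent_loglog(x, tail_fraction=0.1):
    """Hypothetical estimator: tail exponent alpha from a linear fit of
    log S(x) against log x, where S is the empirical survival function.

    Assumes positive data whose upper tail behaves like S(x) ~ C * x**(-alpha).
    Only the largest `tail_fraction` of the observations enters the fit.
    """
    x = np.sort(np.asarray(x, dtype=float))[::-1]   # largest value first
    n = x.size
    k = np.arange(1, n + 1)
    survival = k / n                                # fraction of points >= x[k-1]
    m = max(int(tail_fraction * n), 2)              # a line needs at least 2 points
    slope, _intercept = np.polyfit(np.log(x[:m]), np.log(survival[:m]), deg=1)
    # On log-log axes the survival function falls with slope -alpha
    return -slope
```

Because it takes a series and returns a number, it could be passed directly as `func`. For log returns one would typically fit the magnitudes, e.g. `lambda s: tail_exponent_loglog(s.abs())`, since the fit takes logarithms of the data.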
Because it is not clear how this is best done, here are a few alternatives:

The `for` loop is outside the function. The advantage would be that we don't need to pass the function `func` into the subsampling function. The downside is that the main loop is outside, so we may give up some control over the subsampling itself. The previous draft may also be better structured and more modular, because we define stand-alone functions `func` that perform some calculation on a series independently of any sampling and that can then easily be passed into the subsampling routine.

`func` can then be applied to it.

Current draft:
```python
def subsample(
    data,
    func,
    func_kws=None,
    prep_func=None,
    prep_func_kws=None,
    n_subsamples=300,
    bootstrap_fraction=0.9,
):
    # Use None instead of {} as defaults to avoid shared mutable default arguments
    func_kws = func_kws or {}
    prep_func_kws = prep_func_kws or {}
    # Set up storing results, keyed by subsample index
    results = {}
    # Subsample loop
    for i in range(n_subsamples):
        # Prepare the data before the subsampling (re-run every iteration, so it may be random)
        if prep_func is not None:
            series = prep_func(data, **prep_func_kws)
        else:
            series = data
        # Use bootstrapping to include the uncertainty w.r.t. the data
        sample = series.dropna().sample(frac=bootstrap_fraction, replace=True)
        # Perform desired calculation
        result = func(sample, **func_kws)
        # Store results
        results[i] = result
    return results
```
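For comparison with the current draft, the first alternative listed above (keeping the `for` loop outside the subsampling function) might be sketched as a generator that only produces bootstrap subsamples. This is one possible shape under stated assumptions, not a settled design; `iter_subsamples` and its `seed` parameter are hypothetical names.

```python
import numpy as np
import pandas as pd


def iter_subsamples(series, n_subsamples=300, bootstrap_fraction=0.9, seed=None):
    """Hypothetical helper: yield bootstrap subsamples of `series`.

    The caller owns the loop and applies whatever function it needs
    (the "loop outside" alternative). `seed` is an illustrative addition
    for reproducibility and is not part of either draft.
    """
    rng = np.random.RandomState(seed)
    clean = series.dropna()
    for _ in range(n_subsamples):
        # Sampling with replacement, i.e. a bootstrap draw
        yield clean.sample(frac=bootstrap_fraction, replace=True, random_state=rng)


# Usage with made-up data: the estimator never enters the subsampling code
data = pd.Series(np.random.default_rng(1).standard_t(df=3, size=500))
means = [s.mean() for s in iter_subsamples(data, n_subsamples=100, seed=42)]
```

The sampling logic stays in one place, but the calling code now owns the loop, so anything that must happen per iteration (such as re-running a random preparation step) becomes the caller's job.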
The subsample routine in `alpha.fit_alpha_and_scale_linear_subsampling()` has been designed to include two kinds of general uncertainty in estimating parameters:

Handling these general kinds of uncertainty in a dedicated function allows us to reuse it consistently in all other functions.
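To illustrate that reuse, here is a minimal sketch under stated assumptions. It repeats a compact copy of the current draft so that it runs on its own, and expresses the random time shift of the first draft as a `prep_func`. `random_offset_log_returns` is a hypothetical stand-in for `returns.get_log_returns` (one price per day and non-overlapping windows assumed), and the prices are synthetic. Only `func` changes between the two calls.

```python
import numpy as np
import pandas as pd


def subsample(data, func, func_kws=None, prep_func=None, prep_func_kws=None,
              n_subsamples=300, bootstrap_fraction=0.9):
    """Compact copy of the current draft, with None instead of {} as defaults."""
    func_kws = func_kws or {}
    prep_func_kws = prep_func_kws or {}
    results = {}
    for i in range(n_subsamples):
        # prep_func is re-run in every iteration, so it may itself be random
        series = prep_func(data, **prep_func_kws) if prep_func is not None else data
        # Bootstrap: draw with replacement from the prepared series
        sample = series.dropna().sample(frac=bootstrap_fraction, replace=True)
        results[i] = func(sample, **func_kws)
    return results


def random_offset_log_returns(prices, period_days):
    """Hypothetical prep_func (not the project's returns.get_log_returns).

    Assumes one price per day: picks a random time origin in [0, period_days),
    keeps every period_days-th price from there and returns the log returns.
    """
    offset = np.random.randint(period_days)
    sampled = prices.iloc[offset::period_days]
    return np.log(sampled).diff().dropna()


# Synthetic daily prices (geometric random walk), for illustration only
np.random.seed(0)
prices = pd.Series(
    100 * np.exp(np.cumsum(0.01 * np.random.standard_normal(1000))),
    index=pd.date_range("2020-01-01", periods=1000, freq="D"),
)

# The same routine reused with two different estimators
prep = dict(prep_func=random_offset_log_returns, prep_func_kws={"period_days": 5})
vol = pd.Series(subsample(prices, func=lambda s: s.std(), n_subsamples=200, **prep))
drift = pd.Series(subsample(prices, func=lambda s: s.mean(), n_subsamples=200, **prep))

# Spread of the estimates across subsamples as a rough uncertainty band
print(vol.quantile([0.05, 0.5, 0.95]))
print(drift.quantile([0.05, 0.5, 0.95]))
```

Because `prep_func` is re-evaluated in every iteration, the random time origin and the bootstrap from the first draft are combined inside one routine, and each new estimator only has to be written as a plain function of a series.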