During backtest FDM calculates as NaN for instruments with short data history

MarkSurfNZ commented 2 years ago

While backtesting 1.47 I noted that the FDM for ETHEREUM is NaN. The data start date for ETHEREUM is mid-Feb 2021. I believe the problem occurs in the diversification_multiplier_from_list() function in the diversifications_multipliers.py module at line 65.

The div_mult_df is created by combining the ref_periods vector as the index with the div_mult_vector. The ref_periods seem to always contain dates that are Sundays. The div_mult_df is then resampled to business days and forward filled to give div_mult_df_daily. The problem is that in the case of ETHEREUM the div_mult_df has only 1 record (value is 1 and index is Sunday 14 Feb). When this is resampled to business days, the index changes to Fri 12 Feb, which is prior to the Sunday, so the forward fill enters a NaN value for this day, which is the only record. This NaN value is eventually used as the FDM.

Note that this is not such a problem when div_mult_df has more than 1 record (instruments with longer data history), because the forward fill enters the correct values in all records after the Friday, as they fall on or after the Sunday.

I have a fix which seems to work, although my concern is that it may be fixing the symptom rather than the underlying cause. The fix is to back fill when div_mult_df has only 1 record and forward fill the rest of the time:

# Change to business days, so moving average will make sense
#************************************************************************************
# MS update: first day is a Sunday, when resample to business days, this
# changes to previous Friday. When ffill() this gets a NaN value. If this is the only 
# record in the series bfill(), so NaN does not propagate forward.  
if div_mult_df.shape[0] == 1:
    div_mult_df_daily = div_mult_df.resample("1B").bfill()
else:
    div_mult_df_daily = div_mult_df.resample("1B").ffill()
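The single-record case can be reproduced outside the backtest. The following is a minimal sketch, assuming a one-column frame with a made-up column name `fdm`; it is not the library code, only an illustration of how the two fill methods behave on a lone Sunday record.

```python
import pandas as pd

# One diversification-multiplier estimate, dated Sunday 14 Feb 2021.
# The column name "fdm" is hypothetical, chosen for this illustration.
div_mult_df = pd.DataFrame({"fdm": [1.0]}, index=pd.DatetimeIndex(["2021-02-14"]))

# Upsampling to business days relabels the lone row to a business day
# before the Sunday, so a forward fill has no earlier value to copy.
ffilled = div_mult_df.resample("1B").ffill()

# A back fill instead copies the value from the later Sunday record.
bfilled = div_mult_df.resample("1B").bfill()

print(ffilled)
print(bfilled)
```

Comparing the two printed frames shows why the back fill masks the problem for a single record, and also why it only treats the symptom: the relabelled date is still earlier than the first real estimate.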

robcarver17 commented 2 years ago

This is an edge case to beat all edge cases!

First of all, it's super helpful if you can create a minimum example. Here's mine:

from systems.provided.futures_chapter15.basesystem import futures_system
system = futures_system()
system.config.instrument_weights=dict(ETHEREUM=1)
system.config.use_forecast_div_mult_estimates = True
system.config.use_forecast_weight_estimates = True ## Change to False to investigate both types

system.combForecast.get_forecast_diversification_multiplier("ETHEREUM")

Digging back, the reason the NaN appears is that the forecast weights are initially set to zero when aligned to forecasts. But the first thing I noticed is that the non-estimated forecast weights are daily, whereas they ought to be monthly. I changed that for consistency, and also so that I wouldn't need two kinds of fix for this problem.

OK so now if I look at the two kinds of forecast weights produced I get this with fixed weights:

system.combForecast.get_raw_monthly_forecast_weights('ETHEREUM')

           carry  ewmac32_128  ewmac64_256  ewmac16_64
index                                                  
2021-02-28    0.5         0.08         0.21        0.21
2021-03-31    0.5         0.08         0.21        0.21
2021-04-30    0.5         0.08         0.21        0.21
...
2022-07-31    0.5         0.08         0.21        0.21
2022-08-31    0.5         0.08         0.21        0.21
2022-09-30    0.5         0.08         0.21        0.21

And this with estimated, since they are only estimated every year:

system.combForecast.get_raw_monthly_forecast_weights('ETHEREUM')

              carry  ewmac4_16  ewmac2_8  ...  ewmac64_256  ewmac32_128  ewmac16_64
2021-02-14  0.142857   0.142857  0.142857  ...     0.142857     0.142857    0.142857

However in both cases, if I look at the weights after alignment to forecasts I get zeros at the start:

system.combForecast.get_forecast_weights("ETHEREUM")

               carry  ewmac4_16  ewmac2_8  ...  ewmac64_256  ewmac32_128  ewmac16_64
index                                      ...                                      
2021-02-11  0.000000   0.000000  0.000000  ...     0.000000     0.000000    0.000000
2021-02-12  0.000000   0.000000  0.000000  ...     0.000000     0.000000    0.000000
...
2021-02-23  0.000000   0.000000  0.000000  ...     0.000000     0.000000    0.000000
2021-02-24  0.000000   0.000000  0.000000  ...     0.000000     0.000000    0.000000
2021-02-25  0.014051   0.014051  0.014051  ...     0.014051     0.014051    0.014051
2021-02-26  0.025754   0.025754  0.025754  ...     0.025754     0.025754    0.025754
2021-03-01  0.035652   0.035652  0.035652  ...     0.035652     0.035652    0.035652
2021-03-02  0.044131   0.044131  0.044131  ...     0.044131     0.044131    0.044131
...

Note the gradual increase in weights is caused by smoothing; something that makes sense when weights are re-estimated every year, but not right at the start. This is especially bonkers with fixed forecast weights. Something that maybe doesn't make much sense here is that the smoothing is done after the weights are normalised to add up to one by row, when perhaps the adding up to one should happen after the smoothing.
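To see why leading zeros produce a slow ramp, here is a simplified sketch of an exponentially weighted moving average applied to a weight that jumps from 0 to 1/7. The recursive form of the average and the span of 125 are assumptions chosen for illustration, not necessarily what get_forecast_weights() uses.

```python
def ewma(values, span):
    """Recursive EWMA: s_t = alpha * x_t + (1 - alpha) * s_{t-1}, seeded with x_0."""
    alpha = 2.0 / (span + 1.0)
    smoothed = [values[0]]
    for x in values[1:]:
        smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed


# Ten days of zero weight (no forecast yet), then the weight switches on.
raw = [0.0] * 10 + [1.0 / 7.0] * 5
smoothed = ewma(raw, span=125)  # span is an illustrative assumption

for r, s in zip(raw[8:], smoothed[8:]):
    print(f"raw={r:.4f} smoothed={s:.4f}")
```

In this sketch seven equal raw weights add up to one on the day they switch on, but their smoothed values add up to far less than one, which is the argument for normalising after smoothing rather than before.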

Here are the unsmoothed weights, for fixed weights as it happens:

system.combForecast.get_unsmoothed_forecast_weights("ETHEREUM").head(20)

            carry  ewmac32_128  ewmac64_256  ewmac16_64
index                                                  
2021-02-11    0.0         0.00         0.00        0.00
2021-02-12    0.0         0.00         0.00        0.00
...
2021-02-24    0.0         0.00         0.00        0.00
2021-02-25    NaN          NaN          NaN         NaN
2021-02-26    NaN          NaN          NaN         NaN
2021-03-01    0.5         0.08         0.21        0.21
2021-03-02    0.5         0.08         0.21        0.21
...

Where are the leading zeros coming from? Well basically when a forecast has a Nan, we set the forecast weight to zero. This helps when forecasts have different lengths of data for whatever reason:

self = system.combForecast ## recommended for debugging
instrument_code = "ETHEREUM"

monthly_forecast_weights = self.get_raw_monthly_forecast_weights(
            instrument_code
        )

## Stepping into self._fix_weights_to_forecasts

rule_variation_list = list(monthly_forecast_weights.columns)
forecasts = self.get_all_forecasts(instrument_code, rule_variation_list)
forecasts.head(20)

               carry  ewmac32_128  ewmac64_256  ewmac16_64
index                                                     
2021-02-11       NaN          NaN          NaN         NaN
2021-02-12       NaN          NaN          NaN         NaN
2021-02-15       NaN          NaN          NaN         NaN
...
2021-02-23       NaN          NaN          NaN         NaN
2021-02-24       NaN          NaN          NaN         NaN
2021-02-25 -5.533421    -0.240455    -0.080667   -0.735169
2021-02-26 -5.770558    -0.399279    -0.135109   -1.199794

Here's the key function: syscore.pdutils.fix_weights_vs_position_or_forecast. This is also used for instrument weight alignment, so we need to be careful.

Here is the critical piece of code:

    # forward fill forecasts/positions
    pdm_ffill = position_or_forecast.ffill()

    # remove weights if nan forecast or position
    adj_weights[np.isnan(pdm_ffill)] = 0.0

    # change rows so weights add to one
    normalised_weights = weights_sum_to_one(adj_weights)

There are big dangers in changing this behaviour. For example, suppose we have fixed forecast weights over two trading rules, but only one of our forecasts is active for the first few months of data. If we don't set the initial Nans to zero weights, then we'd start with 0.5, 0.5 weights, and then when one forecast switched on we'd have 0.5, 0 after a smoothing period, and then we'd switch back to 0.5, 0.5. Given that the smoothing at the beginning messes things up anyway, maybe that isn't a big problem?
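The current rule can be illustrated on toy data. This is a sketch only: the rule names "a" and "b" are made up, and the row normalisation is a hypothetical stand-in for weights_sum_to_one, written so that all-zero rows stay at zero, which is consistent with the leading zeros shown earlier in this thread.

```python
import numpy as np
import pandas as pd

idx = pd.bdate_range("2021-02-11", periods=5)
# Rule "a" gets a forecast on day two; rule "b" only on day four.
forecasts = pd.DataFrame(
    {"a": [np.nan, 1.0, 2.0, 3.0, 4.0], "b": [np.nan, np.nan, np.nan, 5.0, 6.0]},
    index=idx,
)
weights = pd.DataFrame({"a": 0.5, "b": 0.5}, index=idx)

# Zero the weight wherever the forward-filled forecast is still missing.
adj_weights = weights.copy()
adj_weights[forecasts.ffill().isna()] = 0.0

# Hypothetical stand-in for weights_sum_to_one: rescale each row to sum
# to one, leaving all-zero rows as zeros instead of dividing by zero.
row_sums = adj_weights.sum(axis=1)
normalised = adj_weights.div(row_sums.where(row_sums != 0.0, 1.0), axis=0)
print(normalised)
```

The first row has no forecasts at all, so both weights are zero; the next two rows give rule "a" the whole weight; the last two rows return to the configured 0.5, 0.5.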

robcarver17 commented 2 years ago

Let's write down an alternative methodology and think about the ramifications:

We start with two forecasts (or instruments) with nan forecasts, but not nan weights. Then one forecast gets non nan values before the other.
We apply the rule: if all of them are Nan, don't zero the weights; but if fewer than all of them are Nan, then zero the weights of those that are Nan.
So now we have weights of .5, .5... which then become .5, 0 once one forecast kicks in... and then .5, .5
We now apply a smoothing, so we get weights of .5, .5, smoothing down to 0.5, 0, then smoothing back up to .5, .5
Finally we apply the sum to one rule. So now we get weights of .5, .5 (with no forecasts), smoothing to 1, 0; then smoothing back to .5, .5

The effect of this is that the forecast which kicks in earlier would have a lower weight initially than expected. But then under the current policy this is also true. So we don't have a significant change in behaviour. In fact things ought to be better.
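The sequence described above (conditional zeroing, then smoothing, then summing to one) can be sketched on toy data. Everything here is an assumption for illustration: the rule names, the tiny span, and the use of pandas ewm for the smoothing; it is not the code that ended up in the library.

```python
import numpy as np
import pandas as pd

idx = pd.bdate_range("2021-02-11", periods=6)
forecasts = pd.DataFrame(
    {
        "a": [np.nan, np.nan, 1.0, 2.0, 3.0, 4.0],
        "b": [np.nan, np.nan, np.nan, np.nan, 5.0, 6.0],
    },
    index=idx,
)
weights = pd.DataFrame({"a": 0.5, "b": 0.5}, index=idx)

no_forecast = forecasts.ffill().isna()
all_missing = no_forecast.all(axis=1)

# Step 1: zero a weight only when its forecast is missing
# while at least one other forecast is available.
to_zero = no_forecast.apply(lambda col: col & ~all_missing)
adj_weights = weights.mask(to_zero, 0.0)

# Step 2: smooth (span of 3 is purely illustrative).
smoothed = adj_weights.ewm(span=3).mean()

# Step 3: only now rescale each row so the weights add up to one.
final = smoothed.div(smoothed.sum(axis=1), axis=0)
print(final)
```

With these inputs the weights start at 0.5, 0.5 while there are no forecasts, drift towards 1, 0 once only "a" is live, and drift back towards 0.5, 0.5 after "b" starts, while every row still sums to one.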

The changes required then are:

Change syscore.pdutils.fix_weights_vs_position_or_forecast so we apply the rule: if all are Nan, don't zero the weights
Remove the sum to one from syscore.pdutils.fix_weights_vs_position_or_forecast
Add a sum to one into get_forecast_weights()

robcarver17 commented 2 years ago

Let's set up another example to see how things currently work for both FDM and IDM plus weights (not using Ethereum, as the bug isn't fixed yet):

system = futures_system()
system.config.instrument_weights=dict(US10=.5, US5=.5)
system.config.use_forecast_div_mult_estimates = True
system.config.use_forecast_weight_estimates = False
system.config.use_instrument_div_mult_estimates = True

system.combForecast.get_forecast_weights('US10').plot()

Figure_2

robcarver17 commented 2 years ago

system.combForecast.get_forecast_diversification_multiplier_estimated('US10').plot()

Figure_1

robcarver17 commented 2 years ago

system.portfolio.get_instrument_weights().plot()

Figure_2

robcarver17 commented 2 years ago

system.portfolio.get_instrument_diversification_multiplier().plot()

Figure_1

robcarver17 commented 2 years ago

Now with code changes

system.combForecast.get_forecast_weights('US10').plot()

Figure_1

robcarver17 commented 2 years ago

system.combForecast.get_forecast_diversification_multiplier_estimated('US10').plot()

Figure_1

robcarver17 commented 2 years ago

system.portfolio.get_instrument_weights().plot()

Figure_1

robcarver17 commented 2 years ago

Note that for the US 10 year we get the same kind of weight. The US 5 year has a phantom weight, but it won't make any difference in practice.

robcarver17 commented 2 years ago

system.portfolio.get_instrument_diversification_multiplier().plot()

Figure_1

robcarver17 commented 2 years ago

Now let's return to our OG example

We still have a Nan value

With estimated weights:

system.combForecast.get_forecast_weights("ETHEREUM")
            ewmac64_256  ewmac8_32  ...  ewmac16_64  ewmac32_128
index                               ...                         
2021-02-11          NaN        NaN  ...         NaN          NaN
2021-02-12          NaN        NaN  ...         NaN          NaN
2021-02-15     0.142857   0.142857  ...    0.142857     0.142857
2021-02-16     0.142857   0.142857  ...    0.142857     0.142857

With fixed weights:

system.combForecast.get_forecast_weights("ETHEREUM").head(15)
            ewmac16_64  ewmac64_256  ewmac32_128  carry
index                                                  
2021-02-11         NaN          NaN          NaN    NaN
2021-02-12         NaN          NaN          NaN    NaN
2021-02-15         NaN          NaN          NaN    NaN
2021-02-16         NaN          NaN          NaN    NaN
2021-02-17         NaN          NaN          NaN    NaN
2021-02-18         NaN          NaN          NaN    NaN
2021-02-19         NaN          NaN          NaN    NaN
2021-02-22         NaN          NaN          NaN    NaN
2021-02-23         NaN          NaN          NaN    NaN
2021-02-24         NaN          NaN          NaN    NaN
2021-02-25         NaN          NaN          NaN    NaN
2021-02-26         NaN          NaN          NaN    NaN
2021-03-01        0.21         0.21         0.08    0.5
2021-03-02        0.21         0.21         0.08    0.5

robcarver17 commented 2 years ago

The behaviour of the system is improved, but there is still more work to do.

robcarver17 commented 2 years ago

In the end I went for quite a simple solution, although all the tweaking I've done to forecast weights is good:

    # Change to business days, so moving average will make sense, aligned to original weights
    div_mult_df_daily = div_mult_df.reindex(weight_df_aligned.index, method="ffill")

    ## Leading Nans, just use 1.0
    div_mult_df_daily[div_mult_df_daily.isna()] = 1.0

    # take a moving average to smooth the jumps
    div_mult_df_smoothed = div_mult_df_daily.ewm(span=ewma_span).mean()
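A small stand-alone sketch of the same three steps follows. The daily index, the column name `fdm`, the multiplier value of 1.2 and the span of 5 are all made up for illustration; only the reindex, fill and smooth sequence mirrors the snippet above.

```python
import pandas as pd

# Business-day index that the multiplier must be aligned to.
daily_index = pd.bdate_range("2021-02-11", "2021-02-19")

# A single multiplier estimate dated Sunday 14 Feb 2021 (value is illustrative).
div_mult_df = pd.DataFrame({"fdm": [1.2]}, index=pd.DatetimeIndex(["2021-02-14"]))

# Align to the daily index; dates before the first estimate have nothing to fill from.
div_mult_df_daily = div_mult_df.reindex(daily_index, method="ffill")
leading_nans = int(div_mult_df_daily["fdm"].isna().sum())

# Replace the leading NaNs with a neutral multiplier of 1.0, then smooth.
div_mult_df_daily = div_mult_df_daily.fillna(1.0)
div_mult_df_smoothed = div_mult_df_daily.ewm(span=5).mean()  # span chosen for illustration

print(leading_nans)
print(div_mult_df_smoothed)
```

Because the reindex targets the existing daily index instead of resampling, no date is relabelled, and the 1.0 default means the smoothed series never contains a NaN even when there is only one estimate.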

MarkSurfNZ commented 2 years ago

Elegant solution, and interesting foray into forecast weight and FDM calculation. Thank you.

robcarver17 / pysystemtrade

During backtest FDM calculates as NaN for instruments with short data history #804