MarkSurfNZ closed this issue 2 years ago
This is an edge case to beat all edge cases!
First of all, it's super helpful if you can create a minimal example. Here's mine:
from systems.provided.futures_chapter15.basesystem import futures_system
system = futures_system()
system.config.instrument_weights=dict(ETHEREUM=1)
system.config.use_forecast_div_mult_estimates = True
system.config.use_forecast_weight_estimates = True  # change to False to investigate both types
system.combForecast.get_forecast_diversification_multiplier("ETHEREUM")
Digging back, the reason the NaN appears is that the forecast weights are initially set to zero when aligned to forecasts. But the first thing I noticed is that the non-estimated forecast weights are daily, whereas they ought to be monthly. I changed that for consistency, and also so that I wouldn't need two kinds of fix for this problem.
OK so now if I look at the two kinds of forecast weights produced I get this with fixed weights:
system.combForecast.get_raw_monthly_forecast_weights('ETHEREUM')
carry ewmac32_128 ewmac64_256 ewmac16_64
index
2021-02-28 0.5 0.08 0.21 0.21
2021-03-31 0.5 0.08 0.21 0.21
2021-04-30 0.5 0.08 0.21 0.21
...
2022-07-31 0.5 0.08 0.21 0.21
2022-08-31 0.5 0.08 0.21 0.21
2022-09-30 0.5 0.08 0.21 0.21
And this with estimated weights, which show a single row since they are only estimated every year:
system.combForecast.get_raw_monthly_forecast_weights('ETHEREUM')
carry ewmac4_16 ewmac2_8 ... ewmac64_256 ewmac32_128 ewmac16_64
2021-02-14 0.142857 0.142857 0.142857 ... 0.142857 0.142857 0.142857
However in both cases, if I look at the weights after alignment to forecasts I get zeros at the start:
system.combForecast.get_forecast_weights("ETHEREUM")
carry ewmac4_16 ewmac2_8 ... ewmac64_256 ewmac32_128 ewmac16_64
index ...
2021-02-11 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000
2021-02-12 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000
...
2021-02-23 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000
2021-02-24 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000
2021-02-25 0.014051 0.014051 0.014051 ... 0.014051 0.014051 0.014051
2021-02-26 0.025754 0.025754 0.025754 ... 0.025754 0.025754 0.025754
2021-03-01 0.035652 0.035652 0.035652 ... 0.035652 0.035652 0.035652
2021-03-02 0.044131 0.044131 0.044131 ... 0.044131 0.044131 0.044131
...
Note the gradual increase in weights is caused by smoothing; something that makes sense when weights are re-estimated every year, but not right at the start. This is especially bonkers when the forecast weights are fixed. Something else that maybe doesn't make much sense here is that the smoothing is done after the weights are normalised to sum to one by row, when perhaps the normalisation should happen after the smoothing.
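To make the ordering point concrete, here is a minimal sketch with made-up data. The two rule names, the span of 5 and the dates are assumptions for illustration, not the library defaults. It smooths weights that start at zero and then renormalises them afterwards:

```python
import pandas as pd

# Hypothetical fixed weights: zero for the first 3 days (no forecasts yet), then 0.5 each
index = pd.bdate_range("2021-02-22", periods=8)
raw = pd.DataFrame(
    {"carry": [0.0] * 3 + [0.5] * 5, "ewmac16_64": [0.0] * 3 + [0.5] * 5},
    index=index,
)

span = 5  # illustrative span only

# Smoothing weights that were already normalised: rows ramp up from zero and sum to < 1
smoothed = raw.ewm(span=span).mean()

# Normalising after smoothing instead: any row with a non-zero total sums to one again
row_totals = smoothed.sum(axis=1)
renormalised = smoothed.div(row_totals.where(row_totals > 0, 1.0), axis=0)

print(smoothed.sum(axis=1))
print(renormalised.sum(axis=1))
```

With this ordering the ramp-up disappears for rows that have any weight at all, while rows that are all zero stay at zero.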
Here are the unsmoothed weights, for fixed weights as it happens:
system.combForecast.get_unsmoothed_forecast_weights("ETHEREUM").head(20)
carry ewmac32_128 ewmac64_256 ewmac16_64
index
2021-02-11 0.0 0.00 0.00 0.00
2021-02-12 0.0 0.00 0.00 0.00
...
2021-02-24 0.0 0.00 0.00 0.00
2021-02-25 NaN NaN NaN NaN
2021-02-26 NaN NaN NaN NaN
2021-03-01 0.5 0.08 0.21 0.21
2021-03-02 0.5 0.08 0.21 0.21
...
Where are the leading zeros coming from? Basically, when a forecast is NaN, we set its forecast weight to zero. This helps when forecasts have different lengths of data for whatever reason:
self = system.combForecast  # alias as self; handy when debugging method bodies
instrument_code = "ETHEREUM"
monthly_forecast_weights = self.get_raw_monthly_forecast_weights(
    instrument_code
)
# Stepping into self._fix_weights_to_forecasts
rule_variation_list = list(monthly_forecast_weights.columns)
forecasts = self.get_all_forecasts(instrument_code, rule_variation_list)
forecasts.head(20)
carry ewmac32_128 ewmac64_256 ewmac16_64
index
2021-02-11 NaN NaN NaN NaN
2021-02-12 NaN NaN NaN NaN
2021-02-15 NaN NaN NaN NaN
...
2021-02-23 NaN NaN NaN NaN
2021-02-24 NaN NaN NaN NaN
2021-02-25 -5.533421 -0.240455 -0.080667 -0.735169
2021-02-26 -5.770558 -0.399279 -0.135109 -1.199794
Here's the key function: syscore.pdutils.fix_weights_vs_position_or_forecast. This is also used for instrument weight alignment, so we need to be careful.
Here is the critical piece of code:
# forward fill forecasts/positions
pdm_ffill = position_or_forecast.ffill()
# remove weights if nan forecast or position
adj_weights[np.isnan(pdm_ffill)] = 0.0
# change rows so weights add to one
normalised_weights = weights_sum_to_one(adj_weights)
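As a rough, self-contained illustration of what those three steps do, here is a sketch on toy data. The `weights_sum_to_one` below is a stand-in I wrote for this example, not the library's implementation, and the rule names and values are made up:

```python
import numpy as np
import pandas as pd


def weights_sum_to_one(weights: pd.DataFrame) -> pd.DataFrame:
    # Stand-in helper (assumption): rows whose total is zero are left at zero
    totals = weights.sum(axis=1)
    return weights.div(totals.where(totals > 0, 1.0), axis=0)


index = pd.bdate_range("2021-02-22", periods=6)
# rule_a starts on day 3, rule_b starts on day 5
forecasts = pd.DataFrame(
    {
        "rule_a": [np.nan, np.nan, 1.0, 2.0, 3.0, 4.0],
        "rule_b": [np.nan, np.nan, np.nan, np.nan, -1.0, -2.0],
    },
    index=index,
)
weights = pd.DataFrame(0.5, index=index, columns=forecasts.columns)

# forward fill forecasts, zero the weight wherever the forecast is still NaN
adj_weights = weights.mask(forecasts.ffill().isna(), 0.0)
# change rows so weights add to one
normalised = weights_sum_to_one(adj_weights)
print(normalised)
```

Rows where no forecast exists come out as all zeros, which is where the leading zeros in the output above come from.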
There are big dangers in changing this behaviour. For example, suppose we have fixed forecast weights over two trading rules, but only one of our forecasts is active for the first few months of data. If we don't set the initial NaNs to zero weights, then we'd start with 0.5, 0.5 weights, and then when one forecast switched on we'd have 0.5, 0 after a smoothing period, and then we'd switch back to 0.5, 0.5. Given that the smoothing at the beginning messes things up anyway, maybe that isn't a big problem?
Let's write down an alternative methodology and think about the ramifications:
The effect of this is that the forecast which kicks in earlier would have a lower weight initially than expected. But then under the current policy this is also true. So we don't have a significant change in behaviour. In fact things ought to be better.
The changes required then are:
syscore.pdutils.fix_weights_vs_position_or_forecast
so that we apply the rule: if all values are NaN, don't zero the weights
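A minimal sketch of how I read that rule, on toy data. The function name `fix_weights_keep_all_nan_rows` is hypothetical and this is not the actual change made to the library:

```python
import numpy as np
import pandas as pd


def fix_weights_keep_all_nan_rows(
    weights: pd.DataFrame, forecasts: pd.DataFrame
) -> pd.DataFrame:
    """Hypothetical variant: only zero a weight if at least one other forecast is live."""
    missing = forecasts.ffill().isna()
    all_missing = missing.all(axis=1)
    to_zero = missing.copy()
    to_zero.loc[all_missing] = False  # rows with no forecasts at all keep their weights
    adj_weights = weights.mask(to_zero, 0.0)
    totals = adj_weights.sum(axis=1)
    # normalise rows to sum to one; rows with a zero total are left unchanged
    return adj_weights.div(totals.where(totals > 0, 1.0), axis=0)


index = pd.bdate_range("2021-02-22", periods=6)
forecasts = pd.DataFrame(
    {
        "rule_a": [np.nan, np.nan, 1.0, 2.0, 3.0, 4.0],
        "rule_b": [np.nan, np.nan, np.nan, np.nan, -1.0, -2.0],
    },
    index=index,
)
weights = pd.DataFrame(0.5, index=index, columns=forecasts.columns)

fixed = fix_weights_keep_all_nan_rows(weights, forecasts)
print(fixed)
```

In this toy case the rows before any forecast exists keep 0.5, 0.5 instead of zeros, so smoothing no longer has to ramp up from nothing.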
Let's set up another example to see how things currently work for both FDM and IDM plus weights (not using Ethereum, as the bug isn't fixed yet):
system = futures_system()
system.config.instrument_weights=dict(US10=.5, US5=.5)
system.config.use_forecast_div_mult_estimates = True
system.config.use_forecast_weight_estimates = False
system.config.use_instrument_div_mult_estimates = True
system.combForecast.get_forecast_weights('US10').plot()
system.combForecast.get_forecast_diversification_multiplier_estimated('US10').plot()
system.portfolio.get_instrument_weights().plot()
system.portfolio.get_instrument_diversification_multiplier().plot()
Now with code changes
system.combForecast.get_forecast_weights('US10').plot()
system.combForecast.get_forecast_diversification_multiplier_estimated('US10').plot()
system.portfolio.get_instrument_weights().plot()
Note that for the US10 year we get the same kind of weight. The US 5 year has a phantom weight, but it won't make any difference in practice.
system.portfolio.get_instrument_diversification_multiplier().plot()
Now let's return to our OG example
We still have a NaN value.
With estimated weights:
system.combForecast.get_forecast_weights("ETHEREUM")
ewmac64_256 ewmac8_32 ... ewmac16_64 ewmac32_128
index ...
2021-02-11 NaN NaN ... NaN NaN
2021-02-12 NaN NaN ... NaN NaN
2021-02-15 0.142857 0.142857 ... 0.142857 0.142857
2021-02-16 0.142857 0.142857 ... 0.142857 0.142857
With fixed weights
system.combForecast.get_forecast_weights("ETHEREUM").head(15)
ewmac16_64 ewmac64_256 ewmac32_128 carry
index
2021-02-11 NaN NaN NaN NaN
2021-02-12 NaN NaN NaN NaN
2021-02-15 NaN NaN NaN NaN
2021-02-16 NaN NaN NaN NaN
2021-02-17 NaN NaN NaN NaN
2021-02-18 NaN NaN NaN NaN
2021-02-19 NaN NaN NaN NaN
2021-02-22 NaN NaN NaN NaN
2021-02-23 NaN NaN NaN NaN
2021-02-24 NaN NaN NaN NaN
2021-02-25 NaN NaN NaN NaN
2021-02-26 NaN NaN NaN NaN
2021-03-01 0.21 0.21 0.08 0.5
2021-03-02 0.21 0.21 0.08 0.5
The behaviour of the system is improved, but there is still more work to do.
In the end I went for quite a simple solution, although all the tweaking I've done to forecast weights is good:
# Change to business days, so moving average will make sense, aligned to original weights
div_mult_df_daily = div_mult_df.reindex(weight_df_aligned.index, method="ffill")
# Leading NaNs: just use 1.0
div_mult_df_daily[div_mult_df_daily.isna()] = 1.0
# take a moving average to smooth the jumps
div_mult_df_smoothed = div_mult_df_daily.ewm(span=ewma_span).mean()
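Here is a small standalone sketch of the same three steps on toy data, to show why the leading NaNs appear and what filling them with 1.0 does. The column name, the dates of the weight index and the span of 3 are assumptions for illustration:

```python
import numpy as np
import pandas as pd

# One multiplier estimate stamped on Sunday 14 Feb 2021, as in the ETHEREUM case
div_mult_df = pd.DataFrame(
    {"div_mult": [1.0]}, index=pd.to_datetime(["2021-02-14"])
)
# Hypothetical business-day weight index that starts before that estimate
weight_index = pd.bdate_range("2021-02-11", periods=5)

# Align to the daily index: days before the first estimate have nothing to fill from
daily = div_mult_df.reindex(weight_index, method="ffill")
n_leading_nans = int(daily["div_mult"].isna().sum())

# Leading NaNs: just use 1.0 (i.e. no diversification benefit assumed yet)
daily_filled = daily.fillna(1.0)

# Smooth as before; illustrative span only
smoothed = daily_filled.ewm(span=3).mean()
print(n_leading_nans)
print(smoothed)
```

A multiplier of 1.0 is a neutral choice here: it leaves the combined forecast unscaled until a real estimate is available.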
Elegant solution, and interesting foray into forecast weight and FDM calculation. Thank you.
While backtesting 1.47 I note that the FDM for ETHEREUM is NaN. The data start date for ETHEREUM is mid-Feb 2021. I believe the problem occurs in the diversification_multiplier_from_list() function in the diversifications_multipliers.py module at line 65.
The div_mult_df is created by combining the ref_periods vector as the index with the div_mult_vector. The ref_periods seem to always contain dates that are Sundays. The div_mult_df is then resampled to business days and forward filled to give div_mult_df_daily. The problem is that in the case of ETHEREUM the div_mult_df has only 1 record (value is 1 and index is Sunday 14 Feb). When this is resampled to business days, the index changes to Fri 12 Feb, which is before the Sunday, so the forward fill enters a NaN value for this day, which is the only record. This NaN value is eventually used as the FDM.
Note, this is not such a problem when div_mult_df has more than 1 record (instruments with longer data history), because the forward fill enters the correct values in all subsequent records after the Friday, as they come after the Sunday.
I have a fix which seems to work, although my concern is that maybe it is fixing the symptom rather than the underlying cause. The fix is to back fill when div_mult_df has only 1 record and forward fill the rest of the time: