Closed just-nilux closed 1 year ago
I sort of do it manually. If you look in DataframePopulator.py, you will find add_indicators() and add_hidden_indicators(). The indicators added by add_indicators() are the ones used to train the models, and they must not be forward-looking. The indicators added in add_hidden_indicators() can (and should) be forward-looking, because they are only used to generate the buy/sell labels used for training.

If you look in the actual strategy file (e.g. NNTC_profit_Ensemble.py), you will also see a function called save_debug_indicators(). Using this, you can 'save' indicators added by add_hidden_indicators() in a way that ensures they will not be used in training or prediction. I use this so that I can plot those indicators for debugging; they are prefixed with % - for example, future_profit_max becomes %future_profit_max.
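To make the split concrete, here is a rough sketch of how those three hooks could fit together. The function names match the ones described above, but the bodies (the SMA feature, the lookahead window, the future_profit_max formula) are illustrative assumptions, not the actual implementations from DataframePopulator.py:

```python
import numpy as np
import pandas as pd

LOOKAHEAD = 12  # hypothetical label lookahead window, in candles


def add_indicators(df: pd.DataFrame) -> pd.DataFrame:
    # Training features: must NOT look forward in time
    df["sma_20"] = df["close"].rolling(20).mean()
    return df


def add_hidden_indicators(df: pd.DataFrame) -> pd.DataFrame:
    # Label-generation indicators: MAY look forward, since they are
    # only used to build the buy/sell labels for training
    future_max = df["close"].rolling(LOOKAHEAD).max().shift(-LOOKAHEAD)
    df["future_profit_max"] = 100.0 * (future_max - df["close"]) / df["close"]
    return df


def save_debug_indicators(df: pd.DataFrame, names: list) -> pd.DataFrame:
    # Copy hidden indicators under a '%' prefix so they are excluded
    # from training/prediction but still available for debug plotting
    for name in names:
        df["%" + name] = df[name]
    return df
```

The point of the prefix is that the training code can skip any `%`-column while the plotting code can still find it.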
The % and # conventions are for FreqAI, right? I haven't played with that yet - it didn't seem ready the last time I looked. I do plan to start trying it soon, though.
Thanks. Yes, I figured you use these functions to control this. I think that's where I will do the semi-automatic feature engineering then - things like shifted candles, math operations, distances, etc.
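For what it's worth, a minimal pandas sketch of that kind of semi-automatic feature engineering - shifted candles, simple math operations, and a distance-to-baseline feature. The column names and window sizes are made up for illustration, not taken from the strategies:

```python
import pandas as pd


def add_engineered_features(df: pd.DataFrame, shifts=(1, 2, 3)) -> pd.DataFrame:
    for n in shifts:
        # Shifted candles: the close from n candles ago becomes a new feature
        df[f"close_shift_{n}"] = df["close"].shift(n)
        # Math operations: relative change over the last n candles
        df[f"close_pct_{n}"] = df["close"].pct_change(periods=n)
    # Distance feature: how far price sits from a rolling baseline
    sma = df["close"].rolling(5).mean()
    df["dist_sma_5"] = (df["close"] - sma) / sma
    return df
```

All of these stay backward-looking, so they would belong with the add_indicators() features rather than the hidden ones.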
I'll report my findings on how it goes and whether it's worth the trouble ;)
Oh, and one more thing... I see you use keras.msle for anomaly detection and in most other strategies. I would suggest giving the Dissimilarity Index a try: https://www.freqtrade.io/en/stable/freqai-feature-engineering/#identifying-outliers-with-the-dissimilarity-index-di
It's great for filtering out bad predictions, and much more...
Just an example of how to calculate it... the mean-based threshold is of course too basic; that part needs some more engineering.
```python
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances


def calculate_DI(train_data, pred_data, threshold=None):
    # Euclidean distance between each prediction point and each training point
    distances = euclidean_distances(pred_data, train_data)
    # Minimum distance from each prediction point to the training set
    min_distances = np.min(distances, axis=1)
    # Default threshold: mean of the minimum distances (too basic, as noted)
    if threshold is None:
        threshold = np.mean(min_distances)
    # 0 = close to the training data, 1 = dissimilar (likely bad prediction)
    DI = np.where(min_distances < threshold, 0, 1)
    return DI
```
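For completeness, a quick self-contained check of the function above on synthetic data. The Gaussian training cloud and the fixed threshold of 3.0 are arbitrary choices for the demo:

```python
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances


def calculate_DI(train_data, pred_data, threshold=None):
    distances = euclidean_distances(pred_data, train_data)
    min_distances = np.min(distances, axis=1)
    if threshold is None:
        threshold = np.mean(min_distances)
    return np.where(min_distances < threshold, 0, 1)


# Training cloud: standard-normal points in 4 dimensions
rng = np.random.default_rng(42)
train = rng.normal(0.0, 1.0, size=(500, 4))

# Predictions: five points from the same distribution plus one far-away outlier
preds = np.vstack([rng.normal(0.0, 1.0, size=(5, 4)),
                   np.full((1, 4), 50.0)])

# With a fixed threshold, the far-away point should be flagged as 1
di = calculate_DI(train, preds, threshold=3.0)
```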
I'm wondering how best to approach feature engineering with your strategies. You probably know what a huge influence this has on the final predictions? Probably not with PCA compression, but I think the other strategies would benefit hugely - especially Anomaly!

I've tried to use tsfresh, but it's just too heavy on computing. Another way would be to manually prefix features with %- or #-, but I'm not quite sure how to go from there:
Have you tried anything in that direction? What would be the right place for such dataframe operations?
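On the prefix idea: assuming the %- and #- markers are just naming conventions that downstream code keys on (I haven't verified exactly what this codebase or FreqAI expects), the dataframe operation itself is a simple column rename. A hypothetical sketch:

```python
import pandas as pd

FEATURE_PREFIX = "%-"  # assumed marker; adjust to whatever the pipeline expects


def mark_as_features(df: pd.DataFrame, cols) -> pd.DataFrame:
    # Rename engineered columns so downstream code can pick them out as features
    return df.rename(columns={c: FEATURE_PREFIX + c for c in cols})


def feature_columns(df: pd.DataFrame) -> list:
    # Recover the feature set later by filtering on the prefix
    return [c for c in df.columns if c.startswith(FEATURE_PREFIX)]
```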