Closed just-nilux closed 1 year ago
I sort of do it manually. If you look in DataframePopulator.py, you will find add_indicators() and add_hidden_indicators(). The indicators added by add_indicators() are the ones used to train the models, and they must not be forward-looking. The indicators added in add_hidden_indicators() can (and should) be forward-looking, because they are only used to generate the buy/sell labels used for training.

If you look in the actual strategy file (e.g. NNTC_profit_Ensemble.py), you will also see a function called save_debug_indicators(). Using this, you can 'save' indicators added by add_hidden_indicators() in a way that ensures they will not be used in training or prediction. I use this so that I can plot those indicators for debugging; they are prefixed with % - for example, future_profit_max becomes %future_profit_max.
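To make the split concrete, here is a rough sketch of how those three hooks could fit together. The function names match the ones described above, but the bodies (the SMA feature, the lookahead window, the future_profit_max formula) are illustrative assumptions, not the actual implementations from DataframePopulator.py:

```python
import numpy as np
import pandas as pd

LOOKAHEAD = 12  # hypothetical label lookahead window, in candles


def add_indicators(df: pd.DataFrame) -> pd.DataFrame:
    # Training features: must NOT look forward in time
    df["sma_20"] = df["close"].rolling(20).mean()
    return df


def add_hidden_indicators(df: pd.DataFrame) -> pd.DataFrame:
    # Label-generation indicators: MAY look forward, since they are
    # only used to build the buy/sell labels for training
    future_max = df["close"].rolling(LOOKAHEAD).max().shift(-LOOKAHEAD)
    df["future_profit_max"] = 100.0 * (future_max - df["close"]) / df["close"]
    return df


def save_debug_indicators(df: pd.DataFrame, names: list) -> pd.DataFrame:
    # Copy hidden indicators under a '%' prefix so they are excluded
    # from training/prediction but still available for debug plotting
    for name in names:
        df["%" + name] = df[name]
    return df
```

The point of the prefix is that the training code can skip any `%`-column while the plotting code can still find it.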
The % and # conventions are for FreqAI, right? I haven't played with that yet - it didn't seem ready the last time I looked. I do plan to start trying it soon, though.
Thanks. Yes, I figured you use these functions to control this. I think that's where I will do the semi-automatic feature engineering then - things like shifted candles, math operations, distances, etc.
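For what it's worth, a minimal pandas sketch of that kind of semi-automatic feature engineering - shifted candles, simple math operations, and a distance-to-baseline feature. The column names and window sizes are made up for illustration, not taken from the strategies:

```python
import pandas as pd


def add_engineered_features(df: pd.DataFrame, shifts=(1, 2, 3)) -> pd.DataFrame:
    for n in shifts:
        # Shifted candles: the close from n candles ago becomes a new feature
        df[f"close_shift_{n}"] = df["close"].shift(n)
        # Math operations: relative change over the last n candles
        df[f"close_pct_{n}"] = df["close"].pct_change(periods=n)
    # Distance feature: how far price sits from a rolling baseline
    sma = df["close"].rolling(5).mean()
    df["dist_sma_5"] = (df["close"] - sma) / sma
    return df
```

All of these stay backward-looking, so they would belong with the add_indicators() features rather than the hidden ones.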
I'll report my findings on how it goes and whether it's worth the trouble ;)
Oh, and one more thing... I see you use keras.msle for anomaly detection and in most other strategies. I would suggest giving the Dissimilarity Index a try: https://www.freqtrade.io/en/stable/freqai-feature-engineering/#identifying-outliers-with-the-dissimilarity-index-di
It's great for filtering out bad predictions, and much more...
Just an example of how to calculate it... the mean-based threshold is of course too basic; that part needs some more engineering.
```python
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances


def calculate_DI(train_data, pred_data, threshold=None):
    # Euclidean distance between each prediction point and each training point
    distances = euclidean_distances(pred_data, train_data)
    # Minimum distance from each prediction point to the training set
    min_distances = np.min(distances, axis=1)
    # Default threshold: mean of the minimum distances (too basic, as noted)
    if threshold is None:
        threshold = np.mean(min_distances)
    # 0 = close to the training data, 1 = dissimilar (likely bad prediction)
    DI = np.where(min_distances < threshold, 0, 1)
    return DI
```
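For completeness, a quick self-contained check of the function above on synthetic data. The Gaussian training cloud and the fixed threshold of 3.0 are arbitrary choices for the demo:

```python
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances


def calculate_DI(train_data, pred_data, threshold=None):
    distances = euclidean_distances(pred_data, train_data)
    min_distances = np.min(distances, axis=1)
    if threshold is None:
        threshold = np.mean(min_distances)
    return np.where(min_distances < threshold, 0, 1)


# Training cloud: standard-normal points in 4 dimensions
rng = np.random.default_rng(42)
train = rng.normal(0.0, 1.0, size=(500, 4))

# Predictions: five points from the same distribution plus one far-away outlier
preds = np.vstack([rng.normal(0.0, 1.0, size=(5, 4)),
                   np.full((1, 4), 50.0)])

# With a fixed threshold, the far-away point should be flagged as 1
di = calculate_DI(train, preds, threshold=3.0)
```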
I'm wondering how best to approach feature engineering with your strategies. You probably know what a huge influence this has on the final predictions? Probably not with PCA compression, but I think the other strategies would benefit hugely - especially Anomaly!

I've tried to use tsfresh, but it's just too heavy on computing. Another way would be to manually prefix features with %- or #-, but I'm not quite sure how to go from there:
Have you tried anything in that direction? What would be the right place for such dataframe operations?
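On the prefix idea: assuming the %- and #- markers are just naming conventions that downstream code keys on (I haven't verified exactly what this codebase or FreqAI expects), the dataframe operation itself is a simple column rename. A hypothetical sketch:

```python
import pandas as pd

FEATURE_PREFIX = "%-"  # assumed marker; adjust to whatever the pipeline expects


def mark_as_features(df: pd.DataFrame, cols) -> pd.DataFrame:
    # Rename engineered columns so downstream code can pick them out as features
    return df.rename(columns={c: FEATURE_PREFIX + c for c in cols})


def feature_columns(df: pd.DataFrame) -> list:
    # Recover the feature set later by filtering on the prefix
    return [c for c in df.columns if c.startswith(FEATURE_PREFIX)]
```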