nateemma / strategies

Custom trading strategies using the freqtrade framework

EarlyStopping in model training #18

Closed just-nilux closed 1 year ago

just-nilux commented 1 year ago

I just noticed that many of the model files are using EarlyStopping, which causes my training to finish after 5-10 minutes... with a file size of 300 kB.

What are you doing differently so that you actually get well-trained models of a proper size?

nateemma commented 1 year ago

How long is the time period for which you are training? I typically use 1-2 years of data - if you only use a month or so it will terminate early because there aren't as many different trends to fit as there would be with a longer timeframe. Additionally, it could just be that you don't have enough buy/sell events to train properly.

Which strategy are you trying?

Thanks,

Phil

just-nilux commented 1 year ago
[Screenshot attached: Screen Shot 2023-04-29 at 5 02 24 PM]

The buys/sells seem sufficient; I have plenty because I did not restrict the training conditions too much. Yes, I'm training on 6 months of data, so that might be it. What timeframe are your models trained on? I've tried 1m to 5m, but after 150 epochs or so the training stops with EarlyStopping.

The easy way, of course, would be to remove the EarlyStopping part, but I think that may affect the final result.
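
For reference, the kind of Keras EarlyStopping callback involved looks roughly like this; the monitor and patience values are just my guesses, not necessarily what the model files use. Raising patience (or restoring the best weights) would be a gentler tweak than removing the callback outright:

```python
# Illustrative only -- the monitor and patience values here are assumptions,
# not necessarily what the model files in this repo use.
import tensorflow as tf

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",         # stop when validation loss stops improving
    patience=32,                # allow this many epochs without improvement
    restore_best_weights=True,  # keep the best weights seen, not the last ones
)

# model.fit(x_train, y_train, validation_split=0.1, epochs=500, callbacks=[early_stop])
```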

This is NNTC Ensemble.

nateemma commented 1 year ago

OK, looks like you have added a lot of new indicators (up to 319). I use Principal Component Analysis to compress the features to a fixed size (64 in this case) so that I have a fixed size for the models (required by Tensorflow). This is likely too much compression in your case.

There are a few things you can try:

  1. Increase the compressed size to a larger number (128 or maybe higher). You do this by changing the value of ncols in the function get_compressor() in NNTC.py; you will have to delete and re-generate the models, though (there's a rough sketch after this list).
  2. You can try running the same buy/sell algorithm using the PCA framework and see what value it chooses for compression. That framework does not use Tensorflow, so it chooses a compression value such that it can reconstruct the data with 99% accuracy. It will use a different value for each pair, so choose the largest value that you see.
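
Roughly, the fixed-size compression from option 1 (and the variance-based sizing from option 2) boils down to something like the sketch below; the column counts and the random stand-in data are assumptions, and the actual code lives in get_compressor() in NNTC.py:

```python
# Minimal sketch of compressing engineered features to a fixed column count with PCA.
import numpy as np
from sklearn.decomposition import PCA

features = np.random.rand(1000, 319)          # stand-in for the engineered indicators

fixed = PCA(n_components=128)                 # option 1: fixed size (default described above is 64)
compressed = fixed.fit_transform(features)
print(compressed.shape)                       # (1000, 128)
print(fixed.explained_variance_ratio_.sum())  # how much variance survives the compression

auto = PCA(n_components=0.99)                 # roughly what option 2 does: keep 99% of the variance
print(auto.fit_transform(features).shape)     # PCA picks the column count per pair
```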

Which NNTC Ensemble version is it (e.g. NNTC_profit_Ensemble)? As an aside, I haven't had much success using the Ensemble models. The best-performing strategy for me is NNTC_profit_Transformer, but that will take forever to train, so maybe try NNTC_profit_Wavenet? If you do want to use Ensemble, the best performance I have seen in testing is with NNTC_fbb_Ensemble.

Hope that helps

Thanks,

Phil

just-nilux commented 1 year ago

Yes, I will play with ncols. I actually wrote my own base class and simply use your classes as subclasses, so I can easily change my code without messing around with yours :) I'm a big fan of Anomaly with Autoencoder as well as NNTC with Transformer, though they are heavy on resources. The best results of all the NNTC variants for me so far have been with Ensemble. I think it heavily depends on the features, the compressor, which part of the dataframe is used for training (end/middle), and of course the training conditions...

One day, when I have some more time, I'll strip the private stuff out of my code and commit it to git, but I'm just not much of a git workflow person...

nateemma commented 1 year ago

Yeah, git has been causing me a lot of headaches recently (it turned out to be a bad pull request that I didn't review closely enough).

FYI, I took a quick look at your legendary_ta repo, and I will probably try adding those indicators next week.

just-nilux commented 1 year ago

Sure, fisher stochastic center of gravity is a very useful feature for ML strategies. Also, SMI is great to counter lookahead and confirm entries. I'm still working on the dynamic exhaustion bands; the optimal solution would be to train a model on them and predict the future average length of peaks/valleys and of consecutive ups and downs. That would be a great way to predict reversals, entries and exits. I didn't use DWT for it because I feel the lookahead is so intense that it's hard to control. I'm running another experiment that applies a rational quadratic kernel to the price to smooth it just a little bit.
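
For the kernel experiment, the idea is roughly the following: a causal Nadaraya-Watson smoother with a rational quadratic kernel, where the h, alpha and lookback values are just placeholders I'm playing with:

```python
# Rough sketch of a causal Nadaraya-Watson smoother with a rational quadratic
# kernel -- parameter values (h, alpha, lookback) are arbitrary placeholders.
import numpy as np
import pandas as pd

def rational_quadratic_smooth(close: pd.Series, h: float = 8.0,
                              alpha: float = 1.0, lookback: int = 50) -> pd.Series:
    closes = close.to_numpy(dtype=float)
    smoothed = np.full(len(closes), np.nan)
    for i in range(len(closes)):
        start = max(0, i - lookback + 1)
        window = closes[start:i + 1]
        dist = np.arange(len(window))[::-1]                          # 0 = current bar, growing into the past
        weights = (1.0 + dist**2 / (2.0 * alpha * h**2)) ** (-alpha)
        smoothed[i] = np.sum(window * weights) / np.sum(weights)     # weighted average of past closes only
    return pd.Series(smoothed, index=close.index)

# dataframe['close_smoothed'] = rational_quadratic_smooth(dataframe['close'])
```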

Well, lots of stuff, lots of ideas, too little time :(

just-nilux commented 1 year ago

Thanks for the recommendation to increase the compression size; this helps. I'm wondering if the same applies to Anomaly as well. I've increased the compression there too and I'm running some training now. What level of contamination do you think is acceptable?

ETH/USDT:USDT
    training models...
    Using Autoencoder...
    No saved model found, building new model...
    model not found (/tmp/BuyCompressionAutoEncoder/checkpoint)...
Model: "BuyCompressionAutoEncoder"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 encoder (Sequential)        (None, 1, 128)            305280    

 decoder (Sequential)        (None, 1, 327)            305479    

=================================================================
Total params: 610,759
Trainable params: 610,759
Non-trainable params: 0
_________________________________________________________________
3162/3162 [==============================] - 4s 1ms/step
    Compressed data 327 -> 128 (features)
     dataframe: (101157, 128)  -> train: (80925, 128)  + test: (20231, 128)
     buys: (101157,)  -> train: (80925,)  + test: (20231,)
     sells: (101157,)  -> train: (80925,)  + test: (20231,)
    #training samples: 80925  #buys: 8775  #sells: 8709
    fitting classifier: AnomalyDetector_Ensemble contamination: 0.108
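
(For what it's worth, the contamination reported above looks like it roughly matches the share of labelled events in the training set: 8775 buys out of 80925 samples ≈ 0.108, so it presumably scales with how aggressive the buy/sell conditions are.)
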
nateemma commented 1 year ago

Yes, they both use the same compression. With Anomaly and NNTC, you can also try turning off compression completely by setting the flag compress_data to False. You can set this in Anomaly.py or NNTC.py to affect all strats, or in the strat file to affect only that strategy. You will need to delete and retrain the models (if any), though.
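
Something like the following per-strategy override is what I mean; this is just a sketch that assumes compress_data is a plain class-level flag, so check Anomaly.py / NNTC.py for the real wiring (the subclass name and import path here are made up):

```python
# Sketch of a per-strategy override -- assumes compress_data is a simple
# class-level flag on the base class; the actual mechanism may differ.
from NNTC import NNTC   # hypothetical import path for the base strategy class

class NNTC_profit_Wavenet_nocompress(NNTC):
    compress_data = False   # skip feature compression for this strategy only
    # remember to delete any previously trained models so they get rebuilt
```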

Thanks,

Phil

just-nilux commented 1 year ago

Yes, I've tried that. Depending on the number of features, compression makes sense, but I'll get back to engineering my features: reducing them, or using different add_indicator() functions to add features depending on the algorithm or class. For Anomaly I believe you need far fewer features...

nateemma commented 1 year ago

Here's a little trick you can try that might help:

Hope that helps

Cheers,

Phil

just-nilux commented 1 year ago

Do you know of any way to test feature importance without PCA? From what I can see my models do much better without any compression... but the PCA feature importance doesn't tell you much if you don't use PCA.

Edit: I'm currently looking into SHAP (https://shap.readthedocs.io/en/latest/index.html) to get more details about the Keras models...
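
Something along these lines is what I'm experimenting with; the stand-in model and random data are only there to make the sketch self-contained, and the real inputs would be the trained strat model and its feature matrix:

```python
# Rough sketch of using SHAP to rank feature importance for a trained Keras model.
import numpy as np
import shap
import tensorflow as tf

X_train = np.random.rand(1000, 128).astype("float32")   # stand-in for the feature matrix
X_test = np.random.rand(200, 128).astype("float32")
model = tf.keras.Sequential([                            # stand-in for the trained strat model
    tf.keras.Input(shape=(128,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

background = X_train[np.random.choice(len(X_train), 100, replace=False)]
explainer = shap.GradientExplainer(model, background)
shap_values = explainer.shap_values(X_test)

# shap_values may be a list (one array per output) or a single array, depending on the shap version
sv = shap_values[0] if isinstance(shap_values, list) else shap_values
importance = np.abs(sv).mean(axis=0).reshape(-1)          # mean |SHAP| per feature
print("most influential feature indices:", importance.argsort()[::-1][:10])
```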

just-nilux commented 1 year ago

@nateemma one more burning question: I can't really get behind the concept of willingly introducing a lookahead in the training conditions. I'm using completely different conditions with other indicators and no forward-looking shift, but I would like to understand why you chose to look ahead in the data in your strategies. Doesn't it render the strategies unusable?

How do you deal with the repainting signals? Maybe I missed something...

(future_df['future_profit_max'] >= future_df['profit_threshold']) & # future profit exceeds threshold
(future_df['future_max'] > future_df['dwt_recent_max']) # future window max exceeds prior window max
nateemma commented 1 year ago

No, because the lookahead data is only used to identify buy/sell events that are then used to train the detection algorithms. That data is not visible while making predictions; if it were visible, you would get incredible performance.
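
A rough illustration of what that means in practice (the window length, threshold and column handling here are placeholders, not the exact code from the strats):

```python
# Sketch: forward-looking information is only used to build training labels.
import pandas as pd

LOOKAHEAD = 6  # candles; placeholder value

def make_training_labels(df: pd.DataFrame, profit_threshold: float = 0.01) -> pd.Series:
    # max close over the *future* window -- only computable on historical data
    future_max = df['close'].rolling(LOOKAHEAD).max().shift(-LOOKAHEAD)
    future_profit = (future_max - df['close']) / df['close']
    return (future_profit >= profit_threshold).astype(int)

# At training time: labels = make_training_labels(historical_df); fit the model on features + labels.
# At prediction time: the model only ever sees the (backward-looking) features,
# so the forward shift never leaks into live signals.
```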

Cheers,

Phil
