produvia / kryptos

Kryptos AI is a virtual investment assistant that manages your cryptocurrency portfolio
http://twitter.com/kryptos_ai
MIT License

Improve Machine Learning Model to Beat Benchmark #79

Open slavakurilyak opened 6 years ago

slavakurilyak commented 6 years ago

Goal

As a developer, I want to improve the existing machine learning model using XGBoost, so that I can beat the benchmark (buy-and-hold strategy).

As a developer, I want to achieve higher than 50% accuracy using XGBoost, so that I can beat the benchmark (buy-and-hold strategy).
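
For context, a minimal three-class (KEEP/UP/DOWN) XGBoost baseline might look like the sketch below; the feature matrix and labels are random placeholders standing in for whatever the strat pipeline actually produces.

import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Placeholder data: X would be the engineered features, y the
# KEEP/UP/DOWN labels encoded as 0/1/2.
X = np.random.rand(1164, 20)
y = np.random.randint(0, 3, size=1164)

# Keep the time ordering: train on the past, test on the future.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False)

model = xgb.XGBClassifier(
    objective="multi:softmax",  # three discrete classes
    max_depth=4,
    n_estimators=200,
    learning_rate=0.1)
model.fit(X_train, y_train)

preds = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, preds))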

Inspiration

Running $ strat -ml xgboost gives:

{
   "EXCHANGE": "bitfinex",
   "ASSET": "btc_usd",
   "DATA_FREQ": "daily",
   "HISTORY_FREQ": "1d",
   "CAPITAL_BASE": 5000,
   "BASE_CURRENCY": "usd",
   "START": "2015-04-01",
   "END": "2018-06-07",
   "BARS": 365,
   "ORDER_SIZE": 0.5,
   "SLIPPAGE_ALLOWED": 0.05
}

Results

# xgboost_confussion_matrix.txt
Accuracy: 0.4836769759450172
Coefficient Kappa: 0.174298542554849
Classification Report:
             precision    recall  f1-score   support

       KEEP       0.62      0.60      0.61       568
         UP       0.39      0.50      0.44       344
       DOWN       0.29      0.20      0.24       252

avg / total       0.48      0.48      0.48      1164

Confussion Matrix:
[[340 158  70]
 [120 173  51]
 [ 89 113  50]]

Here is the backtest_summary.csv:

# backtest_summary.csv
start_date: 2015-04-01
end_date: 2018-06-07
backtest_minutes: 0.0
backtest_days: 1163.0
backtest_weeks: 166.1428571429
number_of_trades: 79
average_trades_per_week_avg: 0.475494411
average_trade_amount_usd: 25.5825882274
initial_capital: 5000.0
ending_capital: 12444.4448932896
net_profit: 7444.4448932896
net_profit_pct: 148.8888978658
average_daily_profit: 6.4010704156
average_daily_profit_pct: 0.1280214083
average_exposure: 858.5830145073
average_exposure_pct: 10.349086356
net_risk_adjusted_return_pct: 8.6706174796
max_drawdown_pct_catalyst: -43.3888107008
max_daily_drawdown_pct: -11.1798836831
max_weekly_drawdown_pct: -17.2725493691
sharpe_ratio_avg: 0.6942761152
std_rolling_10_day_pct_avg: 0.0094776328
std_rolling_100_day_pct_avg: 0.0376401827
number_of_simulations: 1164

This model's accuracy is 48.37%, which is below 50%. Our accuracy must be better than flipping a coin (a 50% chance of being right).
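
As a sanity check, both numbers above can be recomputed directly from the confusion matrix:

import numpy as np

# Confusion matrix from xgboost_confussion_matrix.txt
# (rows = true KEEP/UP/DOWN, columns = predicted KEEP/UP/DOWN).
cm = np.array([[340, 158,  70],
               [120, 173,  51],
               [ 89, 113,  50]])

n = cm.sum()
observed = np.trace(cm) / n  # accuracy (p_o)
# Chance agreement (p_e): sum over classes of row_total * col_total / n^2.
expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2
kappa = (observed - expected) / (1 - expected)

print(f"Accuracy: {observed:.10f}")  # 0.4836769759
print(f"Kappa:    {kappa:.10f}")     # 0.1742985426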

bukosabino commented 6 years ago

I am testing with the same conditions but using tsfresh:

{
   "EXCHANGE": "bitfinex",
   "ASSET": "btc_usd",
   "DATA_FREQ": "daily",
   "HISTORY_FREQ": "1d",
   "CAPITAL_BASE": 5000,
   "BASE_CURRENCY": "usd",
   "START": "2015-04-01",
   "END": "2018-06-07",
   "BARS": 365,
   "ORDER_SIZE": 0.5,
   "SLIPPAGE_ALLOWED": 0.05
}
Accuracy: 0.46649484536082475
Coefficient Kappa: 0.14618404826813958
Classification Report:
             precision    recall  f1-score   support

       KEEP       0.58      0.59      0.58       568
         UP       0.36      0.39      0.38       344
       DOWN       0.34      0.30      0.32       252

avg / total       0.46      0.47      0.47      1164

Confussion Matrix:
[[334 154  80]
 [142 133  69]
 [ 98  78  76]]
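
For reference, the tsfresh feature-extraction step might look roughly like the sketch below; the 30-day windowing, column names, and placeholder prices are my assumptions, not the actual pipeline code.

import numpy as np
import pandas as pd
from tsfresh import extract_features
from tsfresh.utilities.dataframe_functions import impute

# Placeholder price series; in the pipeline this would be the daily
# btc_usd closes from Bitfinex.
prices = pd.Series(np.random.rand(200).cumsum() + 100)

# Frame each day as its trailing 30-day window so tsfresh can compute
# per-window features ("id" identifies the prediction day).
width = 30
rows = []
for end in range(width, len(prices)):
    window = prices.iloc[end - width:end]
    rows.append(pd.DataFrame({
        "id": end,
        "time": range(width),
        "close": window.values,
    }))
windowed = pd.concat(rows, ignore_index=True)

features = extract_features(windowed, column_id="id", column_sort="time")
impute(features)  # replace NaN/inf produced by some feature calculators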

I think this result is better because predicting UP when the market actually KEEPs is not very costly; the real problem is predicting UP when the market goes DOWN, or vice versa.

I think we could define a new metric, because plain accuracy is not very useful here.
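
One way to make that concrete (a hypothetical metric, just a sketch): look only at the directional calls, and count a prediction as a severe error only when it is on the wrong side of the market.

import numpy as np

# Confusion matrix from the tsfresh run above
# (rows = true KEEP/UP/DOWN, columns = predicted KEEP/UP/DOWN).
cm = np.array([[334, 154,  80],
               [142, 133,  69],
               [ 98,  78,  76]])

KEEP, UP, DOWN = 0, 1, 2

# Severe errors: predicted UP when the market went DOWN, or vice versa.
severe = cm[DOWN, UP] + cm[UP, DOWN]
# Directional calls the model actually made (predicted UP or DOWN).
directional = cm[:, UP].sum() + cm[:, DOWN].sum()

print("Severe-error rate on directional calls:",
      severe / directional)  # ~0.249 for this matrix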

slavakurilyak commented 6 years ago

Accuracy is a good starting point since it shows the fraction of predictions that a classification model got right. However, accuracy alone doesn't tell the full story when working with class-imbalanced data sets, where some labels are far more frequent than others.

Here are a few ideas to consider for evaluation metrics:

  1. Balanced Accuracy, which corrects for class-frequency imbalance by computing accuracy per class and then averaging the per-class accuracies (TPOT, 2016; Chamon et al., 2017).

  2. Classification Error with Confidence Intervals, which lets us see how certain we can be about a model's measured performance (Machine Learning Mastery, 2017). The first two ideas are sketched right after this list.

  3. The average price change on days when the model predicts correctly vs. days when it predicts incorrectly, which yields a modified confusion matrix (Lamon et al., 2018).

  4. Root Mean Squared Error (RMSE) (McNally, 2016; Guo et al., 2018).
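
A minimal sketch of ideas 1 and 2, using scikit-learn's balanced_accuracy_score and a normal-approximation binomial interval; the labels here are random placeholders.

import numpy as np
from sklearn.metrics import balanced_accuracy_score

# y_true / y_pred would come from the model's test-set predictions.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 3, size=1164)
y_pred = rng.integers(0, 3, size=1164)

# 1. Balanced accuracy: the mean of per-class recalls, so a majority
#    class cannot dominate the score.
print("Balanced accuracy:", balanced_accuracy_score(y_true, y_pred))

# 2. Classification error with a 95% confidence interval
#    (normal approximation to the binomial, z = 1.96).
n = len(y_true)
error = np.mean(y_true != y_pred)
margin = 1.96 * np.sqrt(error * (1 - error) / n)
print(f"Error: {error:.3f} +/- {margin:.3f}")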

On the other hand, if we implement the TPOT library (see #25), we get balanced accuracy out of the box. TPOT measures "accuracy of the resulting pipelines or models as balanced accuracy" and selects pipelines "to simultaneously maximize classification accuracy on the data set while minimizing the number of operators in the pipeline" (TPOT, 2016).
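
A minimal TPOT setup along those lines might look like this; the generations, population size, and data are placeholders.

import numpy as np
from tpot import TPOTClassifier

# Placeholder data standing in for the real feature matrix and labels.
X = np.random.rand(500, 20)
y = np.random.randint(0, 3, size=500)

# TPOT evolves whole pipelines, scoring each candidate by balanced
# accuracy while preferring pipelines with fewer operators.
tpot = TPOTClassifier(
    generations=5,      # placeholder search budget
    population_size=20,
    scoring="balanced_accuracy",
    cv=5,
    random_state=42,
    verbosity=2,
)
tpot.fit(X, y)
tpot.export("tpot_best_pipeline.py")  # dump the winning pipeline as code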

bukosabino commented 6 years ago

Good news: technical analysis features improve our results:

{
   "EXCHANGE": "bitfinex",
   "ASSET": "btc_usd",
   "DATA_FREQ": "daily",
   "HISTORY_FREQ": "1d",
   "CAPITAL_BASE": 5000,
   "BASE_CURRENCY": "usd",
   "START": "2015-04-01",
   "END": "2018-06-07",
   "BARS": 365,
   "ORDER_SIZE": 0.5,
   "SLIPPAGE_ALLOWED": 0.05
}
Accuracy: 0.5910652920962199
Coefficient Kappa: 0.34280346162605324
Classification Report:
             precision    recall  f1-score   support

       KEEP       0.65      0.67      0.66       568
         UP       0.55      0.55      0.55       344
       DOWN       0.51      0.46      0.48       252

avg / total       0.59      0.59      0.59      1164

Confussion Matrix:
[[383 110  75]
 [118 189  37]
 [ 90  46 116]]

Given how good these results are, I'm going to check for possible data leakage!
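
For reference, adding technical-analysis features can be as simple as the sketch below (this uses the ta library; the column names and placeholder data are my assumptions):

import numpy as np
import pandas as pd
import ta

# Placeholder OHLCV frame; in the pipeline this would be the daily
# Bitfinex btc_usd bars.
n = 400
close = np.random.rand(n).cumsum() + 100
df = pd.DataFrame({
    "open": close + np.random.randn(n) * 0.5,
    "high": close + 1.0,
    "low": close - 1.0,
    "close": close,
    "volume": np.random.rand(n) * 1000,
})

# Append RSI, MACD, Bollinger Bands, and dozens of other indicators
# as new feature columns.
df = ta.add_all_ta_features(
    df, open="open", high="high", low="low",
    close="close", volume="volume")

# Indicators need a warm-up period, so early rows contain NaNs;
# dropping them (rather than filling) avoids leaking future values.
df = df.dropna()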

slavakurilyak commented 6 years ago

@bukosabino great idea to drop NaN values (commit) to prevent data leakage!

We also need to check for model overfitting. Consider these solutions to prevent it (items 1 and 5 are sketched after the list):

  1. Cross-validation
  2. Training with more data
  3. Removing features
  4. Stopping early
  5. Regularization
  6. Ensembling

Inspiration: Elite Data Science, 2017
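
A minimal sketch of items 1 and 5 together, using walk-forward cross-validation so no fold ever trains on the future; the data and hyperparameter values are placeholders.

import numpy as np
import xgboost as xgb
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Placeholder data standing in for the real feature matrix and labels.
X = np.random.rand(1164, 20)
y = np.random.randint(0, 3, size=1164)

# 1. Cross-validation: TimeSeriesSplit always trains on the past and
#    validates on the future, which matters for market data.
cv = TimeSeriesSplit(n_splits=5)

# 5. Regularization: shallow trees plus L1/L2 penalties and row
#    subsampling all constrain the model.
model = xgb.XGBClassifier(
    objective="multi:softmax",
    max_depth=3,
    n_estimators=100,
    reg_alpha=0.1,    # L1 penalty
    reg_lambda=1.0,   # L2 penalty
    subsample=0.8,    # row subsampling per tree
)

scores = cross_val_score(model, X, y, cv=cv, scoring="balanced_accuracy")
print("Per-fold balanced accuracy:", scores)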

We also need to check for backtest overfitting (see #85).