slavakurilyak opened this issue 6 years ago
I am testing under the same conditions, but using tsfresh:
{
"EXCHANGE": "bitfinex",
"ASSET": "btc_usd",
"DATA_FREQ": "daily",
"HISTORY_FREQ": "1d",
"CAPITAL_BASE": 5000,
"BASE_CURRENCY": "usd",
"START": "2015-04-01",
"END": "2018-06-07",
"BARS": 365,
"ORDER_SIZE": 0.5,
"SLIPPAGE_ALLOWED": 0.05
}
Accuracy: 0.46649484536082475
Kappa Coefficient: 0.14618404826813958
Classification Report:
              precision    recall  f1-score   support
KEEP               0.58      0.59      0.58       568
UP                 0.36      0.39      0.38       344
DOWN               0.34      0.30      0.32       252
avg / total        0.46      0.47      0.47      1164
Confusion Matrix:
[[334 154  80]
 [142 133  69]
 [ 98  78  76]]
I think this result is better because predicting UP when the actual outcome is KEEP costs little; the real problem is predicting UP when the actual outcome is DOWN, and vice versa.
I think we could define a new metric, because plain accuracy is not very useful here.
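As a rough illustration, here is a minimal sketch of such a metric computed from the confusion matrix above. The "directional error" name and the choice of which cells count as harmful are my own assumptions, not an established measure:

```python
import numpy as np

# Confusion matrix from above; rows = actual, cols = predicted,
# in the order KEEP, UP, DOWN.
cm = np.array([[334, 154,  80],
               [142, 133,  69],
               [ 98,  78,  76]])

# Only count the truly harmful mistakes: predicting DOWN when the
# market went UP (cm[1, 2]) or UP when it went DOWN (cm[2, 1]).
directional_error_rate = (cm[1, 2] + cm[2, 1]) / cm.sum()
print(directional_error_rate)  # ~0.126, vs. an overall error of ~0.534
```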
Accuracy is a good starting point, since it shows the fraction of predictions a classification model got right. However, accuracy alone doesn't tell the full story on a class-imbalanced data set, where some labels are far more frequent than others.
Here are a few ideas to consider for evaluation metrics:
Balanced Accuracy, which corrects for class-frequency imbalance by calculating accuracy per class and then averaging the per-class accuracies (TPOT, 2016; Chamon et al., 2017); see the sketch after this list.
Classification Error with Confidence Intervals, which shows how much uncertainty surrounds the measured performance of our models (Machine Learning Mastery, 2017); also sketched below.
The average price change on days when the model predicts correctly vs. the average price change on days when it predicts incorrectly, which yields a modified confusion matrix (Lamon et al., 2018).
Root Mean Squared Error (RMSE) (McNally, 2016; Guo et al., 2018).
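For reference, a minimal sketch of the first two ideas using scikit-learn and NumPy; the label arrays here are random stand-ins for the real test-set labels and predictions:

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score

# Random stand-ins; substitute the real test-set labels and predictions
# (0 = KEEP, 1 = UP, 2 = DOWN).
rng = np.random.default_rng(0)
y_true = rng.integers(0, 3, size=1164)
y_pred = rng.integers(0, 3, size=1164)

# Balanced accuracy: the mean of per-class recalls, so the dominant
# KEEP class cannot mask poor performance on UP and DOWN.
print(balanced_accuracy_score(y_true, y_pred))

# 95% confidence interval on classification error, via the normal
# approximation to the binomial (Machine Learning Mastery, 2017).
n = len(y_true)
error = np.mean(y_true != y_pred)
interval = 1.96 * np.sqrt(error * (1 - error) / n)
print(f"error = {error:.3f} +/- {interval:.3f}")
```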
On the other hand, if we adopt the tpot library (see #25), we get Balanced Accuracy for free. TPOT measures the "accuracy of the resulting pipelines or models as balanced accuracy" and selects pipelines "to simultaneously maximize classification accuracy on the data set while minimizing the number of operators in the pipeline" (TPOT, 2016).
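A minimal usage sketch, assuming the tpot package is installed; the generations, population_size, and data here are placeholder values, not a tuned configuration:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

# Stand-in data; substitute our real feature matrix and labels.
X = np.random.rand(500, 10)
y = np.random.randint(0, 3, 500)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

tpot = TPOTClassifier(generations=5, population_size=20,
                      scoring='balanced_accuracy', verbosity=2)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
tpot.export('tpot_pipeline.py')  # writes the best pipeline out as Python code
```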
Good news: technical analysis features improve our results:
{
"EXCHANGE": "bitfinex",
"ASSET": "btc_usd",
"DATA_FREQ": "daily",
"HISTORY_FREQ": "1d",
"CAPITAL_BASE": 5000,
"BASE_CURRENCY": "usd",
"START": "2015-04-01",
"END": "2018-06-07",
"BARS": 365,
"ORDER_SIZE": 0.5,
"SLIPPAGE_ALLOWED": 0.05
}
Accuracy: 0.5910652920962199
Kappa Coefficient: 0.34280346162605324
Classification Report:
              precision    recall  f1-score   support
KEEP               0.65      0.67      0.66       568
UP                 0.55      0.55      0.55       344
DOWN               0.51      0.46      0.48       252
avg / total        0.59      0.59      0.59      1164
Confusion Matrix:
[[383 110  75]
 [118 189  37]
 [ 90  46 116]]
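For context, here is a minimal sketch of the kind of technical-analysis features meant above, computed with pandas; the indicator choices and window lengths are illustrative assumptions, not the exact feature set used:

```python
import pandas as pd

# Stand-in daily close prices; substitute the real OHLCV frame.
close = pd.Series(range(100), dtype=float)

features = pd.DataFrame({
    'sma_14': close.rolling(14).mean(),   # 14-day simple moving average
    'ema_14': close.ewm(span=14).mean(),  # 14-day exponential moving average
    'mom_10': close.diff(10),             # 10-day momentum
})
# Rolling windows leave NaN in the first rows; dropping them is what
# the NaN-removal commit mentioned below addresses.
features = features.dropna()
```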
Given how good these results are, I'm going to check for possible data leakage in our problem!
@bukosabino great idea to drop NaN values (commit) to prevent data leakage!
We also need to check for model overfitting and consider standard remedies such as cross-validation, regularization, and early stopping (inspiration: Elite Data Science, 2017).
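As one concrete example, a minimal early-stopping sketch with XGBoost on stand-in data, assuming a version where early_stopping_rounds is still a fit() argument (newer releases moved it to the constructor):

```python
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split

# Stand-in data; substitute the real features and labels (0=KEEP, 1=UP, 2=DOWN).
X = np.random.rand(1000, 20)
y = np.random.randint(0, 3, 1000)
# shuffle=False: keep chronological order to avoid look-ahead leakage.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, shuffle=False)

model = xgb.XGBClassifier(
    n_estimators=500,
    max_depth=4,           # shallow trees generalize better
    subsample=0.8,         # row subsampling acts as regularization
    colsample_bytree=0.8,  # column subsampling acts as regularization
)
model.fit(
    X_train, y_train,
    eval_set=[(X_val, y_val)],
    early_stopping_rounds=20,  # stop adding trees when validation loss stalls
    verbose=False,
)
```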
We also need to check for backtest overfitting (see #85).
Goal
As a developer, I want to improve the existing machine learning model using XGBoost, so that I can beat the benchmark (a buy-and-hold strategy).
As a developer, I want to achieve higher than 50% accuracy using XGBoost, so that I can beat the benchmark (a buy-and-hold strategy).
Inspiration
Running
$ strat -ml xgboost
gives:
Results
Here is the backtest_summary.csv
This model's accuracy is 48.37%, which is below 50%. Our accuracy must be better than flipping a coin (a 50% chance of being right).
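As a quick sanity check (assuming the class distribution matches the classification reports above, where KEEP covers 568 of 1164 samples), a naive majority-class model already scores close to this:

```python
# Majority-class baseline, assuming the class distribution from the
# classification reports above (KEEP: 568 of 1164 samples).
print(568 / 1164)  # ~0.488: always predicting KEEP already scores ~48.8%
```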