produvia / kryptos

Kryptos AI is a virtual investment assistant that manages your cryptocurrency portfolio
http://twitter.com/kryptos_ai
MIT License

Feature Engineering #80

Open slavakurilyak opened 6 years ago

slavakurilyak commented 6 years ago

Goal

As a developer, I want to add features to the existing machine learning model (XGBoost), so that I can develop a more accurate machine learning model.

Consider

  1. Consider using cryptocurrency volume data, already integrated in Kryptos (see the volume() method), as features

  2. Consider adding external data sources, already integrated in Kryptos (see task #8), as features:

  3. Consider using blue-yonder's tsfresh for automatic extraction of 184+ time series features, such as:

Inspiration

What generally improves a model's score more on average, feature engineering or hyperparameter tuning? Feature engineering, without a doubt.

bukosabino commented 6 years ago

I have included tsfresh in the platform

https://github.com/produvia/cryptocurrency-trading-platform/commit/91895ad5c9e2eae3b55c04e954851a17c6da4ecd
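
For readers unfamiliar with the kind of features tsfresh extracts, here is a minimal plain-Python sketch of four summary statistics computed over a sliding window of prices. It does not call tsfresh and is not taken from the linked commit. The function name `window_features`, the window size, and the sample prices are made up for illustration.

```python
from statistics import mean, pstdev


def window_features(values, window):
    """Return one feature dict per full sliding window over `values`.

    `window` must be at least 2 so that a change between points exists.
    """
    rows = []
    for end in range(window, len(values) + 1):
        chunk = values[end - window:end]
        rows.append({
            "mean": mean(chunk),
            "std": pstdev(chunk),  # population standard deviation
            "abs_energy": sum(v * v for v in chunk),  # sum of squared values
            "mean_abs_change": mean(
                abs(b - a) for a, b in zip(chunk, chunk[1:])
            ),
        })
    return rows


# Hypothetical closing prices, for illustration only.
prices = [10.0, 11.0, 13.0, 12.0, 15.0]
features = window_features(prices, window=3)
```

Each dict in `features` could serve as one row of model input. tsfresh automates this kind of extraction across many more feature calculators than the four shown here.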

bukosabino commented 6 years ago

We could add technical analysis features too (ta-lib). Does that sound good to you?

slavakurilyak commented 6 years ago

Yes! Let's use technical analysis (ta-lib) as features (see #64) for machine learning.
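
To make the idea concrete, here is a minimal sketch of two common technical indicators, a simple moving average and an RSI variant based on plain averages, written in plain Python. It does not use ta-lib and does not reflect how Kryptos computes its indicators. The function names, periods, and prices are illustrative assumptions, and library implementations of RSI typically apply Wilder's smoothing, so their values would differ.

```python
def sma(values, period):
    """Simple moving average; None until `period` values are available."""
    out = []
    for i in range(len(values)):
        if i + 1 < period:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - period:i + 1]) / period)
    return out


def rsi(values, period):
    """RSI from plain averages of gains and losses over `period` changes."""
    out = [None] * len(values)
    changes = [b - a for a, b in zip(values, values[1:])]
    for i in range(period, len(values)):
        recent = changes[i - period:i]  # the last `period` price changes
        avg_gain = sum(c for c in recent if c > 0) / period
        avg_loss = sum(-c for c in recent if c < 0) / period
        if avg_loss == 0:
            out[i] = 100.0  # no losses in the window
        else:
            out[i] = 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)
    return out


# Hypothetical closing prices, for illustration only.
closes = [10.0, 11.0, 12.0, 11.0, 13.0, 12.0]
feature_rows = list(zip(closes, sma(closes, 3), rsi(closes, 3)))
```

The leading `None` values mark periods where the indicator has too little history. Such rows would need to be dropped or filled before training.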

bukosabino commented 6 years ago

I'm working on adding technical analysis features.

This article could be useful for us in order to add more features. Let me know if you agree and I will work on this.

slavakurilyak commented 6 years ago

> I'm working on adding technical analysis features.

I am looking forward to it.

> This article could be useful for us in order to add more features.

Thanks for sharing this practical article on the Enigma data marketplace.

> Let me know if you agree and I will work on this.

Let's implement Kryptos' existing datasets as features. We already support Google Trends and Quandl data sources (see #8).

slavakurilyak commented 6 years ago

Let's add non-pricing datasets as features. We can use cryptocurrency volume data, Blockchain Info, and Google Search Volume.

bukosabino commented 6 years ago

I have added some external data sources (Google Search Volume and Blockchain Info) as features for Machine Learning models.

However, I don't fully understand what you mean by cryptocurrency volume data:

We are already using the volume as a feature: https://github.com/produvia/cryptocurrency-trading-platform/blob/49951f284edbc13c77689d5a69ab67a30b59353e/kryptos/platform/strategy/strategy.py#L227

Edit: At this moment Google Search Volume is fine, but the Quandl dataset is unstable. So, we can use:

$ strat -d google -c "bitcoin" -c "btc" -ml xgboost
or
$ strat -ml xgboost -d google -c "bitcoin" -c "btc"

slavakurilyak commented 6 years ago

> I have added some external data sources (Google Search Volume and Blockchain Info) as features for Machine Learning models.

Excellent! Since we now have multiple machine learning models, let's compare the differences between them in terms of accuracy.

> We are already using the volume as a feature.

Perfect!

> At this moment Google Search Volume is fine, but the Quandl dataset is unstable.

Can you clarify what you mean by the Quandl dataset being unstable?
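
On the model comparison suggested in this comment, one way it could be set up is to score each model's predicted price direction against the realized direction over the same held-out period. The sketch below assumes binary up/down labels and uses made-up predictions. `directional_accuracy`, the names `model_a` and `model_b`, and all data are hypothetical and not taken from Kryptos.

```python
def directional_accuracy(predicted, actual):
    """Fraction of periods where the predicted direction matched."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)


# 1 = price went up, 0 = price went down (made-up labels).
actual = [1, 0, 1, 1, 0, 1, 0, 0]
predictions = {
    "model_a": [1, 0, 1, 0, 0, 1, 1, 0],
    "model_b": [1, 1, 1, 1, 0, 0, 1, 0],
}

# Score every model on the same labels, then pick the highest scorer.
scores = {
    name: directional_accuracy(predicted, actual)
    for name, predicted in predictions.items()
}
best = max(scores, key=scores.get)
```

Scoring all models on the same held-out period keeps the comparison fair. Accuracy alone does not capture trading performance, so it is only a first filter.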

bukosabino commented 5 years ago

There were some bugs when merging the Quandl dataset into the system. It works now. Some examples:

strat -ml xgboost -d google -c "bitcoin" -c "btc"
strat -ml xgboost -d quandl -c 'MKTCP' -c 'NTRAN'

slavakurilyak commented 5 years ago

Excellent work! Now we can combine all of our existing datasets, including:

  1. google dataset (see manager.py#L243), using the Google search terms associated with the cryptoasset (e.g. "btc usd" for btc/usd),
  2. quandl datasets (see manager.py#L398); there are currently 32 datasets,
  3. pricing & volume datasets (see manager.py#L57).
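
Combining the three sources listed above amounts to aligning them on a shared date index. Here is a minimal sketch under stated assumptions: each source is a dict keyed by ISO date string, pricing defines the output dates, and each other source contributes its most recent row on or before each date. The function `merge_by_date`, the column name `trend_btc_usd`, and all values are hypothetical. This is not the code in manager.py.

```python
def merge_by_date(primary, *others):
    """Join feature dicts onto the dates of `primary`.

    Each argument maps an ISO date string to a dict of feature values.
    For every primary date, each other source contributes its latest row
    dated on or before that date. Keys stay absent until a source has data.
    """
    merged = []
    for date in sorted(primary):
        row = {"date": date, **primary[date]}
        for source in others:
            # ISO date strings sort chronologically, so string comparison works.
            earlier = [d for d in source if d <= date]
            if earlier:
                row.update(source[max(earlier)])
        merged.append(row)
    return merged


# Made-up values for illustration only.
pricing = {
    "2018-06-01": {"close": 7500.0, "volume": 1200.0},
    "2018-06-02": {"close": 7600.0, "volume": 1100.0},
    "2018-06-03": {"close": 7700.0, "volume": 1300.0},
}
google = {
    "2018-06-01": {"trend_btc_usd": 55},
    "2018-06-03": {"trend_btc_usd": 60},
}
quandl = {"2018-06-02": {"MKTCP": 1.28e11}}

rows = merge_by_date(pricing, google, quandl)
```

Using only rows dated on or before each pricing date avoids leaking future information into the features. Rows that precede a source's first observation lack that source's columns and would need handling before training.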