jimfleming / numerai

Code from my experiments on Numerai
https://medium.com/@jimfleming/notes-on-the-numerai-ml-competition-14e3d42c19f3#.p3swptim4
MIT License

Issue with data prep #2

Open AIAdventures opened 7 years ago

AIAdventures commented 7 years ago

Hi Jim! Great project! I am just having trouble with the prep data module. I'm running it on Linux Mint.

andrewcz@andrewcz-PORTEGE-Z30t-B ~/Desktop/Numerai/numerai dataset/numerai_datasets (13)/numerai $ python prep_data.py
/home/andrewcz/miniconda3/lib/python3.5/site-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
Fold #1
Traceback (most recent call last):
  File "prep_data.py", line 85, in <module>
    main()
  File "prep_data.py", line 50, in main
    rf.fit(X_split_train, y_split_train)
  File "/home/andrewcz/miniconda3/lib/python3.5/site-packages/sklearn/ensemble/forest.py", line 247, in fit
    X = check_array(X, accept_sparse="csc", dtype=DTYPE)
  File "/home/andrewcz/miniconda3/lib/python3.5/site-packages/sklearn/utils/validation.py", line 382, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: could not convert string to float: 'test'

Many thanks for your help, Andrew

GillesVandewiele commented 6 years ago

The data format has changed since last year. There are some columns that need to be dropped.

I used this in tournament 72: feature_cols = ['feature'+str(i) for i in range(1, 22)]
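For anyone hitting the same ValueError, here is a minimal sketch of how that column list could be used with pandas to keep string columns out of the model input. It assumes the features are named feature1 through feature21 as above. The helper name `select_feature_columns`, the non-feature column names (`id`, `data_type`, `target`) and the sample rows are made up for illustration and are not taken from prep_data.py or the actual dataset.

```python
import pandas as pd


def select_feature_columns(df, n_features=21):
    """Return only the numeric feature columns, cast to float.

    Assumes columns are named feature1..feature<n_features>; the helper
    name and the default value are hypothetical.
    """
    feature_cols = ['feature' + str(i) for i in range(1, n_features + 1)]
    missing = [c for c in feature_cols if c not in df.columns]
    if missing:
        raise KeyError('missing feature columns: {}'.format(missing))
    # Selecting by name leaves out string columns that cannot be cast to float.
    return df[feature_cols].astype(float)


# Made-up rows for illustration; the non-feature column names are assumptions.
example = pd.DataFrame({
    'id': ['a1', 'b2'],
    'data_type': ['train', 'test'],
    'feature1': [0.25, 0.75],
    'feature2': [0.5, 0.1],
    'target': [1, 0],
})

X = select_feature_columns(example, n_features=2)
y = example['target']
```

Passing `X` rather than the whole frame to `rf.fit` should avoid the string-to-float conversion, provided the real file follows this naming.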

jimfleming commented 6 years ago

Yes, this code is pretty out of date now. I may update it in the future as time allows.

GillesVandewiele commented 6 years ago

Hey @jimfleming, I adapted parts of your code to work with the current format. I'll try sending a PR in the near future!