lucasmonstrox opened 5 years ago
I came across this and I think I have fixed it. The problem is in the previous part (the "preparing the data" section): the 'final_dataset.csv' you pull in has far more columns than the one used in the tutorial (at least if you took the data from the RudrakshTuwani GitHub repo like I did). You therefore have to state explicitly which columns you want. So when selecting the feature and target variables, instead of using:
X_all = data.drop(['FTR'], axis=1)
I used this:
X_all = data[['HTP', 'ATP', 'HM1', 'HM2', 'HM3', 'AM1', 'AM2', 'AM3', 'HTGD', 'ATGD', 'DiffFormPts', 'DiffLP']].copy()
After the one-hot encoding step this gives you 30 feature columns instead of a few thousand, and you can then drop the extra dummy columns by using the following in the preprocessing section:
X_all = X_all.drop(['HM1_M', 'HM2_M', 'HM3_M', 'AM1_M', 'AM2_M', 'AM3_M'], axis=1)
This then gives you the 24 feature columns the tutorial has for the train/test split and model evaluation later on. Hope this helps!
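Putting the steps above together, here is a minimal, self-contained sketch of the fix. The 12 feature column names and the `_M` dummies are taken from this thread; the toy DataFrame and the `pd.get_dummies` call are my own stand-ins for the real 'final_dataset.csv' and the tutorial's preprocessing step, so treat the exact values as illustrative only:

```python
import pandas as pd

# Toy stand-in for final_dataset.csv: the 12 raw feature columns named
# above, plus the target 'FTR' and an 'Extra' column representing the
# many unwanted columns the real CSV carries.
data = pd.DataFrame({
    'FTR': ['H', 'A', 'D', 'H'],
    'Extra': [0, 1, 2, 3],  # unwanted column that should be excluded
    'HTP': [1.2, 0.8, 1.0, 0.5], 'ATP': [0.9, 1.1, 0.7, 1.3],
    'HM1': ['W', 'D', 'L', 'M'], 'HM2': ['D', 'L', 'M', 'W'],
    'HM3': ['L', 'M', 'W', 'D'], 'AM1': ['M', 'W', 'D', 'L'],
    'AM2': ['W', 'D', 'L', 'M'], 'AM3': ['D', 'L', 'M', 'W'],
    'HTGD': [0.3, -0.1, 0.0, 0.2], 'ATGD': [-0.2, 0.4, 0.1, -0.3],
    'DiffFormPts': [0.5, -0.5, 0.0, 0.25], 'DiffLP': [2, -3, 0, 1],
})

# Step 1: select only the 12 columns we actually want as features,
# instead of dropping 'FTR' from the full (over-wide) CSV.
feature_cols = ['HTP', 'ATP', 'HM1', 'HM2', 'HM3',
                'AM1', 'AM2', 'AM3', 'HTGD', 'ATGD',
                'DiffFormPts', 'DiffLP']
X_all = data[feature_cols].copy()
y_all = data['FTR']

# Step 2: one-hot encode the categorical form columns (HM1..AM3).
# Each form column takes values W/D/L/M here, so the 6 form columns
# expand into 24 dummies; with the 6 numeric columns that is 30 total.
X_all = pd.get_dummies(X_all)

# Step 3: drop the '_M' dummies so the final feature count matches the
# tutorial's 24 columns.
m_cols = [c for c in X_all.columns if c.endswith('_M')]
X_all = X_all.drop(m_cols, axis=1)

print(X_all.shape[1])  # 24 with this toy data
```

The key design point is selecting columns by an explicit whitelist (`data[feature_cols]`) rather than by exclusion (`data.drop(['FTR'], axis=1)`): a whitelist keeps working even when the CSV gains extra columns you did not expect.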
For some reason I'm getting more than 1k features.
Can you help me?