prathameshtari / Predicting-Football-Match-Outcome-using-Machine-Learning

Football Match prediction using machine learning algorithms in jupyter notebook

Random Forest Classifier gives 100% accuracy #2

Open Abhishek2019 opened 5 years ago

Abhishek2019 commented 5 years ago

I applied the random forest algorithm to merged_dataset.csv. Of the 6080 rows, I used 80% for training and the remaining 20% for testing. The trained model predicted the target with 100% accuracy. I used the attribute FTR as the target.

CODE :

```python
from sklearn.preprocessing import LabelEncoder
import pandas as pd
from sklearn.model_selection import train_test_split
import numpy as np
from sklearn.ensemble import RandomForestClassifier

dataframe = pd.read_csv('./dataset/Merged_dataset.csv')
print(dataframe.head())

df = dataframe.apply(LabelEncoder().fit_transform)
print(df.head())

target = np.array(df['FTR'])
features = df.drop(['id', 'FTR', 'FTAG', 'FTHG'], axis=1)
features = np.array(features)

# Split the data into training and testing sets
train_features, test_features, train_labels, test_labels = train_test_split(
    features, target, test_size=0.20, random_state=42)

model = RandomForestClassifier()
model.fit(train_features, train_labels)

predicted_labels = model.predict(test_features)

print("actual Test labels")
print(test_labels)
print("")
print("predicted test labels")
print(predicted_labels)

# calculate accuracy
count = 0
totalCount = len(predicted_labels)
for i in range(len(test_labels)):
    if predicted_labels[i] == test_labels[i]:
        count = count + 1

print("Accuracy : " + str((count / totalCount) * 100) + " %")
```

OUTPUT :

image

Isn't 100% accuracy too good to be true? If I apply Logistic Regression instead, the model's accuracy is 68%.

Could you point out what is wrong in my code, or what concept I am missing while training my model?

systats commented 5 years ago

Well, it seems you are trying to predict end results with end-of-match features, i.e. columns that are only known once the game has been played. You somehow have to come up with pre-game features, otherwise you will never gain an edge over the bookies. You can generate pre-game features with rating systems or models (Elo, Glicko, Poisson or negative binomial). Finally, tune your model not on accuracy but on profitability (plus a betting strategy). Good luck.
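To make the pre-game idea concrete, here is a minimal sketch of Elo ratings used as features. It is only an illustration, not the repository's method: the function names `expected_score` and `add_pregame_elo`, the `k` and `start` values, and the toy matches are all assumptions, and it ignores home advantage. The key point is that each row's ratings are recorded before that match's result is used to update them, so no feature depends on the outcome being predicted.

```python
from collections import defaultdict


def expected_score(rating_a, rating_b):
    """Standard Elo expected score of side A against side B (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def add_pregame_elo(matches, k=20.0, start=1500.0):
    """Build one feature row per match from information known before kickoff.

    `matches` must be sorted chronologically. Each item is a tuple
    (home_team, away_team, ftr) where ftr is 'H', 'D' or 'A'.
    `k` and `start` are illustrative defaults, not tuned values.
    """
    ratings = defaultdict(lambda: start)
    rows = []
    for home, away, ftr in matches:
        home_elo, away_elo = ratings[home], ratings[away]

        # Record the ratings BEFORE looking at this match's result.
        rows.append({
            "home_elo": home_elo,
            "away_elo": away_elo,
            "elo_diff": home_elo - away_elo,
            "FTR": ftr,  # target only, never a feature
        })

        # Update both ratings after the match (zero-sum update).
        score_home = {"H": 1.0, "D": 0.5, "A": 0.0}[ftr]
        delta = k * (score_home - expected_score(home_elo, away_elo))
        ratings[home] = home_elo + delta
        ratings[away] = away_elo - delta
    return rows


# Hypothetical toy data in chronological order.
matches = [
    ("Arsenal", "Chelsea", "H"),
    ("Chelsea", "Arsenal", "D"),
    ("Arsenal", "Chelsea", "A"),
]
rows = add_pregame_elo(matches)
for row in rows:
    print(row)
```

The resulting `home_elo`, `away_elo` and `elo_diff` columns could then replace the post-match statistics as model inputs. With time-ordered data it is also usually safer to split train and test sets by date rather than randomly.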

10511829 commented 4 years ago

Hi, I cannot get a proper table.team column. The column shows True for every row instead of each team's name. What should I do? Please help. I have attached a screenshot below.

Screenshot (14)