prof-rossetti / intro-to-python

An Introduction to Programming in Python

Predictive Statistics Exercise #21

Open: s2t2 opened this issue 4 years ago

s2t2 commented 4 years ago

The monthly sales predictions exercise in unit 5B is a little weak and should be replaced with something else, such as the Titanic Kaggle competition, which is a lot more fun and instructive:

Competition: https://www.kaggle.com/c/titanic
Data: https://www.kaggle.com/c/titanic/data
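The Titanic competition is a binary classification task. The target column, Survived, is 0 or 1, and Kaggle scores submissions by accuracy. That means a split/fit/score workflow would swap a regression model and R^2 for a classifier and a classification metric. Below is a minimal sketch of that swap. It assumes LogisticRegression as the classifier, which is one reasonable choice and not necessarily what the course would use. It also uses scikit-learn's make_classification as a synthetic stand-in for the real Kaggle CSV, which is not bundled here.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Kaggle Titanic training data (NOT the real dataset):
# 200 rows, 5 numeric features, and a binary 0/1 target like the "Survived" column
x, y = make_classification(n_samples=200, n_features=5, random_state=99)

# stratify=y keeps the ratio of 0s and 1s similar in the train and test splits
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=99, stratify=y
)

# A classifier replaces LinearRegression, and accuracy replaces R^2
model = LogisticRegression()
model.fit(x_train, y_train)

y_test_pred = model.predict(x_test)
accuracy = accuracy_score(y_test, y_test_pred)
print("ACCURACY:", accuracy)
```

With the real data, the feature columns (for example Pclass, Sex, Age, Fare) would need some cleaning first. Sex is text, and Age has missing values, so they cannot be passed to the model as-is.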

s2t2 commented 3 years ago

Use material from this new analytics course: https://github.com/prof-rossetti/data-analytics-in-python

s2t2 commented 3 years ago

I have some example notebooks here:

https://drive.google.com/drive/folders/1pQVM5bq0ykGuXF_JRfvEDFbdd95kPISv?usp=sharing

and slides here:

https://docs.google.com/presentation/d/13fYiA3E5yADSLlScBuL5lbnWY0hKrkqstrEk1DyFkM4/edit?usp=sharing

s2t2 commented 3 years ago

Split the data, as necessary:

from sklearn.model_selection import train_test_split

# assumes df is a pandas DataFrame that has already been loaded (e.g. from a CSV file)
df_train, df_test = train_test_split(df, test_size=0.2, random_state=99)
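The split can be sanity-checked on a small hypothetical DataFrame. The column names below are the same placeholders used in this comment, and the values are arbitrary. With 10 rows and test_size=0.2, 2 rows are held out for testing and the other 8 are used for training.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy DataFrame: 10 rows, placeholder column names, arbitrary values
df = pd.DataFrame({
    "feature col a": range(10),
    "feature col b": range(10, 20),
    "feature col c": range(20, 30),
    "labels col": range(30, 40),
})

# Hold out 20% of the rows for testing; random_state makes the shuffle repeatable
df_train, df_test = train_test_split(df, test_size=0.2, random_state=99)

print(len(df_train), len(df_test))  # 8 2

# The two splits share no rows, and together they cover the whole dataset
assert set(df_train.index).isdisjoint(df_test.index)
assert len(df_train) + len(df_test) == len(df)
```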

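feature_cols = ["feature col a", "feature col b", "feature col c"] # placeholder names: use your own feature columns
target_col = "labels col" # placeholder name: use your own target column

# x comes from the feature columns (a DataFrame), y from the target column (a Series)
x_train = df_train[feature_cols]
y_train = df_train[target_col]

x_test = df_test[feature_cols]
y_test = df_test[target_col]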
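A common slip when selecting columns is swapping the features and the target. A quick shape check catches it. Here is a sketch on the same hypothetical toy DataFrame (placeholder names, arbitrary values). x should be a 2-D DataFrame with one column per feature, and y should be a 1-D Series.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data; column names mirror the placeholders in this comment
df = pd.DataFrame({
    "feature col a": range(10),
    "feature col b": range(10, 20),
    "feature col c": range(20, 30),
    "labels col": range(30, 40),
})
df_train, df_test = train_test_split(df, test_size=0.2, random_state=99)

feature_cols = ["feature col a", "feature col b", "feature col c"]
target_col = "labels col"

# Guard against leakage: the target must not also be used as a feature
assert target_col not in feature_cols

# A list of column names selects a 2-D DataFrame; a single name selects a 1-D Series
x_train = df_train[feature_cols]
y_train = df_train[target_col]
x_test = df_test[feature_cols]
y_test = df_test[target_col]

print(x_train.shape, y_train.shape)  # (8, 3) (8,)
```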

Choose a model (and corresponding metrics):

from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score # other regression metrics: mean_absolute_error, mean_squared_error

model = LinearRegression()
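Besides r2_score, scikit-learn provides mean_absolute_error and mean_squared_error, both mentioned in the metrics import line. The sketch below compares all three on made-up numbers, not from any dataset, chosen so the arithmetic is easy to check by hand. The absolute errors are 0.5, 0, 0.5, 0, so MAE = 1.0 / 4 = 0.25 and MSE = 0.5 / 4 = 0.125. The mean of y_true is 6, so SS_tot = 20, and R^2 = 1 - 0.5 / 20 = 0.975.

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up true and predicted values, chosen so the arithmetic is easy to check
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]

r2 = r2_score(y_true, y_pred)              # 1 - SS_res / SS_tot; 1.0 is a perfect fit
mae = mean_absolute_error(y_true, y_pred)  # average absolute error, in the target's units
mse = mean_squared_error(y_true, y_pred)   # average squared error; penalizes big misses more

print("R^2:", r2)
print("MAE:", mae)  # 0.25
print("MSE:", mse)  # 0.125
```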

Train the model:

model.fit(x_train, y_train) 
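Training a LinearRegression learns one coefficient per feature column, stored in coef_, plus an intercept_. Here is a sketch on hypothetical noise-free data generated from y = 2a - 3b + 0.5c + 4. The formula and the random features are illustrative assumptions. Because there is no noise, the fitted values should recover those numbers almost exactly.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Hypothetical noise-free data generated from y = 2a - 3b + 0.5c + 4
cols = ["feature col a", "feature col b", "feature col c"]
x_train = pd.DataFrame(rng.normal(size=(50, 3)), columns=cols)
y_train = (
    2 * x_train["feature col a"]
    - 3 * x_train["feature col b"]
    + 0.5 * x_train["feature col c"]
    + 4
)

model = LinearRegression()
model.fit(x_train, y_train)  # fit() also returns the fitted model itself

# One learned coefficient per feature column, in column order, plus an intercept
for name, coef in zip(x_train.columns, model.coef_):
    print(f"{name}: {coef:.3f}")
print(f"intercept: {model.intercept_:.3f}")
```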

Score the model on training data:

y_train_pred = model.predict(x_train)
print("TRAINING R^2 SCORE:", r2_score(y_train, y_train_pred))
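R^2 compares the model's squared errors to those of always predicting the mean: R^2 = 1 - SS_res / SS_tot. For scikit-learn regressors, model.score(x, y) returns the same R^2 as a shortcut. This sketch checks all three calculations against each other. The single feature, the slope of 3, and the noise level are hypothetical values chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(42)

# Hypothetical data: one feature with a linear trend (slope 3) plus random noise
x_train = rng.uniform(0, 10, size=(40, 1))
y_train = 3 * x_train[:, 0] + rng.normal(scale=2.0, size=40)

model = LinearRegression().fit(x_train, y_train)
y_train_pred = model.predict(x_train)

# R^2 by hand: 1 - (sum of squared residuals) / (sum of squared deviations from the mean)
ss_res = np.sum((y_train - y_train_pred) ** 2)
ss_tot = np.sum((y_train - y_train.mean()) ** 2)
manual_r2 = 1 - ss_res / ss_tot

library_r2 = r2_score(y_train, y_train_pred)
score_r2 = model.score(x_train, y_train)  # same metric via the estimator's score() method

print("R^2 (manual):     ", manual_r2)
print("R^2 (r2_score):   ", library_r2)
print("R^2 (model.score):", score_r2)
```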

Score the model on test data:

y_test_pred = model.predict(x_test)
print("TEST R^2 SCORE:", r2_score(y_test, y_test_pred))
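Scoring on both splits makes overfitting visible. A model can look excellent on training data and still fail on rows it has never seen. Here is a deliberately extreme sketch with hypothetical random data. The target is pure noise, and there are more features (30) than training rows (20), so LinearRegression can fit the training noise exactly. The training R^2 comes out at about 1, while the test R^2 is far lower.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)

# Hypothetical data: 40 rows, 30 random features, and a target that is pure noise
x = rng.normal(size=(40, 30))
y = rng.normal(size=40)

# 20 rows for training, 20 for testing
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.5, random_state=99)

model = LinearRegression().fit(x_train, y_train)
train_r2 = r2_score(y_train, model.predict(x_train))
test_r2 = r2_score(y_test, model.predict(x_test))

# With more features than training rows, the model memorizes the training noise
print(f"train R^2: {train_r2:.3f}")
print(f"test R^2:  {test_r2:.3f}")
print(f"gap:       {train_r2 - test_r2:.3f}")
```

A large gap between the two scores suggests the model learned patterns specific to the training rows. A closer pair of scores suggests it generalizes better.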