learn-co-curriculum / dsc-regression-model-validation


fit_transform used on test data #4

Closed erankova closed 8 months ago

erankova commented 8 months ago

Link to Canvas

https://learning.flatironschool.com/courses/7215/assignments/268121?module_item_id=644625

Describe the Issue

.fit_transform() is called on the test data, while the instructions say it is being applied to the training data.

Source

# Transform testing set
X_test_cat = pd.DataFrame(ohe.transform(X_test[cat_columns]),
                           columns=cat_columns, index=X_test.index)

# Fill missing values with the string 'missing'
X_test_cat.fillna(value='missing', inplace=True)

# Transform training set
X_test_ohe = pd.DataFrame(ohe.fit_transform(X_test_cat),
                           columns=cat_columns, index=X_test.index)

Concern

We are taught never to fit on test data, so I'm not sure what the lesson is trying to convey.

bpurdy-ds commented 8 months ago

Thank you for flagging this. We'll take a look.

danielburdeno commented 8 months ago

Fixed and updated the onehotencoding. Thanks again for flagging!