hil-se / fds

DSCI-633: Foundations of Data Science https://github.com/hil-se/fds
MIT License

Correction for Assignment 2 #106

Closed: azhe825 closed this issue 1 year ago

azhe825 commented 2 years ago

Do not treat '?' as a missing value when implementing my_NB.py.

Treat it as one of the possible values X can take.

This should make Assignment 2 easier, and last time I checked, sklearn.naive_bayes.CategoricalNB does not handle missing values either.
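A minimal sketch of what "treat '?' as a value" means, assuming a plain-Python categorical naive Bayes with Laplace smoothing (alpha = 1, the same default CategoricalNB uses). The class `SimpleCategoricalNB`, its methods, and the toy votes are hypothetical and are not the `my_NB` interface A2.py expects. The only point is that '?' is counted exactly like 'y' or 'n', so it gets its own smoothed likelihood P(x_j = v | c) = (N_jcv + alpha) / (N_c + alpha * K_j), where K_j is the number of distinct training values of feature j, '?' included.

```python
import math
from collections import Counter, defaultdict


class SimpleCategoricalNB:
    """Hypothetical minimal categorical naive Bayes (not the assignment's my_NB).

    Every distinct value, including the string '?', is just another category.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing strength; assumed default of 1

    def fit(self, rows, labels):
        self.classes = sorted(set(labels))
        n = len(labels)
        self.class_count = Counter(labels)
        # Unsmoothed class priors from empirical frequencies.
        self.log_prior = {c: math.log(self.class_count[c] / n) for c in self.classes}
        n_features = len(rows[0])
        # Distinct training values per feature; '?' is kept like any other value.
        self.values = [set(row[j] for row in rows) for j in range(n_features)]
        # counts[j][c][v] = number of class-c rows whose feature j equals v
        self.counts = [defaultdict(Counter) for _ in range(n_features)]
        for row, c in zip(rows, labels):
            for j, v in enumerate(row):
                self.counts[j][c][v] += 1
        return self

    def _log_likelihood(self, j, c, v):
        # Smoothed estimate: (N_jcv + alpha) / (N_c + alpha * K_j)
        num = self.counts[j][c][v] + self.alpha
        den = self.class_count[c] + self.alpha * len(self.values[j])
        return math.log(num / den)

    def predict_proba(self, rows):
        out = []
        for row in rows:
            scores = {
                c: self.log_prior[c]
                + sum(self._log_likelihood(j, c, v) for j, v in enumerate(row))
                for c in self.classes
            }
            # Normalize log scores into probabilities (shift by max for stability).
            m = max(scores.values())
            exp = {c: math.exp(s - m) for c, s in scores.items()}
            total = sum(exp.values())
            out.append({c: e / total for c, e in exp.items()})
        return out

    def predict(self, rows):
        return [max(p, key=p.get) for p in self.predict_proba(rows)]


# Hypothetical toy data: two yes/no votes, '?' meaning "no recorded vote".
X = [["y", "?"], ["n", "y"], ["?", "y"], ["y", "n"]]
y = ["dem", "rep", "rep", "dem"]
model = SimpleCategoricalNB().fit(X, y)
print(model.predict([["y", "?"], ["?", "y"]]))  # ['dem', 'rep']
```

Because '?' is just another value, nothing is dropped or imputed: if '?' is more common in one class, seeing it simply shifts the prediction toward that class.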

rohaan2614 commented 2 years ago

The "?" does not provide useful information about the features. Wouldn't this introduce an error into the model?

azhe825 commented 2 years ago

@rohaan2614 You are right. “?” should be treated as a missing value in this dataset. However, this assignment tries to match the categorical naive Bayes classifier in sklearn, which does not have a missing-value handling mechanism.
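For comparison, a sketch of the sklearn side, assuming scikit-learn is installed. `OrdinalEncoder` maps every distinct string, '?' included, to an integer code, and `CategoricalNB` then treats that code like any other category, with no missing-value handling. The toy data is hypothetical.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data: two yes/no votes, '?' meaning "no recorded vote".
X_raw = np.array([["y", "?"], ["n", "y"], ["?", "y"], ["y", "n"]], dtype=object)
y = np.array(["dem", "rep", "rep", "dem"])

# '?' receives its own integer code, exactly like 'n' and 'y'.
enc = OrdinalEncoder()
X = enc.fit_transform(X_raw).astype(int)

clf = CategoricalNB(alpha=1.0).fit(X, y)

X_test = enc.transform(np.array([["y", "?"], ["?", "y"]], dtype=object)).astype(int)
print(clf.predict(X_test))  # ['dem' 'rep']
print(clf.predict_proba(X_test))
```

`OrdinalEncoder` sorts categories, so '?' gets code 0 here because it sorts before 'n' and 'y'.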

rohaan2614 commented 2 years ago

Can we remove the rows that contain "?" before calculating the probabilities? Also, I noticed there are only 200 records (including "?" rows). Is this sufficient for reliable probabilities?

(Sent by email in reply to Zhe Yu's comment of Mon, 5 Sept 2022 at 22:33.)

azhe825 commented 2 years ago

@rohaan2614 No. The goal of this assignment is not to predict as accurately as you can, but to implement the naive Bayes classifier. Do not modify the A2.py file.