Closed azhe825 closed 1 year ago
The "?" does not provide useful information about the features. Wouldn't this introduce an error into the model?
@rohaan2614 You are right. “?” should be treated as a missing value in this dataset. However, this assignment tries to match the categorical naive Bayes classifier in sklearn, which does not have a mechanism for handling missing values.
Can we remove the rows that contain "?" before calculating the probabilities? Also, I noticed there are only 200 records (including "?" rows). Is this sufficient for reliable probabilities?
@rohaan2614 No. The goal of this assignment is not to predict as accurately as possible, but to implement the naive Bayes classifier. Do not modify the A2.py file.
Do not consider '?' as missing values when implementing my_NB.py.
Treat that as one of the possible values X can take.
This should make Assignment 2 easier, and the last time I checked, sklearn.naive_bayes.CategoricalNB also does not handle missing values.
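To illustrate what "treat '?' as one of the possible values" could look like, here is a minimal sketch of a categorical naive Bayes in plain Python. The function names (`fit_categorical_nb`, `predict_one`), the toy data, and the Laplace smoothing with `alpha=1.0` are assumptions made for this example; this is not the interface required by my_NB.py or A2.py, and it is not guaranteed to reproduce sklearn's output exactly.

```python
import math
from collections import Counter, defaultdict


def fit_categorical_nb(X, y, alpha=1.0):
    """Count class and feature-value frequencies; '?' is counted like any other value."""
    n_features = len(X[0])
    class_counts = Counter(y)
    # value_counts[j][c][v] = number of rows of class c whose feature j equals v
    value_counts = [defaultdict(Counter) for _ in range(n_features)]
    # categories[j] = all values observed for feature j (this includes '?')
    categories = [set() for _ in range(n_features)]
    for row, label in zip(X, y):
        for j, value in enumerate(row):
            value_counts[j][label][value] += 1
            categories[j].add(value)
    return {
        "alpha": alpha,
        "n_samples": len(y),
        "class_counts": class_counts,
        "value_counts": value_counts,
        "categories": categories,
    }


def predict_one(model, row):
    """Return the class with the highest smoothed log posterior for one row."""
    alpha = model["alpha"]
    best_label, best_score = None, -math.inf
    for label, n_c in model["class_counts"].items():
        # log prior P(y = label)
        score = math.log(n_c / model["n_samples"])
        for j, value in enumerate(row):
            count = model["value_counts"][j][label][value]
            k = len(model["categories"][j])
            # smoothed log likelihood P(x_j = value | y = label)
            score += math.log((count + alpha) / (n_c + alpha * k))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: '?' appears as an ordinary category of the second feature.
X = [["a", "x"], ["a", "?"], ["b", "?"], ["b", "y"]]
y = ["pos", "pos", "neg", "neg"]
model = fit_categorical_nb(X, y)
prediction = predict_one(model, ["a", "?"])
```

Because "?" is simply another key in the per-feature counts, no rows are dropped and no special-case code is needed.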