Remedy for a zero count in the numerator: add 1 to the numerator and m to the denominator (m = number of attribute values, e.g. 3 for outlook) • This is called the Laplace correction or smoothing
There is a generalization of the Laplace correction called the m-estimate
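A minimal sketch of both estimates, assuming a categorical attribute with m possible values and a user-chosen prior p for the m-estimate (function names here are illustrative, not from the notes):

```python
def laplace_estimate(count, total, m):
    """Laplace-corrected estimate of P(attribute=value | class).

    count -- examples of this class with this attribute value
    total -- examples of this class
    m     -- number of possible attribute values (e.g. 3 for outlook)
    """
    return (count + 1) / (total + m)


def m_estimate(count, total, m, p):
    """m-estimate: generalization of the Laplace correction.

    m -- equivalent sample size (weight given to the prior)
    p -- prior estimate of the probability; the Laplace correction is the
         special case p = 1/m with m = number of attribute values
    """
    return (count + m * p) / (total + m)


# e.g. outlook=sunny never occurs with class "yes" among 9 "yes" examples:
print(laplace_estimate(0, 9, 3))      # 0.083... instead of 0
print(m_estimate(0, 9, 3, p=1/3))     # same value for the uniform prior
```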
During classification: if an attribute value is missing in the new example, simply omit that attribute from the probability calculation
During training: • do not include examples with missing values in the counts • calculate the probabilities based on the actual number of training examples that have a value for each attribute (see the sketch below)
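A hedged sketch of that counting rule: examples whose value for the attribute is missing (represented here as None, an illustrative choice) are left out of both the numerator and the denominator:

```python
def conditional_prob(examples, attribute, value, cls):
    """Estimate P(attribute=value | class=cls), ignoring missing values.

    examples -- list of dicts, e.g. {"outlook": "sunny", "class": "yes"};
                a missing attribute value is stored as None (assumption)
    """
    # only examples of this class that actually have a value for the attribute
    in_class = [e for e in examples if e["class"] == cls and e[attribute] is not None]
    matching = [e for e in in_class if e[attribute] == value]
    return len(matching) / len(in_class)
```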
How to calculate the probabilities for the numeric attributes? Answer: assume that the numeric attributes follow a normal (or Gaussian) distribution and use the probability density function
Probability density function for a normal distribution with mean $\mu$ and standard deviation $\sigma$: $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
other probability density functions, e.g. Poisson, binomial, gamma
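For the Gaussian case, a minimal sketch: estimate $\mu$ and $\sigma$ from the training values of the attribute within a class, then plug the new value into the density (the temperature values below are illustrative):

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Probability density of N(mu, sigma^2) at x."""
    return (1.0 / (math.sqrt(2 * math.pi) * sigma)) * \
           math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# temperature values of the "yes" examples (illustrative numbers)
temps_yes = [83, 70, 68, 64, 69, 75, 75, 72, 81]
mu = sum(temps_yes) / len(temps_yes)
sigma = (sum((t - mu) ** 2 for t in temps_yes) / (len(temps_yes) - 1)) ** 0.5

# density value used in place of P(temperature=66 | yes)
print(gaussian_pdf(66, mu, sigma))
```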
• Holdout method • Cross validation • Leave-one-out cross validation • Cross-validation for parameter tuning
Hyperparameter: a parameter that can be tuned to optimize the performance of an ML algorithm • in the k-nearest neighbor algorithm: k • in neural networks: the number of hidden layers and the nodes in them, the number of training epochs, etc.
Holdout method: hold out part of the data for testing, the rest for training; stratification keeps the class proportions in each part. Repeating the holdout with different random splits is called the repeated holdout method. Stratified 10-fold cross-validation – this is a standard method for evaluation used in ML • each subset (fold) is stratified
Leave-one-out cross-validation: a special form of n-fold cross-validation • set the number of folds to the number of training examples • => for n training examples, build the classifier n times
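A sketch of both variants with sklearn, assuming a feature matrix X and label vector y are already loaded:

```python
from sklearn.model_selection import StratifiedKFold, LeaveOneOut, cross_val_score
from sklearn.naive_bayes import GaussianNB

clf = GaussianNB()

# stratified 10-fold cross-validation: each fold keeps the class proportions
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
print(cross_val_score(clf, X, y, cv=skf).mean())

# leave-one-out: n folds for n training examples, classifier built n times
loo = LeaveOneOut()
print(cross_val_score(clf, X, y, cv=loo).mean())
```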
Grid search with cross-validation for parameter tuning • in sklearn, we can use GridSearchCV
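A minimal GridSearchCV sketch, tuning k for k-nearest neighbors (the parameter grid and the data names X, y are assumptions):

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

param_grid = {"n_neighbors": [1, 3, 5, 7, 9]}

# each candidate k is evaluated with cross-validation
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=10)
grid.fit(X, y)

print(grid.best_params_)   # the k that gave the best cross-validation score
print(grid.best_score_)
```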
The confusion matrix is not a performance measure itself; it allows us to calculate performance measures
Performance measures for classification: precision (P), recall (R) and the F1 score
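A small sketch of how these measures come out of the confusion matrix counts for a binary problem (the numbers are illustrative):

```python
# counts from a hypothetical confusion matrix for the positive class
TP, FP, FN, TN = 40, 10, 5, 45

precision = TP / (TP + FP)                                  # P
recall    = TP / (TP + FN)                                  # R
f1        = 2 * precision * recall / (precision + recall)   # harmonic mean of P and R

print(precision, recall, f1)   # 0.8, 0.888..., 0.842...
```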
Bayes' Theorem: $P(H|E) = \frac{P(E|H)\,P(H)}{P(E)}$
P(H|E) is the probability of the hypothesis (i.e. that E is a rose), given that we have seen that E is red and long • called the posterior (conditional) probability of H given E
P(H) is called the prior probability of H
P(E|H) is the probability that E is red and long, given that we know that E is a rose • called the conditional probability of E given H
P(E) is the probability that any given example (flower) is red and long, regardless of the hypothesis • called the prior probability of E; it is independent of H
• Naive Bayes assumes that the attributes are independent of each other given the class – an unrealistic assumption, almost never correct in practice • => that's why the algorithm is called Naive Bayes
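Putting the pieces together, a hedged sketch of how one class is scored under the independence assumption: multiply the class prior by the conditional probability of each attribute value; P(E) can be dropped because it is the same for every class (all names below are illustrative):

```python
def naive_bayes_score(example, cls, prior, cond_prob):
    """Unnormalized posterior P(cls | example) under the naive assumption.

    example   -- dict of attribute -> value (None marks a missing value)
    prior     -- P(cls)
    cond_prob -- function (attribute, value, cls) -> P(value | cls)
    """
    score = prior
    for attribute, value in example.items():
        if value is None:      # missing value: omit the attribute
            continue
        score *= cond_prob(attribute, value, cls)
    return score
```

Computing this score for every class and picking the largest gives the predicted class; dividing each score by the sum over all classes recovers the actual posterior probabilities.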