
5318 W4 #6

Open henryliangt opened 2 years ago

henryliangt commented 2 years ago

Bayes Theorem

posterior (a posteriori) probability / conditional probability

P(H|E) is the probability of the hypothesis (i.e. that E is a rose), given that we have seen that E is red and long

prior probability of H

P(H) is the probability of the hypothesis H before any evidence is seen

P(E|H) is the probability that E is red and long, given that we know that E is a rose • Called the posterior/conditional probability of E given H (the likelihood)

P(E) is the probability that any given example (flower) is red and long, regardless of the hypothesis

prior probability of E; it is independent of H
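Bayes theorem ties these four quantities together, computing the posterior from the likelihood and the priors:

$$P(H|E) = \frac{P(E|H)\,P(H)}{P(E)}$$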

• Naive Bayes assumes the attributes are conditionally independent given the class • This assumption is unrealistic and almost never correct • => that's why the algorithm is called Naive Bayes

henryliangt commented 2 years ago

The zero-frequency problem with the numerator: if an attribute value never occurs with a class in the training data, its estimated probability is 0 and the whole product of probabilities becomes 0 • Remedy: add 1 to the numerator and m to the denominator (m = number of attribute values, e.g. m = 3 for outlook) • This is called the Laplace correction or smoothing

There is a generalization of the Laplace correction called m-estimate

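A minimal sketch of the Laplace correction in Python, using hypothetical outlook observations for one class (all names and data here are illustrative):

```python
from collections import Counter

def laplace_prob(value, attribute_values, class_examples):
    """P(attribute = value | class) with the Laplace correction:
    add 1 to the count and m (the number of attribute values) to the total."""
    counts = Counter(class_examples)
    m = len(attribute_values)
    return (counts[value] + 1) / (len(class_examples) + m)

# Hypothetical outlook values observed for the 'yes' class
outlook_yes = ["sunny", "sunny", "overcast", "rainy", "rainy",
               "overcast", "rainy", "sunny", "overcast"]
outlook_values = ["sunny", "overcast", "rainy"]  # m = 3
print(laplace_prob("sunny", outlook_values, outlook_yes))  # (3 + 1) / (9 + 3)
```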

henryliangt commented 2 years ago

During classification: if the new example has a missing value for an attribute, simply omit that attribute from the probability calculation

During training: • do not include the missing values in the counts • calculate the probabilities based on the actual number of training examples without missing values for each attribute
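A minimal sketch of skipping missing values during classification; the cond_probs structure and the example data are hypothetical:

```python
def naive_bayes_score(example, class_prior, cond_probs):
    """Score one class for an example, omitting missing (None) attributes.

    cond_probs is a hypothetical dict: attribute -> {value: P(value | class)}.
    """
    score = class_prior
    for attr, value in example.items():
        if value is None:   # missing value: skip this attribute entirely
            continue
        score *= cond_probs[attr][value]
    return score

# Hypothetical example with a missing 'humidity' value
example = {"outlook": "sunny", "humidity": None}
cond_probs = {"outlook": {"sunny": 0.33}, "humidity": {"high": 0.44}}
print(naive_bayes_score(example, 0.6, cond_probs))  # 0.6 * 0.33
```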

henryliangt commented 2 years ago

Naïve Bayes for Numeric Attributes

How do we calculate the probabilities for numeric attributes? Answer: assume that the numeric attributes follow a normal (Gaussian) distribution and use its probability density function

Probability density function for a normal distribution with mean $\mu$ and standard deviation $\sigma$:

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
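A minimal sketch of using the Gaussian density as the conditional probability of a numeric attribute; the temperature values are hypothetical:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x, used as P(attribute = x | class)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# Hypothetical temperature values observed for one class
temps = [83, 70, 68, 64, 69, 75, 75, 72, 81]
mu = sum(temps) / len(temps)
sigma = math.sqrt(sum((t - mu) ** 2 for t in temps) / (len(temps) - 1))  # sample std
print(gaussian_pdf(66, mu, sigma))  # density for temperature = 66
```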

henryliangt commented 2 years ago

If an attribute is known to follow another distribution, other probability density functions can be used, e.g. Poisson, binomial, gamma

henryliangt commented 2 years ago

Evaluating Machine Learning Algorithms

Evaluation Procedures

• Holdout method • Cross-validation • Leave-one-out cross-validation • Cross-validation for parameter tuning

A hyperparameter is a parameter that can be tuned to optimize the performance of an ML algorithm • in the k-nearest neighbor algorithm: k • in neural networks: the number of hidden layers and nodes in them, the number of training epochs, etc.

Holdout method: hold out part of the data for testing and train on the rest • stratification: each subset preserves the class proportions of the full dataset • repeating the holdout with different random splits is called the repeated holdout method • Stratified 10-fold cross-validation is a standard method for evaluation used in ML • each subset is stratified
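A minimal sketch of stratified 10-fold cross-validation with sklearn, using the iris dataset and Gaussian Naive Bayes as stand-ins:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# Each of the 10 folds preserves the class proportions of the full dataset
scores = cross_val_score(GaussianNB(), X, y, cv=StratifiedKFold(n_splits=10))
print(scores.mean(), scores.std())
```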

Leave-one-out cross-validation

A special form of n-fold cross-validation • Set the number of folds to the number of training examples • => for n training examples, build classifier n times
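A minimal sketch of leave-one-out cross-validation with sklearn's LeaveOneOut, reusing the same stand-in dataset and classifier:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# n folds for n examples: each example is the test set exactly once
scores = cross_val_score(GaussianNB(), X, y, cv=LeaveOneOut())
print(scores.mean())  # fraction of examples classified correctly
```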

Grid search with cross-validation for parameter tuning: in sklearn, we can use GridSearchCV
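A minimal sketch of GridSearchCV tuning k for k-nearest neighbors with 10-fold cross-validation; the parameter values tried are arbitrary:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Try several values of k, evaluating each with 10-fold cross-validation
param_grid = {"n_neighbors": [1, 3, 5, 7, 9]}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=10)
search.fit(X, y)

print(search.best_params_)  # the best k found
print(search.best_score_)   # its mean cross-validated accuracy
```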

Performance measures?

Confusion matrix

Recall, precision and F1 score

The confusion matrix is not a performance measure itself; it allows us to calculate performance measures

precision (P), recall (R) and the F1 score are the standard performance measures for classification
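From the confusion-matrix counts (TP = true positives, FP = false positives, FN = false negatives), these measures are defined as:

$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2PR}{P + R}$$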