Aycaucr

Reference work for the RULE dataset: blog

Weekly blog posts: link

Introduction

1.1 Introduction to Statistics

Descriptive Statistics

1- Measures of central tendency

2- Measures of variability (spread)

3- Skewness : positive, negative, or undefined

4- Correlation : r lies in [ -1.0 , +1.0]. If the slope of the plot is positive, r > 0; if it is negative, r < 0; if the slope is 0, r = 0.
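The four descriptive measures listed here can be computed with the Python standard library. The following is a minimal sketch, not a reference implementation: the sample `data` is made up for illustration, and `skewness` and `pearson_r` are hypothetical helper names that use the population (not sample) formulas.

```python
import math
import statistics

def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return float("nan")  # undefined when every value is identical
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)

def pearson_r(xs, ys):
    """Pearson correlation coefficient, always within [-1.0, +1.0]."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

data = [2, 3, 3, 4, 5, 5, 5, 9]  # made-up sample

# 1- Central tendency
print(statistics.mean(data), statistics.median(data), statistics.mode(data))
# 2- Variability: standard deviation and range
print(statistics.pstdev(data), max(data) - min(data))
# 3- Skewness: positive here because of the long right tail (the value 9)
print(skewness(data))
# 4- Correlation: a perfectly increasing line gives r close to +1
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```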

Statistical Distributions

1- Continuous : A number within a range of values, usually measured

Uniform : a probability distribution in which all outcomes are equally likely; every value in the range has the same probability density. A continuous random variable x has a uniform distribution, denoted U(a, b), if its probability density function is:

  f(x) = 1 / (b − a) for a ≤ x ≤ b, and f(x) = 0 otherwise
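As a small illustration of the U(a, b) density, the sketch below evaluates it directly. The function name `uniform_pdf` is an assumption made for this example; the only thing taken from the text is the formula 1 / (b − a).

```python
def uniform_pdf(x, a, b):
    """Density of U(a, b): constant inside [a, b], zero outside."""
    if b <= a:
        raise ValueError("b must be greater than a")
    return 1.0 / (b - a) if a <= x <= b else 0.0

# Every point inside the interval gets the same density.
print(uniform_pdf(3.0, 2.0, 6.0))
print(uniform_pdf(5.5, 2.0, 6.0))
# Points outside the interval get zero density.
print(uniform_pdf(7.0, 2.0, 6.0))
```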

Normal : the specific form of the normal distribution depends on two parameters: the expectation (µ) and the variance (σ^2). It is written N(µ, σ^2).
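To make the role of the two parameters concrete, here is a minimal sketch of the normal density. The text above does not give the density formula, so the standard one is assumed here, and `normal_pdf` is a hypothetical helper name. Note that the standard library's `NormalDist` takes the standard deviation σ, not the variance σ^2.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2), parameterised by mean and variance."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

# Hand-written formula and the standard library should agree.
# NormalDist expects the standard deviation, so pass sqrt(variance).
mu, sigma2 = 1.0, 4.0
print(normal_pdf(1.5, mu, sigma2))
print(NormalDist(mu=mu, sigma=math.sqrt(sigma2)).pdf(1.5))
```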

2- Discrete : Takes only certain separate values (typically whole numbers), usually counted

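Bernoulli (binomial) : each trial either produces the event of interest (a positive outcome, also called a "success") or does not. As the number of trials grows, the ratio of positive outcomes to the total number of trials tends to the probability of the event. The probability of exactly x successes in n independent trials is:

f(x) = C(n, x) * p^x * (1 − p)^(n − x)

    n = the number of experiments in the series
    x = a random variable (the number of occurrences of event A)
    C(n, x) = the binomial coefficient "n choose x"
    p = the probability that A occurs in a single test
    f(x) = the probability that A happens exactly x times
    q = 1 − p (the probability that A does not appear in a single test)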
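The binomial formula can be checked numerically with a short sketch. The name `binomial_pmf` is an assumption for this example; `math.comb` supplies the "n choose x" coefficient.

```python
import math

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    q = 1.0 - p  # probability of failure in a single trial
    return math.comb(n, x) * p**x * q ** (n - x)

# Probability of exactly 2 heads in 4 fair coin flips.
print(binomial_pmf(2, 4, 0.5))

# The probabilities over all possible x add up to 1 (up to rounding).
print(sum(binomial_pmf(x, 4, 0.5) for x in range(5)))
```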

Poisson : obtained as a limiting case of the binomial distribution when p tends to zero and n tends to infinity in such a way that their product stays constant: np = λ.

f(x) = (e^(−λ) * λ^x) / x!

λ is the average number of events in an interval, also called the event rate or rate parameter. It is equal to both the mean and the variance of the distribution.
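The limiting relationship between the two distributions can be seen numerically. This is a sketch under stated assumptions: `poisson_pmf` and `binomial_pmf` are hypothetical helper names, and the values λ = 2 and n = 10000 are chosen only for illustration.

```python
import math

def poisson_pmf(x, lam):
    """Probability of exactly x events when the average rate is lam."""
    return math.exp(-lam) * lam**x / math.factorial(x)

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return math.comb(n, x) * p**x * (1.0 - p) ** (n - x)

lam = 2.0
n = 10_000       # large number of trials
p = lam / n      # small success probability, so that n * p = lam

# With large n and small p the two probabilities nearly coincide.
for x in range(5):
    print(x, binomial_pmf(x, n, p), poisson_pmf(x, lam))

# The mean of the Poisson distribution equals lam (sum truncated at 50 terms).
print(sum(x * poisson_pmf(x, lam) for x in range(50)))
```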

Descriptive Analysis

Descriptive analysis helps you detect outliers and typos and identify associations among variables, which prepares you for further statistical analysis.

There are two types:

  1. Descriptive analysis for each individual variable

  2. Descriptive analysis for combinations of variables

Variables can be classified as quantitative or categorical:

  1. Quantitative variables : represent quantities or numerical values

  2. Categorical variables : describe qualities or characteristics of individuals

1.2 Introduction to Machine Learning

CLUSTERING

K-means clustering algorithm : One of the simplest unsupervised learning algorithms for the clustering problem. K-means partitions n observations into k clusters, where each observation belongs to the cluster with the nearest mean (the centroid), which serves as the prototype of the cluster.

               1. Specify the desired number of clusters K
               2. Randomly assign each data point to a cluster
               3. Compute cluster centroids
               4. Re-assign each point to the closest cluster centroid
               5. Re-compute cluster centroids
               6. Repeat steps 4 and 5 until no improvements are possible
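A minimal pure-Python sketch of this procedure follows: random initial assignment, centroid computation, reassignment of each point to its nearest centroid, and repetition until the assignments stop changing. The function names, the `seed` and `max_iter` parameters, the handling of empty clusters, and the toy points are all assumptions made for illustration.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Random initial assignment; cycling through the labels before
    # shuffling guarantees that no cluster starts out empty.
    labels = [i % k for i in range(len(points))]
    rng.shuffle(labels)
    centroids = []
    for _ in range(max_iter):
        # Compute each centroid as the mean of the points in its cluster.
        centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids.append(tuple(sum(d) / len(members) for d in zip(*members)))
            else:
                centroids.append(rng.choice(points))  # reseed an empty cluster
        # Reassign every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # no change, so no further improvement
            break
        labels = new_labels
    return labels, centroids

# Two well-separated toy groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

The cluster numbers themselves are arbitrary; only the grouping matters. On data that is less cleanly separated, the result can depend on the random initial assignment.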

CLASSIFICATION

Naive Bayes

Naive Bayes is a probabilistic classifier built on Bayes' theorem. It assumes that, within a class, each feature is independent of every other feature, which is why it is called naive.
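The independence assumption means the score of a class is simply the class prior multiplied by one probability per feature. The sketch below shows this for categorical features. The function names, the add-one (Laplace) smoothing, and the toy weather data are assumptions introduced for illustration.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count how often each feature value occurs within each class."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (class, feature index) -> value counts
    values_seen = defaultdict(set)         # feature index -> distinct values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(label, i)][value] += 1
            values_seen[i].add(value)
    return class_counts, feature_counts, values_seen

def predict(model, row):
    """Return the class with the highest log-probability score."""
    class_counts, feature_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior of the class
        for i, value in enumerate(row):
            # Naive step: each feature contributes independently.
            # Add-one smoothing avoids log(0) for unseen combinations.
            num = feature_counts[(label, i)][value] + 1
            den = count + len(values_seen[i])
            score += math.log(num / den)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up data: (outlook, temperature) -> play outside?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train_naive_bayes(rows, labels)
print(predict(model, ("rainy", "cool")))
print(predict(model, ("sunny", "hot")))
```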

Decision Tree

A decision tree, as the name states, is a tree-based classifier. You can picture it as an upside-down tree in which each internal node splits into child nodes based on a condition, and each leaf holds a predicted class.
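To illustrate how a prediction walks from the root down through the conditions, here is a small hand-built tree. It is only a sketch of prediction, not of how a tree is learned from data; the `Node` class, the feature names, and the thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    feature: Optional[str] = None      # feature tested at this node
    threshold: Optional[float] = None  # split condition: value <= threshold
    left: Optional["Node"] = None      # child taken when the condition holds
    right: Optional["Node"] = None     # child taken otherwise
    label: Optional[str] = None        # set only on leaves

def predict(node, sample):
    """Follow the conditions from the root until a leaf is reached."""
    while node.label is None:
        if sample[node.feature] <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node.label

# Hypothetical tree: the root tests temperature, one child tests humidity.
tree = Node(
    feature="temperature", threshold=20.0,
    left=Node(label="stay in"),
    right=Node(
        feature="humidity", threshold=70.0,
        left=Node(label="go out"),
        right=Node(label="stay in"),
    ),
)

print(predict(tree, {"temperature": 25.0, "humidity": 50.0}))
print(predict(tree, {"temperature": 15.0, "humidity": 50.0}))
```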

Logistic Regression

Logistic regression is a binary classification algorithm that outputs the probability that an observation belongs to the positive class (for example, that something is true rather than false).
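The probability comes from passing a linear score through the sigmoid function. The sketch below fits a one-feature model with plain gradient descent. The function names, learning rate, epoch count, and toy data are assumptions for illustration; a real project would more likely use an existing library.

```python
import math

def sigmoid(z):
    """Map any real score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit(xs, ys, lr=0.1, epochs=2000):
    """Fit weight w and bias b by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict_proba(x, w, b):
    """Estimated probability that x belongs to the positive class."""
    return sigmoid(w * x + b)

# Made-up data: small x values are labelled 0, large ones 1.
xs = [0, 1, 2, 3, 4, 5]
ys = [0, 0, 0, 1, 1, 1]
w, b = fit(xs, ys)
print(predict_proba(0, w, b))  # low probability for the positive class
print(predict_proba(5, w, b))  # high probability for the positive class
```

A threshold (commonly 0.5) then turns the probability into a true/false decision.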

Support Vector Machine (SVM)