solomonxie / blog-in-the-issues

A personalised tech-blog, notebook, diary, presentation and introduction.
https://solomonxie.github.io

Statistical Guessing 统计式瞎猜 #50

Open solomonxie opened 6 years ago

solomonxie commented 6 years ago

Statistics is all about PREDICTION: given some real information, predict what will happen next.

Study Resources

Tools

Khan academy AP Statistics

Machine Learning related topics

solomonxie commented 5 years ago

Benford's Law

It's also called Newcomb-Benford's Law, Law of Anomalous Numbers, and First-Digit Law.

Refer to wiki: Benford's law

It is an observation about the frequency distribution of leading digits in many real-life sets of numerical data.

The first digits of data entries in most real-world data sets are not uniformly distributed. The most common first digit is 1, followed by 2, and so on, with 9 being the least common first digit. This phenomenon is known as Benford's Law.

image

The leading digits in such a set thus have the following distribution: image
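As a small sketch of that distribution: the usual statement of Benford's Law (as given on the wiki page) is P(d) = log10(1 + 1/d) for a leading digit d from 1 to 9. The variable name `benford` is just for illustration.

```python
import math

# Benford's law: P(d) = log10(1 + 1/d) for leading digit d = 1..9
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

for digit, p in benford.items():
    print(f"{digit}: {p:.1%}")
```

The nine probabilities add up to 1, and digit 1 comes out at roughly 30%.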

solomonxie commented 5 years ago

Two-way Tables (Joint Distributions)

Refer to Khan academy: Two-way tables Refer to Khan academy: Distributions in two-way tables Refer to Khan academy: Marginal distribution and conditional distribution

Refer to Mathbitsnotebook: Two-Way Frequency Tables

image

Definitions

Two-way Table

A Two-way Table shows a Joint distribution: the rows represent the categories of one variable, and the columns represent the categories of another variable.

Marginal Distribution

Marginal Distribution is simply an add-on to the joint distribution: a TOTAL row or column at the margins of the table.

image

Conditional Distribution

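Conditional Distribution is the distribution of one variable under the condition that the other variable takes a given value, e.g., one column of the table divided by that column's total.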

image
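A minimal sketch of marginal and conditional distributions, assuming a made-up two-way table (the counts and category names below are hypothetical, not taken from the images):

```python
# Hypothetical counts: rows = pet preference, columns = gender
table = {
    "cat": {"male": 10, "female": 30},
    "dog": {"male": 40, "female": 20},
}

grand_total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of the row variable: row totals / grand total
row_marginal = {k: sum(row.values()) / grand_total for k, row in table.items()}

# Conditional distribution of pet preference GIVEN gender == "female"
female_total = sum(row["female"] for row in table.values())
cond_given_female = {k: row["female"] / female_total for k, row in table.items()}

print(row_marginal)
print(cond_given_female)
```

The marginal uses the grand total as its base, while the conditional uses only the "female" column total.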

Trends in categorical data

Refer to Khan academy: Analyzing trends in categorical data Refer to Khan academy: Filling out frequency table for independent events

▶ Practice on Khan academy: Trends in categorical data

image

Interpret the table:

Example

image Solve:

Example

image Solve:

Example

image Solve: image

solomonxie commented 5 years ago

Frequency Table & Dot plot

Refer to Khan academy: Frequency tables & dot plots Refer to Khan academy review: Dot plots and frequency tables review

image

image

image

solomonxie commented 5 years ago

❖ Central Tendencies: Mean, Median, Mode

These can represent the centre of a distribution.

Refer to youtube: Mean, Median, and Mode: Measures of Central Tendency: Crash Course Statistics #3 Refer to wikipedia: Central tendency Refer to Khan lecture.

image

image

Impact on median & mean

There are some common impacts:

Average

Average in statistics means something a bit different than just an arithmetic average.

Khan lecture.

Average: In stats, it means typical or middle, and can be represented in multiple ways:
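As a quick sketch, the three usual measures (mean, median, mode) can be computed with Python's `statistics` module. The data is made up; the second list adds one extreme value to show how differently the mean and the median react to it.

```python
import statistics

data = [1, 2, 2, 3, 4]          # made-up values
with_outlier = data + [100]     # same data plus one extreme value

print(statistics.mean(data), statistics.median(data), statistics.mode(data))
print(statistics.mean(with_outlier), statistics.median(with_outlier))
```

The mean jumps from 2.4 to above 18, while the median only moves from 2 to 2.5.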

solomonxie commented 5 years ago

❖ Quartiles and Box plots (Distribution graph)

It's also called Box and whisker plots, or Five-number summary.

▶︎ Jump over to Khan academy for practice: Comparing data distributions

Refer to Khan academy: Reading box plots Refer to Khan academy: Interpreting box plots Refer to Maths is for fun: Quartiles

Quartiles are the values that divide a list of numbers into quarters:

image

or

image

「Interquartile range」IQR (Box plot)

Refer to Khan academy.

The Interquartile Range is from Q1 to Q3:

image

Example

image

「Five-number Summary」

Refer to Khan academy: Five-number summary

Example

image
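A minimal sketch of the five-number summary and the IQR. Quartile conventions differ between textbooks and tools; this assumes the "median of each half" method (leaving the middle value out of both halves when the count is odd). The function name and data are made up for illustration.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    s = sorted(values)
    n = len(s)
    lower = s[: n // 2]            # values below the median position
    upper = s[(n + 1) // 2 :]      # values above the median position
    return (s[0], statistics.median(lower), statistics.median(s),
            statistics.median(upper), s[-1])

summary = five_number_summary([1, 3, 3, 4, 5, 6, 6, 7, 8, 8])
q1, q3 = summary[1], summary[3]
iqr = q3 - q1
print(summary, iqr)
```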

「Box and Whisker Plot」

Box and Whisker Plot can show all the important values.

Important values:

image

image

Find out the 「Mean」 in Box plot

We can't find the mean value from a Box Plot. But from the position of Q2 (the Median), we can tell the relationship between the Mean & Median:

image

Example

image

Example

In the graph below, according to the Q2 position, we know that the distribution shape is Skewed right

image

Practice

image

Practice

image

solomonxie commented 5 years ago

「Variance」 Deviation

In mathematics and statistics, deviation is a measure of difference between the observed value of a variable and some other value, often that variable's mean.

image

「Standard Deviation」

It's the square root of the Variance.

Refer to Khan academy review: Calculating standard deviation step by step

image (▲ where Σ means "sum of", x is each value in the data set, μ (mu) is the mean of the data set, and N is the number of data points in the population.)

Steps:
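A minimal sketch of the usual step-by-step calculation of the population standard deviation, following the symbols in the legend above (μ, N). The function name and data are made up for illustration.

```python
import math

def population_sd(values):
    mu = sum(values) / len(values)                   # 1. find the mean
    squared_devs = [(x - mu) ** 2 for x in values]   # 2. square each distance from the mean
    variance = sum(squared_devs) / len(values)       # 3-4. sum them and divide by N
    return math.sqrt(variance)                       # 5. take the square root

print(population_sd([6, 2, 3, 1]))
```

For the data 6, 2, 3, 1 the mean is 3, the squared deviations sum to 14, and the result is √3.5 ≈ 1.87.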

Example

image Solve:

solomonxie commented 5 years ago

Sample Variance

The Sample Variance, s², is used to calculate how varied a sample is, and it's useful for estimating the Population Variance.

Since the Sample Variance is a kind of estimate, its formula is a bit different.

image

Why do we need to divide by n-1?

Refer to Quora: Why is the formula of sample variance different from population variance?

"The sample variance is an estimator for the population variance. When applied to sample data, the population variance formula is a biased estimator of the population variance: it tends to UNDERESTIMATE the amount of variability. "

To solve this Underestimation problem, statisticians found that dividing by n-1 fixes it, which relates to the idea of degrees of freedom (DF).

image

Easy way to calculate Sample Variance

This formula is better for handwriting calculation: image

Sample Standard Deviation

image
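A small sketch of the sample variance and sample standard deviation on a made-up sample. The "shortcut" line is an assumption about which hand-calculation formula is meant; it is the common computational form (Σx² − (Σx)²/n) / (n − 1), which is algebraically the same as the definition.

```python
import math
import statistics

sample = [6, 2, 3, 1]     # made-up sample
n = len(sample)
x_bar = sum(sample) / n

# Definition: divide the summed squared deviations by n - 1
s2 = sum((x - x_bar) ** 2 for x in sample) / (n - 1)

# Hand-calculation shortcut: (sum(x^2) - (sum(x))^2 / n) / (n - 1)
s2_shortcut = (sum(x * x for x in sample) - sum(sample) ** 2 / n) / (n - 1)

s = math.sqrt(s2)         # sample standard deviation
print(s2, s2_shortcut, s, statistics.variance(sample))
```

`statistics.variance` also divides by n − 1, so it should agree with both hand versions.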

Example

image Solve: image

The age of any gorilla in our sample is likely to be closer to the average of the 4 gorillas we looked at instead of the average of all the gorillas in the zoo. Because of that, the squared deviations from the mean we calculated will probably underestimate the actual deviations from the population mean. To compensate for this underestimation, rather than simply averaging the squared deviations from the mean, we total them and divide by n-1.

solomonxie commented 5 years ago

「Mean Absolute Deviation」 (MAD)

The Mean absolute deviation is the average of the absolute values of all deviations.

The deviation is the distance from a value to the mean. MAD describes how the values are laid out on the axis: whether they're close to each other or far apart.

Khan lectures.

image
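A minimal sketch of the MAD calculation, with two made-up data sets that share the same mean but are spread differently:

```python
def mean_absolute_deviation(values):
    mean = sum(values) / len(values)
    # average of the absolute distances from the mean
    return sum(abs(x - mean) for x in values) / len(values)

print(mean_absolute_deviation([2, 2, 4, 4]))   # values sit close to the mean of 3
print(mean_absolute_deviation([1, 1, 6, 4]))   # same mean, more spread out
```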

solomonxie commented 5 years ago

❖ Intro to Probability [DRAFT]

It's easy but always confusing if you haven't yet totally understood it in the first place.

image

The very first thing to do when solving a probability problem is to CATEGORISE the problem and apply the matching formula.

「Single Event」

image

Common cases:

「Theoretical Probabilities」 & 「Experimental Probabilities」

The formula Fav outcomes / Total outcomes only gives you the Theoretical probability. When you actually run an experiment, like flipping a coin a number of times, the Experimental probability you observe may differ from the theoretical one, especially with few trials. The more trials you run (e.g., 10,000 flips), the closer it tends to get.
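A quick simulation sketch of experimental probability for a fair coin. The seed and the helper name `experimental_p_heads` are arbitrary choices for illustration; the exact numbers depend on the random generator.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

def experimental_p_heads(flips):
    # count how many simulated flips land "heads"
    heads = sum(random.random() < 0.5 for _ in range(flips))
    return heads / flips

for flips in (10, 100, 10_000):
    print(flips, experimental_p_heads(flips))
```

With only 10 flips the result can be far from 0.5, while with 10,000 flips it usually lands very close.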

「Single Event Repeats」

Example: Roll a die 100 times, how many times will you get a number greater than 3? Answer: P(>3) × 100 = 3/6 × 100 = 50. We expect to get it about 50 times.

「Multiple Events」

「Independent events」 in sequence

「Independency」

To understand probability, we really need to differentiate independent events and dependent events.

Khan lecture: Compound probability of independent events.

Coin flips are INDEPENDENT events: What happens in the first flip in no way affects what happens in the second flip.

And this is actually one thing that many people don't realise.

「Gambler's Fallacy」

Some people think that if they got a bunch of heads in a row, then all of a sudden it becomes more likely to get tails on the next flip.

THAT IS NOT THE CASE.

Every flip is an independent event. What happened in the past in these flips does not affect the probabilities going forward.

「Sample Space」

A brute-force method: just draw a table or a tree that shows every possible outcome, and pick out all the favourable results.

Refer to Wiki: Sample Space Refer to article: Sample Space Examples and The Counting Principle

The sample space of an experiment is all the possible outcomes for that experiment.

image (Rolling Two dice)

image (52 card deck)

image
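A small sketch of listing a sample space by brute force, here for rolling two dice. The event "the two dice sum to 7" is just an example event picked for illustration.

```python
from itertools import product

# Sample space for rolling two dice: every ordered pair (first, second)
sample_space = list(product(range(1, 7), repeat=2))

# Favourable outcomes for the event "the two dice sum to 7"
favourable = [roll for roll in sample_space if sum(roll) == 7]

print(len(sample_space), len(favourable), len(favourable) / len(sample_space))
```

There are 6 × 6 = 36 outcomes in total, and 6 of them sum to 7.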

「Sample Size」

It's also called the Size of Sample Space.

Simply MULTIPLY.

The Fundamental Counting Principle: If there are a ways for one event to happen, and b ways for a second event to happen, then there are a * b ways for both events to happen.

Sample problem: If shoes come in 6 styles with 3 possible colors, how many varieties of shoes are there? All you need to do is multiply: 6 • 3 = 18 possible varieties of shoes.

Example:

image First, notice that it's ONE event.

Example:

image

Example:

image

solomonxie commented 5 years ago

❖ 「Permutations」 & 「Combinations」

Aside from probability, Permutations and Combinations are essential tools for statistics.

They solve the problem: how many groups are there if we choose some items out of a larger set.

▶︎ Back to previous note on: Intro to probability.

▶︎ Omni Permutation Calculator ▶︎ Omni Combination Calculator

Refer to article: Easy Permutations and Combinations Refer to article: Permutations And Combinations Simplified Refer to article: Combinations vs Permutations

HOW MANY groups do we get if we choose a number of things from the total things? e.g., how many groups would there be if we choose 3 people from 9 people?

Permutations and combinations both count the total number of groups. We've got TWO ways to count:

Combinations could be seen as FILTERED permutations, which filtered out all the "duplicates", or "over counted items".

e.g., We've got different groups (Permutations) such as "123, 132, 231, 213, 312, 321"; once we filter out the over-counted items, the combination is just one: 123.

「Permutations」

It's all the possible ways to arrange/order elements in a list.

Notation

image (Read as N pick K)

Understanding 「Permutations」

Notice: possibilities ≠ probabilities

e.g., the possibilities of how to arrange three numbers 1,2,3? It could be: 123, 132, 231, 213, 312, 321, so answer is 6 possible ways. To count that algebraically, it'd be 3*2*1, answer is 6 possible ways.

How do we do this?

There are 3 possible ways to fill the 1st position, and we've got 2 left over. Then the 2nd position has 2 possible ways, and we've got 1 left over. So the 3rd position has 1 possible way.

And just to logically think about it, we should MULTIPLY them together to get ALL POSSIBLE WAYS: 3*2*1.

Formula

「Combinations」

A Combination is a collection of elements in which the order DOESN'T matter.

Based on permutations, we filter out the duplicate combinations by dividing by k! to get the combinations.

Notation

image (Read as N choose R)

Formula

image
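A small sketch of both counts, assuming the standard formulas n!/(n−k)! for permutations and n!/(k!(n−k)!) for combinations. `math.perm` and `math.comb` need Python 3.8 or newer.

```python
import math
from itertools import combinations, permutations

n, k = 3, 3
print(math.perm(n, k))   # n! / (n - k)!
print(math.comb(n, k))   # n! / (k! * (n - k)!)

# Listing them out for the digits 1, 2, 3
print(["".join(p) for p in permutations("123", k)])
print(["".join(c) for c in combinations("123", k)])

# Choosing 3 people from 9
print(math.perm(9, 3), math.comb(9, 3))
```

For the digits 1, 2, 3 this gives 6 permutations and only 1 combination, matching the "123, 132, 231, 213, 312, 321" example earlier.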

solomonxie commented 5 years ago

❖ 「Set」 Basics

Refer to wiki: Set Refer to Khan academy: Basic set operations

image

「Membership」

If B is a set and x is one of the objects of B, this is denoted x ∈ B, and is read as "x belongs to B", or "x is an element of B". If y is not a member of B then this is written as y ∉ B, and is read as "y does not belong to B".

「Subsets」

If every member of set A is also a member of set B, then A is said to be a subset of B, written A ⊆ B (also pronounced A is contained in B). Equivalently, we can write B ⊇ A, read as B is a superset of A, B includes A, or B contains A.

「Empty Set」 ∅

The empty set is a subset of every set and every set is a subset of itself:

「Universal Set」 U

Every set is a subset of the universal set: A ⊆ U.

Basic Set Operations

image

「Intersection」 ⋂, &, and

Examples:

Basic properties of intersections:

「Union」 ⋃, |, or

Examples:

Basic properties of unions: A ∪ B = B ∪ A. A ∪ (B ∪ C) = (A ∪ B) ∪ C. A ⊆ (A ∪ B). A ∪ A = A. A ∪ U = U. A ∪ ∅ = A. A ⊆ B if and only if A ∪ B = B.

「Complements」 \, -, subtract

Two sets can also be "subtracted". The relative complement of B in A (also called the set-theoretic difference of A and B), denoted by A \ B (or A − B), is the set of all elements that are members of A but not members of B.

Examples:

Basic properties of complements:
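These operations map directly onto Python's built-in `set` type. A quick sketch with two made-up sets:

```python
A = {1, 2, 3, 4}
B = {3, 4, 5}

print(A & B)               # intersection: members of both A and B
print(A | B)               # union: members of A or B (or both)
print(A - B)               # relative complement of B in A: in A but not in B
print(3 in B, 1 in B)      # membership test
print({3, 4} <= A)         # subset test
```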

Example

image Solve: image

solomonxie commented 5 years ago

Histogram

Refer to Khan academy: Creating a histogram

Instead of plotting dots, a Histogram puts data into BUCKETs.

image

「Relative Frequency Histogram」

Instead of showing each category's absolute count, sometimes it's better to show each category's percentage, which is what Relative Frequency does.

image

solomonxie commented 5 years ago

Stem & Leaf Plot

Refer to Khan academy review: Stem and leaf plots review

Both the Stem and Leaf columns represent digits (or places) of the numbers.

In the case below, the stem shows the tens place digit, and the leaf shows the ones place digit.

image

image

solomonxie commented 5 years ago

❖ Describing Distributions

Refer to Khan academy: Example: Describing a distribution

image

「Shapes」: Normal, Left Skewed, Right Skewed

Refer to Khan academy: Classifying shapes of distributions

image

Example

image

「Spread」: Range, IQR, Standard Deviation, MAD

Refer to Crash course: Measures of Spread: Crash Course Statistics #4

「Centres」: Mean, Median, Mode

Refer to Crash course: Mean, Median, and Mode: Measures of Central Tendency: Crash Course Statistics #3

image

「Outliers」

Refer to Khan academy: Judging outliers in a dataset

In statistics, an outlier is an observation point that is distant from other observations. That being said, outliers in a graph are the MINORITY of the values.

image

Statistical definition 「1.5·IQR Rule」

Outliers are the values that fall outside the Fences, where the Upper fence and Lower fence are: image

image
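A minimal sketch of the 1.5·IQR rule, assuming the usual fence formulas: Lower fence = Q1 − 1.5·IQR and Upper fence = Q3 + 1.5·IQR. The data and the helper name `fences` are made up for illustration.

```python
def fences(q1, q3):
    """Return (lower_fence, upper_fence) for the 1.5*IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [1, 3, 3, 4, 5, 6, 6, 7, 8, 25]   # made-up values
q1, q3 = 3, 7                            # quartiles of the data above (median of each half)
low, high = fences(q1, q3)
outliers = [x for x in data if x < low or x > high]
print(low, high, outliers)
```

Here the IQR is 4, so the fences are −3 and 13, and only 25 falls outside them.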

How to choose proper methods

We've got different ways to describe the spread, centre and deviation, so we need some strategy to decide which one to use in different cases.

solomonxie commented 5 years ago

❖ 「Clusters」, 「Outliers」, 「Gaps」, 「Peaks」

Khan lecture: Shape for distributions. Khan lecture 2 Clusters, gaps, peaks & outliers.

image

image

solomonxie commented 5 years ago

Sample Variance

It's also called the Unbiased estimate of population variance.

Refer to Khan academy: Sample variance

For a large population, it's impossible to get all the data. So we take a number of samples and calculate their variance.

The formula for Sample Variance is a small twist on the population variance: subtract 1 from the divisor, so that the variance will be slightly bigger.

It seems like some voodoo, but it's reasonable. If we use the population variance formula on sample data, it tends to underestimate the true variance. That's why for sample variance we make a small change to the previous formula.

image

Why we divide by n-1 for the Unbiased Sample Variance

Refer to Khan academy: Review and intuition why we divide by n-1 for the unbiased sample variance Refer to Khan academy: Why we divide by n-1 in variance Refer to Khan academy: Simulation showing bias in sample variance Refer to Khan academy simulation: Unbiased Estimate of Population Variance

image

Simulation for different variance formulas with true variance:

image
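A rough simulation sketch in the same spirit (not the Khan academy simulation itself): draw many small samples from a population with a known variance, and compare the average of the "divide by n" and "divide by n − 1" results. The population (a fair die), sample size, trial count and seed are all arbitrary choices.

```python
import random

random.seed(1)
faces = [1, 2, 3, 4, 5, 6]
true_var = sum((x - 3.5) ** 2 for x in faces) / len(faces)   # 35/12 for a fair die

def variances(sample):
    n = len(sample)
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)
    return ss / n, ss / (n - 1)          # (divide by n, divide by n - 1)

trials = 20_000
biased_sum = unbiased_sum = 0.0
for _ in range(trials):
    b, u = variances(random.choices(faces, k=5))   # a sample of 5 rolls
    biased_sum += b
    unbiased_sum += u

biased_avg = biased_sum / trials
unbiased_avg = unbiased_sum / trials
print(true_var, biased_avg, unbiased_avg)
```

On average the n version should come out clearly below the true variance, while the n − 1 version should land close to it.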

solomonxie commented 5 years ago

❖ Percentiles

Before you start, you probably need to know: explanations of percentiles are quite confusing, and they differ from teacher to teacher and from website to website, because there is NO standard definition of percentile.

image

Percentiles tell you what PERCENTAGE of the population has a value that's LOWER than yours.

▶︎ Jump over to have practice: Calculating percentiles

Refer to Khan academy: Calculating percentile Refer to youtube: Percentiles - Introductory Statistics Refer to youtube: Percentile Refer to textbook [PDF]: PERCENTILES AND PERCENTILE RANKS Refer to wikipedia: Percentile Refer to wikipedia: Percentile rank Refer to mathisfun: Percentiles Refer to pbarrett: percentiles (PDF) Refer to varsity tutors: percentiles

A percentile is the value BELOW which a given percentage of observations fall. e.g., the 20th percentile is the value below which 20% of the observations may be found.

Percentiles run from the 1st to the 100th, where the 100th percentile means the largest value in the set. According to wiki, there COULD be decimal percentiles such as the 0.13th percentile or the 2.28th percentile.

For example, if your doctor tells you that your height is AT the 83rd percentile of the population, it means 83% of people are shorter than or as tall as you:

image

Other names of Percentiles

Interquartile:

Deciles: Deciles are percentiles divided into 10 equal sections, which correspond to the 10th, 20th, 30th,...90th percentiles.

image

「Percentile Rank」

Percentile rank usually comes up when you're asked to find which percentile a given value is at. e.g., percentile ranks are commonly used to clarify the interpretation of scores on standardized tests.

e.g., you're asked what the percentile rank of the number 79 in a list is, and the answer might be "Its rank is 90, because it's at the 90th percentile."

「Percentile」 vs. 「Percentile Rank」

Percentiles and Percentile Ranks are highly similar(confusing) statistics.

Example

image

「Calculate Percentiles」

The process of calculating percentiles is actually manipulating the indexes of the number list. It's like calculating a pointer: finding the right pointer leads you to the number, regardless of what number it is.

There're a few methods for calculating percentiles:

Formula

image (Index is the position of the value at the given percentile, P is the percentile, and Amount is the number of values in the list.) To cut down on confusion, we use Index instead of the Rank used in textbooks, which refers to the "ordinal rank", not the "percentile rank".

Example

There's a list of 12 numbers, {a,b,c,d,e,f,g,h,i,j,k,l}. The 80th percentile relates to 80% of the AMOUNT of the list, so it's 80% × 12 = 9.6, where 9.6 is the index of the number in the list. But the index must be a whole number, and by the definition of percentile the value must be at or above 80% of all values, so we round the index up from 9.6 to 10. So the 10th number in the list is AT the 80th percentile, regardless of what number it is.

Example

Consider the ordered list {15, 20, 35, 40, 50}, which contains 5 data values. What are the 5th, 30th, 40th and 100th percentiles of this list using the nearest-rank method? Refer to wiki: Worked examples of the nearest-rank method

Solve:
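A minimal sketch of the nearest-rank method, assuming Index = P/100 × Amount rounded up to the next whole number (as in the 12-number example above). The function name is made up for illustration.

```python
def nearest_rank_percentile(sorted_values, p):
    """Value at the p-th percentile (p an integer from 1 to 100), nearest-rank method."""
    n = len(sorted_values)
    index = -(-p * n // 100)         # ceiling of p/100 * n, using integer arithmetic
    return sorted_values[index - 1]  # the list is 0-based, the rank is 1-based

values = [15, 20, 35, 40, 50]
for p in (5, 30, 40, 100):
    print(p, nearest_rank_percentile(values, p))
```

For this list the 5th, 30th, 40th and 100th percentiles come out as 15, 20, 20 and 50, which agrees with the worked example on the wiki page.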

Example

image

Calculate 「Percentile ranks」

We use the same formula from calculating percentiles: image

Instead of inputting the percentile to get the index, we input the index and get the percentile rank.

Example

If the scores of a set of students in a math test are {20 , 30 , 15, 75}. What is the percentile rank of the score 30 ?

Solve:

image

So the Percentile rank for number 30 is 75, which means it's at 75th percentile.
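A small sketch of that calculation, assuming the index of a value is the count of values less than or equal to it (its position in the sorted list). Other definitions of percentile rank exist, so treat this as one convention only.

```python
def percentile_rank(values, x):
    """Percentage of values that are less than or equal to x."""
    at_or_below = sum(1 for v in values if v <= x)
    return at_or_below / len(values) * 100

print(percentile_rank([20, 30, 15, 75], 30))
```

Three of the four scores are at or below 30, so the rank is 3/4 × 100 = 75.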

「Cumulative Relative Frequency Graph」

Refer to Khan academy: Analyzing a cumulative relative frequency graph

image

solomonxie commented 5 years ago

❖ 「Z-score」 (Standard Score / Normal Score)

Z stands for Standard Normal Distribution. It's fairly important in real life: Japan uses Z-scores on exams to estimate each student's study skills.

Z-score is the essential concept of Z-Statistics.

▶︎ Jump over to have practice: Comparing with z-scores

Refer to Wiki: Standard score Refer to Khan academy: Z-score introduction Refer to youtube: Why Do We Need z Scores Refer to youtube: Statistics 101: Understanding Z-scores Refer to Crash Course: Z-Scores and Percentiles: Crash Course Statistics #18 Refer to youtube: z-score Calculations & Percentiles in a Normal Distribution

Z-score is all about comparison: comparing different kinds of data sets. In other words, a Z-score indicates how many standard deviations away (above or below) from the mean a given point is.

image

Why do we need Z-scores

"Z-scores in general allow us to compare things that are NOT in the same scale, as long as they are NORMALLY distributed." - CrashCourse

For example, even if we know everyone's score, just looking at those scores makes it hard to know how good or bad someone is compared to everyone else in the dataset. e.g., if most of the students score above 90, can we say someone who scores 90 is good?

So Z-score gives a solution for this: compare the score to the "average".

Z-score is especially good for comparing different types of data, e.g., comparing a 100-point exam & a 150-point exam, IELTS & TOEFL, apples & oranges, a baseball player & a football player...

All in all, Z-score is a process of Normalization, which "normalizes" different sets of data to the same standard so we can compare them.

Compares the various grading methods in a normal distribution:

image

How to understand the formula?

image

By comparing each one's score with the mean: x - μ, we get a kind of deviation.

But at this point we still don't know whether each one's deviation is big or small. We need a "standard" to compare each deviation against. Just like the mean is the average of all scores, the standard deviation is roughly the average amount of deviation of all scores, which tells us whether each deviation is large or not. So we want to compare each deviation with the Standard deviation: deviation ÷ 𝜎

And we get the whole picture: Standard Score = (𝓍 - μ) / 𝜎
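A minimal sketch of that formula on made-up exam scores, using the population standard deviation for 𝜎:

```python
import statistics

scores = [60, 70, 80, 90, 100]           # made-up exam scores
mu = statistics.mean(scores)
sigma = statistics.pstdev(scores)        # population standard deviation

def z_score(x):
    # how many standard deviations x is above (+) or below (-) the mean
    return (x - mu) / sigma

print([round(z_score(x), 2) for x in scores])
```

A score equal to the mean gets a z-score of 0, and scores above or below it get positive or negative z-scores.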

How to understand the Number of Standard Deviations?

Assume the standard deviation is 𝜎 (sigma), so the number in front of it just means how much it is scaled. e.g., 2𝜎 means a doubled standard deviation, and 1.5𝜎 means 1.5 times the SD. If your Z-score is 2, it means your score is two standard deviations (2𝜎) away from the mean.

Example

There's some exam data of a class:

image

Here's their z-scores:

image

Example

image Solve:

「Z-table」 Convert Z-score to Percentile

This ONLY applies to Normal Distribution

Refer to Khan academy: Standard normal table for proportion below

If you know someone's z-score, you will easily get his percentile from the Z-table. Vice versa, if you know his percentile, you can get his z-score as well.

How to use? The 1st Column represents the ones and tenths digits of the z-score, and the 1st Row represents the hundredths digit. Take the given z-score, find its row and column, and the cell where they intersect is the percentile.

e.g., someone's z-score is "0.57", and you want to know what percentile he's at, or what proportion is below his score. Just go over to the z-table, first get to the row at 0.5, then find the column of 0.07, and the intersection will be his percentile, which is "0.7157" or "71.57%" in this case.

image

Common values: image

Explicit Z-table: image
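Instead of looking up the table, the same lookup can be sketched with `statistics.NormalDist` (Python 3.8 or newer), whose `cdf` gives the proportion below a z-score on the standard normal curve:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)
proportion_below = standard_normal.cdf(0.57)   # area to the left of z = 0.57
print(round(proportion_below, 4))
```

This should agree with the 0.7157 read from the table for z = 0.57.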

Example

image

Solve:

Example

image

Solve:

「Z-table」Convert Percentile to Z-score

Refer to Khan academy: Finding z-score for a percentile

Just do it the other way around: look for the cell closest to the given percentile, then read off the corresponding row & column, and that will get you the z-score.
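The reverse lookup can be sketched with `NormalDist.inv_cdf`, which takes a proportion below and returns the z-score (again assuming a normal distribution):

```python
from statistics import NormalDist

z = NormalDist().inv_cdf(0.7157)   # z-score with 71.57% of the area below it
print(round(z, 2))
```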

Example

image

Solve:

solomonxie commented 5 years ago

❖ Population Parameters [DRAFT]

「Population」

In statistics, the Population is the collection of all people, items, or objects that are required for a specific study.

「Parameter」

It's also called the Population parameter.

The word parameter in Statistics means something different than in Mathematics. It is a number that describes the population. It is usually estimated from a statistic, which is calculated from a randomly selected sample of the given population.

Common population parameters:

image

「parameter」 vs. 「statistic」

The word parameter often refers to a Population statistic, e.g., population mean, population SD.

The word statistic generally refers to a fact about the data, but it often refers specifically to a Sample statistic, e.g., sample mean, sample proportion.

「Central Tendencies」

(To be written...)

How parameters change as data is shifted and scaled

Refer to Khan academy: How parameters change as data is shifted and scaled

image

We see that:
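A quick numeric check of shifting and scaling on made-up data. Shifting adds a constant to every value; scaling multiplies every value by a constant.

```python
import statistics

data = [1, 2, 3, 4, 10]                # made-up values
shifted = [x + 5 for x in data]        # shift every value by +5
scaled = [x * 3 for x in data]         # scale every value by 3

for name, values in (("original", data), ("shifted +5", shifted), ("scaled x3", scaled)):
    print(name, statistics.mean(values), statistics.median(values),
          round(statistics.pstdev(values), 3))
```

Shifting moves the mean and median by the same amount and leaves the standard deviation alone, while scaling multiplies all three.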

Example

image

Solve:

solomonxie commented 5 years ago

Density Curves

Sometimes histograms aren't good enough to visualize a large dataset. A Density Curve solves the problem, as the data can take on any value in a continuum instead of being thrown into a few buckets.

image

Axes (Tricky):

Area

The entire AREA under the curve is 100%, which represents all the data points.

The percentage of data points in an interval is the AREA under the curve over that interval, NOT the height of a point.

Parameters of Density Curves

Median

For Symmetric distribution, the Median is right at the middle, which is at 50th percentile.

For Skewed distribution, the [Left side area] = [Right side area] = 50%.

Mean

For Symmetric distribution, the Mean is right at the middle:

image

For Skewed distribution, the Mean is at the right or left of the Median:

image

Example

image What is the height of median?

Solve:

Example

Solve:

Example

image

Solve:

Example

image

Solve:

solomonxie commented 5 years ago

Empirical Rule 「68-95-99.7 Rule」

This rule ONLY applies to Normal Distribution.

It's also called the 68-95-99.7% rule, because for a normal distribution:

image
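The three percentages can be checked with a short sketch using `statistics.NormalDist`: the area within k standard deviations of the mean is cdf(k) − cdf(−k).

```python
from statistics import NormalDist

nd = NormalDist()   # standard normal: mean 0, SD 1
within = {k: nd.cdf(k) - nd.cdf(-k) for k in (1, 2, 3)}
for k, p in within.items():
    print(f"within {k} SD: {p:.1%}")
```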

Example

image

Solve:

Example

image

Solve:

solomonxie commented 5 years ago

Scatter Plot

Just to plot many dots on the X-Y plane.

image

Linear & Non-linear Relationship

If you can fit a LINE through those points, it's a linear relationship. If not, then it's non-linear.

Positive & Negative Relationship

If the scatterplot has a linear relationship:

「Bivariate relationship」Linearity, Strength and Direction

Bivariate is just a fancy way to say that each data point has TWO variables. e.g., the point (2,3) has an x-position of 2 and a y-position of 3. We can analyze the x-values of all data points and the y-values of all data points SEPARATELY, and then look at how they relate to each other.

Refer to Khan academy: Bivariate relationship linearity, strength and direction

image

image

「Correlation Coefficient」

The Correlation describes the direction and strength of a linear relationship. The coefficient r is not the slope itself, but it's kind of like a "Unit SLOPE" of the estimated Regression Line: the slope you would get if both variables were standardized.

Refer to youtube: The Correlation Coefficient - Explained in Three Steps Refer to Khan academy: Correlation coefficient intuition Refer to Khan academy: Calculating correlation coefficient r

Correlation Coefficient is represented as letter r. The interval of r is -1 ≤ r ≤ 1:

image

image

Find out the Correlation Coefficient

image

Formula: image
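A minimal sketch of the calculation, assuming the formula is the common one: r = 1/(n − 1) × Σ (z-score of x × z-score of y), with sample standard deviations. The function name and points are made up for illustration.

```python
import statistics

def correlation_r(xs, ys):
    """r = (1 / (n - 1)) * sum of z_x * z_y, using sample standard deviations."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)) / (n - 1)

xs, ys = [1, 2, 2, 3], [1, 2, 3, 6]    # made-up points
print(round(correlation_r(xs, ys), 3))
```

For these four points r is about 0.945, a strong positive linear relationship.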

solomonxie commented 5 years ago

❖ Least-square Regression

Least-square Regression is one way of calculating Linear Regression. Most regression calculations are done by computer, but we want to do it by hand to get a better understanding.

What is Linear Regression? Trying to fit a line as closely as possible to as many of the points as possible is called "Linear Regression".

Refer to Khan academy: Introduction to residuals and least-squares regression

image

「Residuals」

Residuals are errors. More specifically, they are the differences between the actual value of the response variable and the value predicted by the least squares regression line.

image

At a certain X-position, the value of residual is the VERTICAL DISTANCE from the actual value to the Regression Line.

image

The way we calculate the Regression Line with the Least Square method is to MINIMIZE the sum of the squared residuals.
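A small sketch of residuals for one candidate line. The points and the line y = −2 + 2.5x are made up for illustration; each residual is the actual y minus the predicted y.

```python
def residuals(xs, ys, intercept, slope):
    """Actual y minus predicted y for each point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

xs, ys = [1, 2, 2, 3], [1, 2, 3, 6]                  # made-up points
res = residuals(xs, ys, intercept=-2, slope=2.5)     # hypothetical line y = -2 + 2.5x
print(res, sum(r * r for r in res))                  # residuals and their squared sum
```

The squared sum is the quantity a least-squares fit tries to make as small as possible.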

Example

image Solve:

Example

image Solve:

Calculate the equation of 「Least-square line」

▶ Practice at Khan academy: Calculating the equation of the least-squares line

Refer to Khan academy: Calculating the equation of a regression line

Formula of Regression line: image

  1. As we said, the Correlation Coefficient r is kind of like the Unit Slope, which is between -1 and 1, so we have to apply the unit slope to the real case by multiplying r by the ratio of the Standard Deviations of y & x, which is Sy/Sx.
  2. A "must go through" point is the MEAN of the dataset, which is (x̄, ȳ). At x = x̄, the predicted value is ȳ.

With the two pieces of information above, we can easily calculate the estimated Regression Line.
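Those two facts can be sketched as code: slope = r × Sy/Sx, and the intercept is whatever makes the line pass through (x̄, ȳ). The function name and the points are made up for illustration.

```python
import statistics

def least_squares_line(xs, ys):
    """Return (intercept, slope): slope = r * s_y / s_x, line passes through (x_bar, y_bar)."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    slope = r * s_y / s_x
    intercept = y_bar - slope * x_bar    # forces the line through the point of means
    return intercept, slope

intercept, slope = least_squares_line([1, 2, 2, 3], [1, 2, 3, 6])   # made-up points
print(round(intercept, 3), round(slope, 3))
```

For these points the line works out to roughly ŷ = −2 + 2.5x.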

「Slope」 of Regression line

image

「Intercept」 of Regression line

Example

image Solve: image

solomonxie commented 5 years ago

Residual Plot

Refer to Khan academy: Residual plots

Linear model Residual plot: image

Non-linear model Residual plot: image

solomonxie commented 5 years ago

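❖ 「R²」 Coefficient of Determination

For a least-squares line, R-squared is the square of the correlation coefficient r. It is computed from Squared Errors (SE), but it is not the squared residuals themselves.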

R squared is ALWAYS between 0 and 1, and the higher your R squared, the better.

Refer to Khan academy: R-squared or coefficient of determination

R squared is the proportion of the variation of y that is explained by your linear model.

R-squared = Explained variation / Total variation

Formula

image (SE_line is the Squared Error from the line)
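A minimal sketch, assuming the formula is R² = 1 − SE_line / SE_ȳ, where SE_ȳ is the total squared error of the y-values from their mean. The points and the line y = −2 + 2.5x are made up for illustration.

```python
def r_squared(xs, ys, intercept, slope):
    y_bar = sum(ys) / len(ys)
    se_line = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))  # error left over after the line
    se_mean = sum((y - y_bar) ** 2 for y in ys)                                # total variation in y
    return 1 - se_line / se_mean

print(r_squared([1, 2, 2, 3], [1, 2, 3, 6], intercept=-2, slope=2.5))
```

Here SE_line is 1.5 and the total variation is 14, so about 89% of the variation in y is explained by the line.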

Understanding 「R-squared」

Refer to youtube: 3.2: Linear Regression with Ordinary Least Squares Part 1 - Intelligence and Learning

image

Why do we square "Residuals"?

Squaring keeps every residual (difference from the regression line) positive, so positive and negative errors can't cancel each other out. It also penalizes large residuals more heavily than small ones, so minimizing squared residuals is not quite the same as minimizing absolute residuals. Either way, the goal is to MINIMIZE the total error.

Why do we square 「Correlation Coefficient」?

(To do...)

Why do we add them together

By adding them we will get the TOTAL ERRORS, which is the one we're going to minimize.

「Root Mean Square Error」RMSE

It's also called the Root Mean Square Deviation (RMSD), or Standard Deviation of the Residuals.

This measures how well the Regression Line fits the data.

Refer to Khan academy: Standard deviation of residuals or root mean square deviation (RMSD)

image

solomonxie commented 5 years ago

Output of Least-square Regression

Refer to Khan academy: Using least squares regression output Refer to Khan academy: Confidence interval for the slope of a regression line

Prerequisite:

image

solomonxie commented 5 years ago

❖ Study design (Stats)

For different purposes, we use different methods of study.

Refer to Khan academy: Types of statistical studies

Types of Statistical Studies:

「Explanatory Variable」 & 「Response Variable」

The response variable is the focus of a question in a study or experiment. An explanatory variable is one that explains changes in that variable. It can be anything that might affect the response variable.

「Samples」 or 「Surveys」

「Observational Studies」 or 「Experiments」

Notes

solomonxie commented 5 years ago

❖ Random Sampling

"Humans are famously bad at truly random." - Sal Khan

Refer to Khan academy: Techniques for generating a simple random sample Refer to Khan academy: Techniques for random sampling and avoiding bias

Methods of Random sampling:

「Simple Sampling」

Refer to Khan academy: Techniques for generating a simple random sample Refer to Wiki: Simple random sample

Example

image Solve: image

「Stratified Sampling」

Divide the population into a couple of groups, and take samples from EACH group.

Refer to Wiki: Stratified sampling

image

「Clustered Sampling」

Divide the population into a couple of groups, and randomly take a few GROUPS from them as samples.

Refer to Wiki: Cluster sampling

image
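A rough sketch of the three methods side by side, on a hypothetical population of 4 classes with 10 students each (all names, sizes and the seed are made up):

```python
import random

random.seed(2)
# Hypothetical population: 4 classes (groups) of 10 students each
population = {f"class_{c}": [f"{c}{i}" for i in range(10)] for c in "ABCD"}
everyone = [s for group in population.values() for s in group]

# Simple: pick 8 students from the whole population
simple = random.sample(everyone, 8)

# Stratified: pick 2 students from EACH class
stratified = [s for g in population.values() for s in random.sample(g, 2)]

# Clustered: pick 2 whole classes and take everyone in them
chosen = random.sample(list(population), 2)
clustered = [s for name in chosen for s in population[name]]

print(simple, stratified, clustered, sep="\n")
```

Stratified sampling guarantees every class is represented, while clustered sampling only ever contains students from the chosen classes.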

「Random Sampling」 vs. 「Random Assignment」

Refer to Khan academy: Random sampling vs. random assignment (scope of inference)

image

「Simple Random Sample」SRS

It means that the sample was selected in such a way where each member and set of members has an equal chance of being in the sample.

solomonxie commented 5 years ago

Quick note on: 「Non-random Sampling」 Bias

▶︎ Jump over to Khan academy for practice: Bias in samples and surveys Refer to Khan academy article: Identifying bias in samples and surveys

「Response Bias」

It occurs when people systematically give wrong answers.

「Nonresponse Bias」

It is when people chosen for the sample can't be contacted or refuse to answer.

「Convenience Bias」

Researcher chooses samples that are easiest to reach.

「Undercoverage」

It occurs when some members in the population are left out of the sampling frame.

「Voluntary Response Bias」

Researcher gives an open invitation and people decide to be in the sample or not.

「Wording Bias」

Misleading people with biased words or phrases.

solomonxie commented 5 years ago

Observational Study

WITHOUT affecting the subjects, closely observe a sample or a whole (small) population. The key is to observe, not to intervene.

Refer to Khan academy: Worked example identifying observational study

An observational study DOES NOT establish a CAUSAL RELATIONSHIP; it can only tell you whether one variable is correlated with another.

solomonxie commented 5 years ago

❖ Experiment Study

RANDOMLY divide the subjects into a Control Group and a Treatment Group, then compare the two: one group receives the treatment (AFFECTED) and the other does not (NOT AFFECTED).

Refer to Khan academy: Introduction to experiment design Refer to EUPATI: Clinical trial designs

The purpose is to establish a CAUSAL RELATIONSHIP, i.e. to tell whether one event can cause another, which an observational study can't tell.

The key is to divide the two groups randomly, so that any difference between them can be attributed to the treatment.

Two groups:

image

How to conduct a good Experiment

There're a few keys to conduct a good experiment:

「Placebo Effect」

Placebo means "fake medicine", typically a sugar pill. In drug testing and medical research, it is a very common way to measure how much of an effect comes from the patient's expectations rather than from the drug itself.

When conducting a medicine experiment, we randomly separate people into two groups:

「Blind Experiment」 & 「Double Blind Experiment」

It's a great way to avoid BIAS.

Improving 「Randomly Grouping」

Sometimes complete randomness makes the groups uneven, which introduces bias into the experiment. E.g., if there are more women in one group than in the other, or more young people in one group, that can strongly affect the result.

To handle this, we introduce some improved designs for the grouping strategy:

Randomized 「Block Design」

With a randomized block design, the experimenter divides subjects into subgroups called blocks, such that the variability within blocks is less than the variability between blocks. Then, subjects within each block are randomly assigned to treatment conditions. Compared to a completely randomized design, this design reduces variability within treatment conditions and potential confounding, producing a better estimate of treatment effects.

image
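As a rough sketch of the idea (not a full experimental-design tool), the block design can be written as "group by the blocking variable first, then randomize inside each block". The subjects, the blocking variable (sex) and the function name below are hypothetical.

```python
import random

def randomized_block_assignment(subjects, block_of, rng):
    """Split subjects into blocks, then randomize treatment/control within each block."""
    blocks = {}
    for s in subjects:
        blocks.setdefault(block_of(s), []).append(s)

    assignment = {}
    for members in blocks.values():
        shuffled = members[:]
        rng.shuffle(shuffled)          # randomness happens INSIDE the block
        half = len(shuffled) // 2
        for s in shuffled[:half]:
            assignment[s] = "treatment"
        for s in shuffled[half:]:
            assignment[s] = "control"
    return assignment

rng = random.Random(0)  # fixed seed only so the sketch is repeatable

# Hypothetical subjects as (id, sex): 6 women and 4 men.
subjects = [(i, "F" if i < 6 else "M") for i in range(10)]
assignment = randomized_block_assignment(subjects, lambda s: s[1], rng)

for subject, group in sorted(assignment.items()):
    print(subject, group)
```

Whatever the shuffle does, each sex ends up split evenly between treatment and control, which a completely randomized design cannot guarantee.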

「Cross Over」 Design

It simply means "switching groups": some time after the first experiment, run a second one in which the people from the Control Group switch to the Treatment Group, and vice versa.

Khan academy's video is titled "matched pairs design", but what it describes is actually a "Crossover Design". Refer to Khan academy: Crossover Design ~(Matched pairs experiment design)~

image

「Matched Pairs」 Design

In the matched-pair design, participants are first matched in pairs according to certain characteristics. Then, each member of a pair is randomly assigned to one of the two different study subgroups. This allows comparison between similar study participants who undergo different study procedures.

image

「Replication」

"A very important idea, in science in general... Other people should be able to replicate and reinforce this experiment and hopefully get the consistent result" - Sal Khan

solomonxie commented 5 years ago

❖ Probability [DRAFT]

「Theoretical Probability」 vs. 「Experimental Probability」

The experimental probability should get closer and closer to the theoretical probability as the number of trials grows.

image
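A small simulation can make this visible. The sketch below assumes a fair coin (theoretical probability 0.5) and uses a hypothetical helper name; the exact numbers printed depend on the seed, but they should drift toward 0.5 as the number of trials grows.

```python
import random

def experimental_probability(trials, rng):
    """Flip a simulated fair coin `trials` times and return the share of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(trials))
    return heads / trials

rng = random.Random(1)  # fixed seed only so the sketch is repeatable
for n in (10, 100, 10_000, 100_000):
    print(n, experimental_probability(n, rng))
```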

solomonxie commented 5 years ago

Experimental Probability [DRAFT]

Random numbers for experimental probability

Statistical significance of experiment

The commonly used threshold: if the probability of getting a result at least this extreme by chance alone is less than 5%, the result is called statistically significant.

solomonxie commented 5 years ago

❖ Probability Rules

「Multiplication Rule」 A and B

For INDEPENDENT events, the probability that all of them occur is the product of their probabilities: P(A and B) = P(A) × P(B).

image

「Addition Rule」A or B

The probability of A or B is the sum of their favourable outcomes minus the OVERLAP (the outcomes counted in both), so that nothing is counted twice. The formula is: P(A or B) = P(A) + P(B) - P(A and B).

Refer to Khan academy: Addition rule for probability

image
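To check the addition rule by brute force, here is a small sketch on a standard 52-card deck, using "Jack" and "Hearts" as the two events (an example chosen here for illustration, not necessarily the one in the picture above).

```python
from fractions import Fraction
from itertools import product

ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs

def prob(event):
    """Favourable outcomes divided by total outcomes."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def is_jack(card):
    return card[0] == "J"

def is_heart(card):
    return card[1] == "hearts"

# Count "Jack or Heart" directly ...
p_direct = prob(lambda c: is_jack(c) or is_heart(c))
# ... and via P(A) + P(B) - P(A and B); the Jack of Hearts is the overlap.
p_rule = prob(is_jack) + prob(is_heart) - prob(lambda c: is_jack(c) and is_heart(c))

print(p_direct, p_rule)  # both 4/13
```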

Example

image Solve: image

「Mutually Exclusive events」

Mutually exclusive events cannot happen at the same time.

Example

image

Example

image Solve:

「Compound probability」A then B

The problem "What's the probability of flipping a coin and getting 3 heads in a row?" is a typical compound probability problem with independent events.

The formula for the compound probability of independent events is the same as the multiplication rule.

image

Example: Flipping a coin three times, what's the probability of getting a tail, head and tail ?

image
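The tail-head-tail example can be double-checked by listing all outcomes and counting, which is a minimal sketch assuming a fair coin:

```python
from fractions import Fraction
from itertools import product

# All 2 * 2 * 2 = 8 equally likely sequences of three flips.
outcomes = list(product("HT", repeat=3))

# Counting: exactly one of the 8 sequences is (T, H, T).
p_counted = Fraction(outcomes.count(("T", "H", "T")), len(outcomes))

# Multiplication rule for independent flips: 1/2 * 1/2 * 1/2.
p_rule = Fraction(1, 2) ** 3

print(p_counted, p_rule)  # both 1/8
```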

「At least one」 probability

image
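The usual trick for "at least one" is to compute the complement: 1 minus the probability that it never happens. The sketch below assumes independent trials with the same probability each time; the dice example and the function name are made up for illustration.

```python
from fractions import Fraction

def p_at_least_one(p_single, n):
    """P(at least one success in n independent trials) = 1 - P(no success at all)."""
    return 1 - (1 - p_single) ** n

# Hypothetical question: rolling a fair die 4 times, chance of at least one six?
p = p_at_least_one(Fraction(1, 6), 4)
print(p, float(p))  # 671/1296, a bit over one half
```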

solomonxie commented 5 years ago

Dependent Probability

Dependent probability means the probability of the second event changes depending on what happened first.

Refer to Khan academy: Dependent probability introduction

Two events are INDEPENDENT to each other when:

image

Furthermore, with concept of Conditional Probability, two events are INDEPENDENT when:

image
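Both independence tests, P(A and B) = P(A) × P(B) and P(A | B) = P(A), can be tried out on two dice. The events below are hypothetical examples picked for this sketch.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))  # 36 equally likely (first, second) pairs

def prob(event):
    return Fraction(sum(1 for r in rolls if event(r)), len(rolls))

def first_even(r):
    return r[0] % 2 == 0

def sum_is_7(r):
    return r[0] + r[1] == 7

def sum_at_least_10(r):
    return r[0] + r[1] >= 10

def is_independent(a, b):
    """Test P(A and B) == P(A) * P(B)."""
    return prob(lambda r: a(r) and b(r)) == prob(a) * prob(b)

print(is_independent(first_even, sum_is_7))         # True
print(is_independent(first_even, sum_at_least_10))  # False

# Same check through conditional probability: P(A | B) == P(A)?
p_a_given_b = prob(lambda r: first_even(r) and sum_is_7(r)) / prob(sum_is_7)
print(p_a_given_b, prob(first_even))                # both 1/2
```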

Dependent Probability & Independent Probability

▶ Practice on Khan academy: Dependent and independent events

image

Example

image Solve:

Example

image Solve: image

solomonxie commented 5 years ago

❖ Conditional Probability

It's NOT just the probability that both events happen; it asks for the probability of one event GIVEN that another event has already happened. Because it is restricted to the cases where that event happened, you divide by the probability of that event.

Notation

Probability of B given A (B after A, or B in condition of A), or Probability of A given B: image

Formula

image

image

Understanding 「Conditional Probability」

Refer to youtube: Probability Part 2: Updating Your Beliefs with Bayes: Crash Course Statistics #14

It is based on the concept of sets, and is a lot more intuitive with a Venn diagram.

P(A | B) = P(A & B) / P(B)

The circle of B is given for sure, so any part of A that happens must lie INSIDE the circle of B, which is P(A & B). Dividing by P(B) gives the proportion of B's circle that is taken up by A.

「The Parallel World」

Imagine there are many "parallel worlds", say an A-world and a B-world, which are the worlds where A and B occur. They run in parallel, yet they may intersect: A can occur in B's world, or B can occur in A's world.

The chance of one event occurring in "the other world" is the Conditional probability.

In the context of the anime Steins;Gate, the conditional probability is the chance of Mayuri being killed given that we are in the Alpha worldline.

Steins;Gate Worldline: image

「A GIVEN B」

It means "A, given that B has happened". This is different from both happening together, P(A and B): once one event is known to have happened, the probability of the other is generally no longer the same.

Divide by B's probability

It shows how much of the already-happened event B is covered by "A and B".

image
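A tiny sketch with one fair die shows that "divide by P(B)" and "count only inside B's circle" give the same answer. The two events are hypothetical examples.

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}   # one roll of a fair die
A = {4, 5, 6}                   # event A: roll is greater than 3
B = {2, 4, 6}                   # event B: roll is even

def prob(event):
    return Fraction(len(event), len(outcomes))

# Formula: P(A | B) = P(A and B) / P(B)
p_formula = prob(A & B) / prob(B)

# Venn-diagram view: shrink the world to B and count how much of it is also A.
p_counting = Fraction(len(A & B), len(B))

print(p_formula, p_counting)  # both 2/3
```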

Example

image Solve:

solomonxie commented 5 years ago

❖ Bayes' Theorem (Basics)

Bayes' Theorem revolutionized how we use conditional probability.

It is not meant as a one-off calculation, but as a process of continual improvement: each new piece of evidence updates the probability and gains a little more confidence.

Refer to youtube: Bayes' Theorem - The Simplest Case Refer to youtube: The Bayesian Trap

The formula of Bayes' Theorem is just a slight extension of Conditional Probability.

image ▲ The probability of A given B and of B given A share the same numerator, P(A and B). That means we can compute a conditional probability from its reversed conditional.

Understanding the formula

How does it make sense?

image

image

In real life, sometimes A given B is easy to get, and sometimes B given A is easier. So whenever A given B is hard to compute directly, we can use the probability of B given A to compute it.

image

Example: Spam emails

image

Example: Disjoint Union

Refer to youtube: Bayes' Theorem - Example: A disjoint union

Example: False Positives

Refer to youtube: Bayes' Theorem Example: Surprising False Positives
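The false-positive surprise can be reproduced with a few lines. All numbers here are made up for illustration (1% of people are sick, the test catches 90% of sick people, and wrongly flags 9% of healthy people); they are not taken from the video. The second call assumes a repeated test is independent of the first given the true health status.

```python
from fractions import Fraction

def bayes_update(prior, p_pos_given_sick, p_pos_given_healthy):
    """Return P(sick | positive) via Bayes' Theorem."""
    true_pos = prior * p_pos_given_sick            # P(positive and sick)
    false_pos = (1 - prior) * p_pos_given_healthy  # P(positive and healthy)
    return true_pos / (true_pos + false_pos)

# Hypothetical numbers, chosen only for this sketch.
prior = Fraction(1, 100)
sensitivity = Fraction(9, 10)
false_positive_rate = Fraction(9, 100)

after_one_test = bayes_update(prior, sensitivity, false_positive_rate)
# The old posterior becomes the new prior: each test adds a bit more confidence.
after_two_tests = bayes_update(after_one_test, sensitivity, false_positive_rate)

print(after_one_test, float(after_one_test))    # 10/109, about 9%
print(after_two_tests, float(after_two_tests))  # 100/199, about 50%
```

Even with a fairly accurate test, one positive result only lifts the probability from 1% to about 9%, because healthy people vastly outnumber sick ones.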

solomonxie commented 5 years ago

❖ Bayesian Statistics [DRAFT]

Refer to youtube: You know I’m all about that Bayes: Crash Course Statistics #24 Refer to youtube: Bayes in science and everyday life: Crash Course Statistics #25

image

image

img_4637 img_4638 img_4639 img_4641 img_4642 img_4643 img_4644

solomonxie commented 5 years ago

❖ What is 「Random Variable」

Instead of analyzing a measured distribution with explicit data, we're to abstract those analysis methods with uncertain data. It's like abstracting arithmetic to algebra.

Refer to Khan academy: Random variables ▶︎ Jump over to Khan academy for practice: Constructing probability distributions

Random Variables are just like the unknowns in algebra. Except it's slightly different in Statistics.

Remember that: Studying Random Variables is just like studying Algebra over Arithmetic.

More precisely, Random variables are neither random nor variables. (Try to google that)

Random Variables are denoted by capital letters: X, Y, Z

Example

image Solve: image

Example

image Solve: image

Types of 「Random Variable」

Refer to Khan academy: Discrete and continuous random variables

"Discrete" literally means "Distinct" or "Separate" values.

The most useful one for real life is the Discrete Distribution, and we're gonna talk about it mostly.

solomonxie commented 5 years ago

Probability Distribution [DRAFT]

Refer to Wiki: Probability Distribution

It maps each value of a Random Variable to its probability, which forms a distribution. E.g., the values of a Discrete Random Variable form a Discrete Distribution.

Types of Probability Distribution

A valid Probability Distribution

solomonxie commented 5 years ago

Discrete Random Variable [DRAFT]

Mean (Expected Value)

In the case of a discrete random variable, the expected value or mean, denoted as E(X) or μx, is the long-run average outcome. To find the expected value, take each value, multiply it by its respective probability, and add up all the products.

image (where the sum runs over all possible values of x)

Example

image Solve:

Variance (Deviation)

Variance (σ²): image

Standard Deviation (σ): image
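The mean, variance and standard deviation of a discrete random variable can be computed straight from their definitions. The sketch below uses a hypothetical distribution (X = number of heads in two fair coin flips) and also checks that it is a valid distribution.

```python
import math
from fractions import Fraction

# Hypothetical distribution: X = number of heads in two fair coin flips.
distribution = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# A valid distribution: every probability is in [0, 1] and they sum to 1.
is_valid = (all(0 <= p <= 1 for p in distribution.values())
            and sum(distribution.values()) == 1)

# Mean: each value times its probability, added up.
mean = sum(x * p for x, p in distribution.items())

# Variance: probability-weighted squared distance from the mean.
variance = sum((x - mean) ** 2 * p for x, p in distribution.items())

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

print(is_valid, mean, variance, std_dev)
```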

Example

image Solve:

solomonxie commented 5 years ago

❖ Probabilities from 「Density Curves」

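That's why a Discrete Distribution is drawn as a histogram, while a Continuous Distribution is drawn as a density curve: for a continuous variable, probability is the area under the curve over an interval.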

Probabilities over an 「Interval」

image

Example

image Solve:

Probability in 「Normal Distribution」

How to use Calculator for Probability in normal distribution

We could use any graphing calculator or online calculator, and input the Mean, Standard Deviation, Lower bound, and Upper bound.

▶︎ Online Normal Distribution Calculator.

E.g., we know the mean = 70 and SD = 6, and are asked for the probability of a value greater than 61. Inputting these values gives the answer: image
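Instead of a calculator, the same numbers (mean = 70, SD = 6, lower bound 61) can be fed to Python's standard library. This is just one possible way to do it, using `statistics.NormalDist`.

```python
from statistics import NormalDist

dist = NormalDist(mu=70, sigma=6)

# P(X > 61) is everything to the right of 61: 1 minus the area to the left.
p_greater_61 = 1 - dist.cdf(61)

# The same value expressed as a z-score: (61 - 70) / 6 = -1.5
z = (61 - dist.mean) / dist.stdev

print(z, round(p_greater_61, 4))  # -1.5 0.9332
```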

Example

image Solve:

solomonxie commented 5 years ago

❖ Operations of 「Random Variables」

Some basic "algebraic" operations, like adding/multiplying a number, or combining different R.V.s

「Shift」

Adding or subtracting a constant c to Random Variable X has these effects: the mean shifts by the same amount c, while the standard deviation and variance stay unchanged.

「Scale」

Scaling Random Variable X by a constant k has these effects: the mean is multiplied by k, the standard deviation by |k|, and the variance by k².

Example

image Solve:
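The effects of shifting and scaling are easy to verify numerically. The data set below is hypothetical; it is treated as a whole population, so `pstdev` (population standard deviation) is used.

```python
from statistics import mean, pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical values of X

shifted = [x + 10 for x in data]   # X + 10
scaled = [x * 3 for x in data]     # 3 * X

print(mean(data), pstdev(data))        # mean 5, SD 2
print(mean(shifted), pstdev(shifted))  # mean moves to 15, SD stays 2
print(mean(scaled), pstdev(scaled))    # mean becomes 15, SD becomes 6
```

Shifting moves the centre without changing the spread; scaling stretches both.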

「Combine」 Random Variables

Refer to wiki: Algebra of random variables Refer to article on Khan academy: Combining random variables

image

Important facts about combining variances:

Example

image Solve: image

Probability of 「Combined Normal Random Variables」

Remember: If both Random Variables are normally distributed and independent, then their Sum and their Difference are also normally distributed.
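As a sketch with made-up numbers: let A ~ N(10, 3) and B ~ N(6, 4) be independent. Their difference D = A - B is again normal, with the means subtracted and the variances ADDED. Python's `statistics.NormalDist` supports this subtraction directly for independent normals.

```python
import math
from statistics import NormalDist

# Hypothetical independent normal random variables.
A = NormalDist(mu=10, sigma=3)
B = NormalDist(mu=6, sigma=4)

# Difference of independent normals: mean 10 - 6, variance 3**2 + 4**2.
D = A - B
by_hand_sigma = math.sqrt(A.variance + B.variance)

# Probability that A beats B, i.e. P(D > 0).
p_a_greater = 1 - D.cdf(0)

print(D.mean, D.stdev, by_hand_sigma)  # 4.0 5.0 5.0
print(round(p_a_greater, 4))           # 0.7881
```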

Example

image Solve:

solomonxie commented 5 years ago

❖ 「Expected Value」 of a Random Variable

It's also known as the Expectation, Mathematical Expectation, EV, Average, Mean Value, Mean, or First Moment.

Refer to wiki: Expected value

"In probability theory, the expected value of a random variable, intuitively, is the long-run average value of repetitions of the experiment it represents."

image

Calculate 「Expected Value」

Example

image Solve:

Example

image Solve: image

Example

image Solve:

solomonxie commented 5 years ago

❖ 「Relative Frequency」 for Expected Value

▶︎ Practice at Khan academy: Making decisions with expected values

⦿ Mindset: It's better to count 1 by 1 rather than trying to apply formulas.

There're 2 ways to calculate the Expected Value:

Relative Frequency means how often something happens divided by the total number of outcomes.

Depending on the case we're analyzing, we can choose either way to calculate the expected value.

All the Relative Frequencies add up to 1.

image

「Absolute Frequency」

▶︎ Jump back to previous note on: Permutation & Combination

It also means the Relevant Outcomes, which can be counted with "n choose k" combinations.

「Sample Size」

▶︎ Jump back to previous note on: Intro to Probability

It literally means the Total Outcomes.

For a Flip Coin problem (Yes-No problem), the total number of outcomes is 2^trials, which means 2 * 2 * 2 ..... E.g., the total outcomes of "flipping a coin 5 times" is 2⁵ = 32.
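Counting "1 by 1" is easy to do in code. This sketch lists all outcomes of 5 coin flips, counts the relevant ones (exactly 2 heads, a question chosen here for illustration) and divides to get the relative frequency.

```python
from fractions import Fraction
from itertools import product
from math import comb

flips = 5
outcomes = list(product("HT", repeat=flips))   # sample size: 2**5 sequences

# Absolute frequency: sequences with exactly 2 heads, counted one by one.
relevant = sum(1 for seq in outcomes if seq.count("H") == 2)

# Relative frequency = relevant outcomes / total outcomes.
relative_frequency = Fraction(relevant, len(outcomes))

print(len(outcomes), relevant, comb(flips, 2), relative_frequency)  # 32 10 10 5/16
```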

Calculate 「Probabilities for Expected Value」

▶︎ Practice at Khan academy: Expected value with calculated probabilities

Example

image Solve:

Example

image Solve:

Example

image Solve: image

Example

image Solve: image

Getting data from 「Expected Value」

Refer to Khan academy: Getting data from expected value ▶︎ Practice at Khan academy: Expected value with empirical probabilities

Example

image Solve: image

Example

image Solve: image

solomonxie commented 5 years ago

Bernoulli Distribution

It is the discrete probability distribution of a random variable which takes the value 1 with probability p and the value 0 with probability q=1-p, that is, the probability distribution of any single experiment that asks a yes–no question; the question results in a boolean-valued outcome, a single bit of information whose value is success/yes/true/one with probability p and failure/no/false/zero with probability q.

Refer to wiki: Bernoulli Distribution Refer to Khan academy: Bernoulli distribution mean and variance formulas

image

「Mean & Variance」of Bernoulli Distribution

image
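The mean p and variance p(1-p) of a Bernoulli distribution drop out of the general discrete formulas. A quick check with a hypothetical p = 0.3:

```python
from fractions import Fraction

p = Fraction(3, 10)        # hypothetical probability of success
q = 1 - p                  # probability of failure

bernoulli = {1: p, 0: q}   # value 1 with probability p, value 0 with probability q

# General definitions for a discrete random variable ...
mean = sum(x * prob for x, prob in bernoulli.items())
variance = sum((x - mean) ** 2 * prob for x, prob in bernoulli.items())

# ... agree with the Bernoulli shortcuts: mean = p, variance = p * (1 - p).
print(mean, variance)      # 3/10 21/100
```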

「Bernoulli Distribution」 vs. 「Binomial Distribution」

Refer to stackexchange: What is the difference and relationship between the binomial and Bernoulli distributions?

All Bernoulli distributions are binomial distributions, but most binomial distributions are not Bernoulli distributions.

image

solomonxie commented 5 years ago

❖ Binomial Random Variables

Binomial Distribution is one of the Discrete Distributions.

Binomial means "two terms". A Binomial Random Variable is a Random Variable described by TWO parameters: the number of trials n and the probability of success p.

Refer to wiki: Binomial distribution Refer to article: Binomial Random Variables Refer to Khan academy: Binomial variables

Requirements

The requirements for a random experiment to be a binomial experiment are:

It can be simplified as: N, P, YES-NO, INDEPENDENT

Identifying 「Binomial Variables」

Examples of binomials

Examples of non-binomials

「Independence Assumption」 10% Rule

To identify a binomial random variable, we also need to establish that the trials are independent. With a large population, we can't examine every member, so we sample a smaller number of trials.

Sampling WITHOUT replacement is NOT really independent, because every item you take out changes the probabilities for the items that remain. But the good news is that if the population is large enough, taking out a few items barely affects the result.

That's the reason for the 10% Rule: if the sample size is less than 10% of the population, we can treat each trial as independent, because the portion is too small to noticeably affect the rest.

「Simple Random Sample」SRS

It means the sample was selected in such a way that every member, and every possible set of members of the same size, has an equal chance of being in the sample.

Replacement & Non-replacement

Sampling without replacement means that every time you take out a sample, the total number decreases, which affects the probabilities for the remaining items. E.g., there are 10 balls of different colors; if you take out a red ball, the probability of getting another red ball from the remaining 9 balls decreases.

Sampling with replacement means that each time you take out a sample, you put it back, so the probabilities stay the same for every draw.

「10% Rule」

The 10% Rule is a rule for assuming independence between trials.

If the sample size is less than 10% of the population, we can treat each trial as independent, because the portion is too small to noticeably affect the rest.

image
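To see why the 10% Rule is reasonable, compare drawing 3 "yes" items in a row without replacement against the independent value 0.5³ = 0.125. The populations below are hypothetical (half "yes", half "no"), and the function name is made up for this sketch.

```python
from fractions import Fraction

def p_all_yes_without_replacement(n_yes, n_total, draws):
    """P(every draw is a 'yes') when drawn items are NOT put back."""
    p = Fraction(1)
    for i in range(draws):
        p *= Fraction(n_yes - i, n_total - i)  # one fewer 'yes' and one fewer item each time
    return p

independent = Fraction(1, 2) ** 3                    # with replacement: 1/8

small = p_all_yes_without_replacement(5, 10, 3)      # sample is 30% of the population
large = p_all_yes_without_replacement(500, 1000, 3)  # sample is 0.3% of the population

print(float(independent), float(small), float(large))
```

With the small population the answer is far from 0.125; with the large one it is almost identical, so treating the trials as independent does little harm.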

solomonxie commented 5 years ago

Binomial Probability

Refer to article on Khan academy: Binomial probability (basic) Refer to Khan academy: Generalizing k scores in n attempts

▶︎ Online Binomial Probability Calculator

Formula of 「Binomial Probability」

image

image

We could simplify it (in words) as:

P(X=r) = Combinations × P(yes)^r × P(no)^(n-r)

where n is the number of trials and r is the number of "yes" outcomes.

For the combinations, here's the formula: image

Or use the ▶︎ Online Combination Calculator.

Example: image
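In code, the binomial probability formula is a one-liner on top of `math.comb`. The example question (exactly 2 heads in 5 fair coin flips) is chosen here for illustration and is not necessarily the one in the picture above.

```python
from math import comb

def binomial_probability(r, n, p):
    """P(X = r): r successes in n independent trials with success probability p."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

# Exactly 2 heads in 5 fair flips: 10 combinations * (1/2)^2 * (1/2)^3
print(binomial_probability(2, 5, 0.5))  # 0.3125
```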

Example

image Solve:

「Mean & Variance」 of Binomial R.V.

Formula

Example

image Solve: image

Example

image Solve:
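For a binomial random variable the shortcuts are mean = n × p and variance = n × p × (1 - p). The sketch below checks them against the general definitions, using hypothetical values n = 10 and p = 0.3.

```python
from fractions import Fraction
from math import comb

n, p = 10, Fraction(3, 10)  # hypothetical number of trials and success probability

def pmf(r):
    """Binomial probability P(X = r)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

# General definitions: sum over every possible number of successes.
mean = sum(r * pmf(r) for r in range(n + 1))
variance = sum((r - mean) ** 2 * pmf(r) for r in range(n + 1))

# Shortcuts for the binomial case.
print(mean, n * p)                 # 3 3
print(variance, n * p * (1 - p))   # 21/10 21/10
```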

「Cumulative Binomial Probability」

Example

image Solve:

Example

image Solve:
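A cumulative binomial probability is just several binomial probabilities added together, and "at least" questions can use the complement. A minimal sketch with a hypothetical question (at most 2 heads in 5 fair flips):

```python
from fractions import Fraction
from math import comb

def binomial_probability(r, n, p):
    """P(X = r) for a binomial random variable."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

def cumulative_binomial(k, n, p):
    """P(X <= k): add up P(X = 0), P(X = 1), ..., P(X = k)."""
    return sum(binomial_probability(r, n, p) for r in range(k + 1))

half = Fraction(1, 2)
at_most_2 = cumulative_binomial(2, 5, half)   # (1 + 5 + 10) / 32
at_least_3 = 1 - at_most_2                    # complement of "at most 2"

print(at_most_2, at_least_3)  # 1/2 1/2
```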