solomonxie / blog-in-the-issues

A personalised tech-blog, notebook, diary, presentation and introduction.

https://solomonxie.github.io


Statistical Guessing 统计式瞎猜 #50

Open solomonxie opened 6 years ago

solomonxie commented 6 years ago

Statistics is all about PREDICTION: given some real information, predict what will happen next.

Study Resources

Tools

[ ] MIT Mathlets
- [ ] PROBABILITY DISTRIBUTIONS
- [ ] T DISTRIBUTION
- [ ] CONFIDENCE INTERVALS
- [ ] LINEAR REGRESSION
[ ] Online Stat Book (java required)
- [ ] Simulation Demos
- [x] Normal Distribution Simulation
- [x] Sampling Distribution Simulation
[x] Omni Stats Calculators
- [x] Normal Distribution Calculator
- [x] Combination Calculator
- [x] Permutation Calculator
- [ ] Binomial Distribution Calculator
- [ ] Geometric Distribution Calculator
- [ ] Confidence Interval Calculator
[x] SurfStat
- [x] T-distribution Calculator
- [x] Standard Normal Calculator
[ ] DI Management
- [x] Chi-square calculator
[ ] Maths is Fun
- [x] Chi-Square Calculator

Khan academy AP Statistics

[x] Categorical data
[x] Quantitative data (Display & Describe)
[x] Quantitative data (Summarize)
[x] Modeling Data Distributions
[x] Bivariate Numerical data
[x] Study design
[x] Probability
[x] Counting, permutations, and combinations
[x] Random Variables
[x] Sampling Distributions
[x] Confidence intervals
[x] Significance Tests (Hypothesis Testing)
[x] Inference different groups
[x] Chi-Square Tests for Categorical data
[x] Advanced Regression (inference and transforming)
[x] ◆ Course Challenge ◆

Machine Learning related topics

[ ] Bayesian Statistics
[ ] Random Variables
[ ] Logistic Regression (with Logistic model)
[ ] Linear Regression (with Gradient Descent)
[ ] Hypothesis Testings

solomonxie commented 6 years ago

Benford's Law

It's also called Newcomb-Benford's Law, Law of Anomalous Numbers, and First-Digit Law.

Refer to wiki: Benford's law

It is an observation about the frequency distribution of leading digits in many real-life sets of numerical data.

The first digits of data entries in most real-world data sets are not uniformly distributed. The most common first digit is 1, followed by 2, and so on, with 9 being the least common first digit. This phenomenon is known as Benford's Law.

The leading digits in such a set thus have the following distribution:
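The distribution itself is missing from these notes. As a sketch, the usual statement of Benford's law gives the probability of leading digit d as log10(1 + 1/d); the function name below is just an illustrative choice:

```python
import math


def benford_probability(d: int) -> float:
    """Expected share of numbers whose leading digit is d (d = 1..9)."""
    return math.log10(1 + 1 / d)


# Build the whole distribution for digits 1 to 9.
benford = {d: benford_probability(d) for d in range(1, 10)}

for digit, p in benford.items():
    print(f"{digit}: {p:.1%}")
```

Digit 1 comes out at roughly 30%, and digit 9 at under 5%.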

solomonxie commented 6 years ago

Two-way Tables (Joint Distributions)

Refer to Khan academy: Two-way tables
Refer to Khan academy: Distributions in two-way tables
Refer to Khan academy: Marginal distribution and conditional distribution

Refer to Mathbitsnotebook: Two-Way Frequency Tables

Definitions

`Two-way Table`

A Two-way Table shows a Joint distribution: the rows represent the categories of one variable, and the columns represent the categories of another variable.

`Marginal Distribution`

A Marginal Distribution is simply the TOTAL row or column at the margins of the joint distribution, i.e. the distribution of one variable on its own.

`Conditional Distribution`

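A Conditional Distribution is the distribution of one variable given a particular value of the other variable, e.g. a single column (or row) expressed as percentages of its own total.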

`Trends in categorical data`

Refer to Khan academy: Analyzing trends in categorical data
Refer to Khan academy: Filling out frequency table for independent events

▶ Practice on Khan academy: Trends in categorical data

Interpret the table:

Row %: the cell as a proportion of its Row Total. e.g., the cell Pond-Maple is 59.46% of all samples taken by the pond.
Column %: the cell as a proportion of its Column Total. e.g., the cell Pond-Maple is 48.89% of all maple samples.
Total %: the cell as a proportion of the Sample Total. e.g., the cell Pond-Maple is 27.5% of all samples.
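A small sketch of the three percentages. The original table isn't shown here, so the counts below are HYPOTHETICAL, picked only because they reproduce the percentages quoted above:

```python
# Hypothetical counts (assumed, not taken from the original table).
cell = 22           # Pond-Maple samples
row_total = 37      # all samples taken by the pond
column_total = 45   # all maple samples
grand_total = 80    # all samples

row_pct = cell / row_total * 100        # share of the row
column_pct = cell / column_total * 100  # share of the column
total_pct = cell / grand_total * 100    # share of everything

print(f"Row %: {row_pct:.2f}, Column %: {column_pct:.2f}, Total %: {total_pct:.2f}")
```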

Example

Solve:

Get the total number of people:
Get the number of people from California: 500 * 0.5 = 250
Analyze association. The logic is: events A & B are associated if A makes up a much larger (or smaller) share of B than it does of the whole sample, EVEN IF B itself is only a SMALL part of the total sample.

Example

Solve:

The table can't be filled in exactly, but we can estimate the expected counts.
We're told these are entirely independent events, so we know that P(makes 1st shot) = P(makes 2nd shot) and P(misses 1st shot) = P(misses 2nd shot), regardless of whether he makes or misses the 1st shot.
We can get this "fixed" probability from the marginal totals:
Then we apply the 80% probability to every "makes shot" cell to get:

Example

Solve:

solomonxie commented 6 years ago

Frequency Table & Dot plot

Refer to Khan academy: Frequency tables & dot plots
Refer to Khan academy review: Dot plots and frequency tables review

solomonxie commented 6 years ago

❖ Central Tendencies: Mean, Median, Mode

Which could represent the centres of a distribution.

Refer to youtube: Mean, Median, and Mode: Measures of Central Tendency: Crash Course Statistics #3
Refer to wikipedia: Central tendency
Refer to Khan lecture.

Mean is the arithmetic average of all the numbers listed.
Median is the middle number of the list once it is sorted (duplicates are kept). If there are two middle numbers, average them to get the median.
Mode is the number that shows up most often in the list.
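A quick sketch of the three measures using Python's standard `statistics` module. The data is made up for illustration; the last part shows that pushing one value far out moves the mean but not the median:

```python
import statistics

data = [1, 2, 2, 3, 4, 7, 9]  # made-up numbers

mean = statistics.mean(data)      # 28 / 7 = 4
median = statistics.median(data)  # middle of the sorted list = 3
mode = statistics.mode(data)      # most frequent value = 2

# With an even count there are two middles, so they get averaged.
even_median = statistics.median([1, 2, 3, 4])  # (2 + 3) / 2 = 2.5

# Turn the largest value into an outlier: 9 becomes 90.
with_outlier = [1, 2, 2, 3, 4, 7, 90]
mean_with_outlier = statistics.mean(with_outlier)
median_with_outlier = statistics.median(with_outlier)

print(mean, median, mode, even_median)
print(mean_with_outlier, median_with_outlier)
```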

Impact on median & mean

Some common effects:

Increasing an outlier:
Removing an outlier:

`Average`

Average in statistics means something a bit broader than just the arithmetic average.

Khan lecture.

Average: In stats, it means a typical or middle value, and it can be represented in several ways:

Arithmetic mean: Sum the numbers and divide by how many there are.
Median: Sort the numbers and take the MIDDLE one.
Mode: The number that repeats most often in a dataset.

solomonxie commented 6 years ago

❖ Quartiles and Box plots (Distribution graph)

A box plot is also called a Box and whisker plot; it is a picture of the Five-number summary.

▶︎ Jump over to Khan academy for practice: Comparing data distributions

Refer to Khan academy: Reading box plots
Refer to Khan academy: Interpreting box plots
Refer to Maths is Fun: Quartiles

Quartiles are the values that divide a list of numbers into quarters:

Put the list of numbers in order
Then cut the list into 4 equal parts
The Quartiles are at the "cuts"

「Interquartile range」IQR (Box plot)

Refer to Khan academy.

The Interquartile Range is the distance from Q1 to Q3, i.e. IQR = Q3 - Q1.

Example

「Five-number Summary」

Refer to Khan academy: Five-number summary
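A sketch of the five-number summary (minimum, Q1, median, Q3, maximum). Textbooks disagree on how to compute quartiles; this assumes the "median of each half" method, leaving the middle value out of both halves when the count is odd. The data is made up:

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For an odd count, skip the middle value itself.
    upper = data[half + 1:] if n % 2 else data[half:]
    return (
        data[0],
        statistics.median(lower),
        statistics.median(data),
        statistics.median(upper),
        data[-1],
    )


data = [1, 3, 4, 5, 6, 7, 9, 11]  # made-up numbers
minimum, q1, median, q3, maximum = five_number_summary(data)
iqr = q3 - q1
print(minimum, q1, median, q3, maximum, iqr)
```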

Example

「Box and Whisker Plot」

Box and Whisker Plot can show all the important values.

Important values:

Median: at Q2
Interquartile Range: Q3-Q1
Highest
Lowest
Shape: Skewed Left or Right if Q2 IS NOT in the middle of the Interquartile range.
Mean: The Box Plot DOES NOT show the mean. Only for a symmetric distribution (such as a Normal Distribution) can we infer it, because there mean = median.

Find out the 「Mean」 in Box plot

We can't read the mean value off a Box Plot. But from the position of Q2 (the Median), we can infer the relationship between the Mean & Median:

If Q2 is in the middle of the Interquartile range, the distribution is probably symmetric (e.g. a Normal Distribution), so Median ≈ Mean
If Q2 is towards the LEFT of the Interquartile range, it's probably a Right Skewed Distribution, so Median < Mean
If Q2 is towards the RIGHT of the Interquartile range, it's probably a Left Skewed Distribution, so Median > Mean

Example

In the graph below, according to the Q2 position, we know that the distribution is skewed right.

Practice

solomonxie commented 6 years ago

「Variance」 Deviation

In mathematics and statistics, deviation is a measure of difference between the observed value of a variable and some other value, often that variable's mean.

A larger deviation means the distribution is spread wider.
A smaller deviation means the values stick closer together.

「Standard Deviation」

It's the square root of the Variance.

Refer to Khan academy review: Calculating standard deviation step by step

σ = √( ∑(x − μ)² / N )

(▲ where ∑ means "sum of", x is each value in the data set, μ (mu) is the mean of the data set, and N is the number of data points in the population.)

Steps:

Find the mean.
Find the distance from each value to the mean, and square it.
Find the average of all the squared distances.
Take the square root of that average.
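The steps above can be sketched like this. It's the population formula (divide by N); the data is made up, and `statistics.pstdev` from the standard library is printed alongside as a cross-check:

```python
import math
import statistics


def population_sd(values):
    """Population standard deviation, following the steps one by one."""
    n = len(values)
    mean = sum(values) / n                               # 1. the mean
    squared_distances = [(x - mean) ** 2 for x in values]  # 2. squared distances
    variance = sum(squared_distances) / n                # 3. average them
    return math.sqrt(variance)                           # 4. square root


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up numbers, mean = 5
sd = population_sd(data)
print(sd, statistics.pstdev(data))
```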

Example

Solve:

By eyeballing, for Freshmen most of the data points are not clustered around the mean.
But for Seniors, most of the data points are clustered around the centre.
So Freshmen have the greater Standard Deviation.
The Seniors' data resembles a Normal Distribution, for which the Standard Deviation is a good measure of spread.

solomonxie commented 6 years ago

Sample Variance

The Sample Variance, s², measures how varied a sample is, and it's used to estimate the Population Variance.

Since the Sample Variance is an estimate, its formula is a bit different: we divide by n-1 instead of N.

Why do we need to divide by `n-1`?

Refer to Quora: Why is the formula of sample variance different from population variance?

"The sample variance is an estimator for the population variance. When applied to sample data, the population variance formula is a biased estimator of the population variance: it tends to UNDERESTIMATE the amount of variability. "

To fix this underestimation, statisticians divide by n-1 instead of n, which relates to the idea of degrees of freedom (DF).
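A minimal sketch comparing the two divisors on one made-up sample. `statistics.variance` in the standard library already divides by n-1, so it serves as a cross-check:

```python
import statistics

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample
n = len(sample)
mean = sum(sample) / n
sum_of_squares = sum((x - mean) ** 2 for x in sample)

divide_by_n = sum_of_squares / n              # population formula: 4.0
divide_by_n_minus_1 = sum_of_squares / (n - 1)  # sample formula: about 4.57

print(divide_by_n, divide_by_n_minus_1, statistics.variance(sample))
```

Dividing by the smaller number always gives the slightly bigger result, which is what offsets the underestimation.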

Easy way to calculate Sample Variance

This formula is better for handwriting calculation:

`Sample Standard Deviation`

Example

Solve:

The age of any gorilla in our sample is likely to be closer to the average of the 4 gorillas we looked at instead of the average of all the gorillas in the zoo. Because of that, the squared deviations from the mean we calculated will probably underestimate the actual deviations from the population mean. To compensate for this underestimation, rather than simply averaging the squared deviations from the mean, we total them and divide by n-1.

solomonxie commented 6 years ago

「Mean Absolute Deviation」 (MAD)

The Mean absolute deviation is the average of the absolute values of all deviations.

A deviation is the distance from a value to the mean. MAD describes how the values are laid out on the axis: whether they are close to each other or far apart.
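A small sketch of MAD on two made-up lists that share the same mean (10) but are spread very differently:

```python
def mean_absolute_deviation(values):
    """Average distance from each value to the mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


close_together = [9, 10, 10, 11]  # made-up, mean = 10
far_apart = [2, 6, 14, 18]        # made-up, mean = 10

mad_close = mean_absolute_deviation(close_together)  # (1+0+0+1)/4 = 0.5
mad_far = mean_absolute_deviation(far_apart)         # (8+4+4+8)/4 = 6.0
print(mad_close, mad_far)
```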

Khan lectures.

solomonxie commented 6 years ago

❖ Intro to Probability [DRAFT]

It's easy, but always confusing if you haven't fully understood it in the first place.

The very first thing to do when solving a probability problem is to CATEGORISE the problem and apply the matching formula.

Single Event
Single Event Repeats
Independent Events in Sequence

「Single Event」

The probability of an event can only be 0 to 1 (or 0% to 100%).
The probability of event A is often written as P(A).
The probability of an event with condition is often written as P(condition).

Common cases:

Flip a coin: P(head) = 1/2.
Roll a die: P(>3) = 3/6

「Theoretical Probabilities」 & 「Experimental Probabilities」

The formula Favourable outcomes / Total outcomes only gives you the Theoretical probability. When you run an experiment, like flipping a coin many times, the observed (Experimental) probability usually differs somewhat from the theoretical one, especially with few trials; with something like 10,000 flips it tends to land very close to it.
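A rough simulation sketch of theoretical vs. experimental probability for a fair coin. The function name and the fixed seed are illustrative choices (the seed just keeps the run repeatable):

```python
import random


def experimental_probability_of_heads(flips, seed=0):
    """Flip a simulated fair coin `flips` times; return the share of heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(flips))
    return heads / flips


theoretical = 1 / 2
for flips in (10, 100, 10_000):
    observed = experimental_probability_of_heads(flips)
    print(f"{flips:>6} flips: experimental = {observed:.3f}, theoretical = {theoretical}")
```

With only 10 flips the result can be well off 0.5; with 10,000 it should sit much closer.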

「Single Event Repeats」

Example: Roll a die 100 times, how many times will you get a number greater than 3? Answer: P(>3) = 3/6, so the expected count is 3/6 * 100 = 50 times.

「Multiple Events」

「Independent events」 in sequence

「Independency」

To understand probability, we really need to differentiate independent events and dependent events.

Khan lecture: Compound probability of independent events.

Coin flips are INDEPENDENT events: What happens in the first flip in no way affects what happens in the second flip.

And this is actually one thing that many people don't realise.

「Gambler's Fallacy」

Some people think that if they get a bunch of heads in a row then, all of a sudden, tails becomes more likely on the next flip.

THAT IS NOT THE CASE.

Every flip is an independent event. What happened in the past in these flips does not affect the probabilities going forward.

「Sample Space」

A brute-force method: just draw a table or a tree that shows every possible outcome, then pick out all the favourable results.

Refer to Wiki: Sample Space
Refer to article: Sample Space Examples and The Counting Principle

The sample space of an experiment is all the possible outcomes for that experiment.

(Rolling Two dice)

(52 card deck)

「Sample Size」

Here it means the Size of the Sample Space, i.e. the number of possible outcomes.

To get it, simply MULTIPLY.

The Fundamental Counting Principle: If there are a ways for one event to happen, and b ways for a second event to happen, then there are a * b ways for both events to happen.

Sample problem: If shoes come in 6 styles with 3 possible colors, how many varieties of shoes are there? All you need to do is multiply: 6 • 3 = 18 possible varieties of shoes.
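A sketch that lists the sample space instead of just multiplying, to confirm the counting principle. The style and colour names are placeholders; the two-dice case is included because it's the classic table:

```python
from itertools import product

styles = [f"style{i}" for i in range(1, 7)]  # 6 placeholder styles
colors = ["red", "green", "blue"]            # 3 placeholder colours

shoes = list(product(styles, colors))        # every (style, colour) pair
print(len(shoes))                            # 6 * 3

# Sample space for rolling two dice: every (first, second) pair.
two_dice = list(product(range(1, 7), repeat=2))
sum_is_seven = [roll for roll in two_dice if sum(roll) == 7]
print(len(two_dice), len(sum_is_seven))
```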

Example:

First, notice that it's ONE event.

Example:

solomonxie commented 6 years ago

❖ 「Permutations」 & 「Combinations」

Aside from probability, Permutations and Combinations are essential tools for statistics.

They answer the question: how many groups can we form if we choose some items from a larger set?

▶︎ Back to previous note on: Intro to probability.

▶︎ Omni Permutation Calculator ▶︎ Omni Combination Calculator

Refer to article: Easy Permutations and Combinations
Refer to article: Permutations And Combinations Simplified
Refer to article: Combinations vs Permutations

HOW MANY groups do we get if we choose a number of things from the total? e.g., how many groups would there be if we choose 3 people from 9 people?

Permutations and combinations both count the total number of groups. There are TWO ways to count:

Permutation: Order matters.
Combination: Order DOES NOT matter.

Combinations can be seen as FILTERED permutations, with all the "duplicates" (over-counted items) filtered out.

e.g., we get six different Permutations "123, 132, 231, 213, 312, 321"; once we filter out the over-counted items, there is just one Combination: 123.

「Permutations」

It's all the possible ways to arrange/order elements in a list.

Notation

(Read as N pick K)

Understanding 「Permutations」

Notice: possibilities ≠ probabilities

e.g., how many ways are there to arrange the three numbers 1, 2, 3? They are: 123, 132, 231, 213, 312, 321, so the answer is 6 possible ways. Counting algebraically, it's 3*2*1 = 6 possible ways.

How do we do this?

There are 3 possible choices for the 1st position, leaving 2 numbers. Then the 2nd position has 2 possible choices, leaving 1. So the 3rd position has 1 possible choice.

By the counting principle, we MULTIPLY them together to get ALL POSSIBLE WAYS: 3*2*1.

Formula

Full Permutations (5 people sitting in 5 chairs): n! = 5! = 120
Pick Permutations (3 people sitting in 5 chairs): n! / (n-k)! = 5! / 2! = 60
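A quick check of both cases with Python's standard library (`math.perm` needs Python 3.8+). The "123" listing matches the arrangement example from earlier in this note:

```python
import math
from itertools import permutations

full = math.factorial(5)  # 5 people in 5 chairs: 5!
pick = math.perm(5, 3)    # 3 people in 5 chairs: 5! / (5-3)!

# Listing every arrangement of "123" to see the 3*2*1 count directly.
arrangements = ["".join(p) for p in permutations("123")]

print(full, pick)
print(arrangements)
```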

「Combinations」

A Combination is a collection of elements in which the order DOESN'T matter.

Starting from permutations, we filter out the repeated groups by dividing by k! to get the combinations.

Notation

(Read as N choose R)

Formula
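The formula isn't shown here; the standard one is n! / (k! × (n-k)!), i.e. the permutation count divided by k!. A sketch using the "choose 3 people from 9" question from above (`math.comb` and `math.perm` need Python 3.8+):

```python
import math
from itertools import combinations

n, k = 9, 3

ordered_groups = math.perm(n, k)                      # order matters: 9*8*7
unordered_groups = ordered_groups // math.factorial(k)  # divide out the k! repeats

# Cross-checks: the built-in formula, and listing the groups one by one.
by_formula = math.comb(n, k)
by_listing = len(list(combinations(range(n), k)))

print(ordered_groups, unordered_groups, by_formula, by_listing)
```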

solomonxie commented 6 years ago

❖ 「Set」 Basics

Refer to wiki: Set
Refer to Khan academy: Basic set operations

「Membership」

If B is a set and x is one of the objects of B, this is denoted x ∈ B, and is read as "x belongs to B", or "x is an element of B". If y is not a member of B then this is written as y ∉ B, and is read as "y does not belong to B".

「Subsets」

If every member of set A is also a member of set B, then A is said to be a subset of B, written A ⊆ B (also pronounced A is contained in B). Equivalently, we can write B ⊇ A, read as B is a superset of A, B includes A, or B contains A.

「Empty Set」 ∅

The empty set is a subset of every set and every set is a subset of itself:

∅ ⊆ A.
A ⊆ A.

「Universal Set」 U

Every set is a subset of the universal set: A ⊆ U.

`Basic Set Operations`

「Intersection」 ⋂, &, and

Examples:

{1, 2} ∩ {1, 2} = {1, 2}.
{1, 2} ∩ {2, 3} = {2}.

Basic properties of intersections:

A ∩ B = B ∩ A.
A ∩ (B ∩ C) = (A ∩ B) ∩ C.
A ∩ B ⊆ A.
A ∩ A = A.
A ∩ U = A.
A ∩ ∅ = ∅.
A ⊆ B if and only if A ∩ B = A.

「Union」 ⋃, |, or

Examples:

{1, 2} ∪ {1, 2} = {1, 2}.
{1, 2} ∪ {2, 3} = {1, 2, 3}.
{1, 2, 3} ∪ {3, 4, 5} = {1, 2, 3, 4, 5}

Basic properties of unions:

A ∪ B = B ∪ A.
A ∪ (B ∪ C) = (A ∪ B) ∪ C.
A ⊆ (A ∪ B).
A ∪ A = A.
A ∪ U = U.
A ∪ ∅ = A.
A ⊆ B if and only if A ∪ B = B.

「Complements」 \, -, subtract

Two sets can also be "subtracted". The relative complement of B in A (also called the set-theoretic difference of A and B), denoted by A \ B (or A − B), is the set of all elements that are members of A but not members of B.

Examples:

{1, 2} \ {1, 2} = ∅.
{1, 2, 3, 4} \ {1, 3} = {2, 4}.
If U is the set of integers, E is the set of even integers, and O is the set of odd integers, then U \ E = E′ = O.

Basic properties of complements:

A \ B ≠ B \ A for A ≠ B.
A ∪ A′ = U.
A ∩ A′ = ∅.
(A′)′ = A.
∅ \ A = ∅.
A \ ∅ = A.
A \ A = ∅.
A \ U = ∅.
A \ A′ = A and A′ \ A = A′.
U′ = ∅ and ∅′ = U.
A \ B = A ∩ B′.
if A ⊆ B then A \ B = ∅.
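These operations map directly onto Python's built-in `set` type. A short sketch reusing the examples above; the universal set here is ASSUMED to be the integers 1 to 10, just to keep it finite:

```python
A = {1, 2}
B = {2, 3}

intersection = A & B            # {2}
union = A | B                   # {1, 2, 3}
difference = {1, 2, 3, 4} - {1, 3}  # {2, 4}

# Assumed finite universal set, so complements can be computed.
U = set(range(1, 11))
E = {n for n in U if n % 2 == 0}  # even numbers
O = U - E                          # complement of E inside U: odd numbers

print(intersection, union, difference)
print(A <= U, E & O, (A & B) <= A)  # subset test, empty set, A ∩ B ⊆ A
```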

Example

Solve:

solomonxie commented 6 years ago

Histogram

Refer to Khan academy: Creating a histogram

Instead of plotting individual dots, a Histogram puts the data into BUCKETs (value ranges).

「Relative Frequency Histogram」

Instead of showing each bucket's absolute count, sometimes it's more useful to show each bucket's percentage of the total, which is what a Relative Frequency histogram does.

solomonxie commented 6 years ago

Stem & Leaf Plot

Refer to Khan academy review: Stem and leaf plots review

Both the Stem and the Leaf columns represent digits (places) of the numbers.

In the case below, the stem shows the tens place digit, and the leaf shows the ones place digit.

solomonxie commented 6 years ago

❖ Describing Distributions

Refer to Khan academy: Example: Describing a distribution

「Shapes」: Normal, Left Skewed, Right Skewed

Refer to Khan academy: Classifying shapes of distributions

Normal Distribution (Symmetric distribution)
Left Skewed Distribution
Right Skewed Distribution
Uniform
Bimodal Distribution

Example

「Spread」: Range, IQR, Standard Deviation, MAD

Refer to Crash course: Measures of Spread: Crash Course Statistics #4

Range: (Highest value - Lowest value)
IQR: (Q3-Q1)
Standard Deviation: σ (sigma)
Mean absolute deviation (MAD)

「Centres」: Mean, Median, Mode

Refer to Crash course: Mean, Median, and Mode: Measures of Central Tendency: Crash Course Statistics #3

Mean is the arithmetic average of all the numbers listed.
Median is the middle number of the list once it is sorted (duplicates are kept). If there are two middle numbers, average them to get the median.
Mode is the number that shows up most often in the list.

「Outliers」

Refer to Khan academy: Judging outliers in a dataset

In statistics, an outlier is an observation point that is distant from other observations. In other words, outliers in a graph are the few values (the MINORITY) that sit far from the rest.

Statistical definition 「1.5·IQR Rule」

Outliers are the values that fall outside the Fences, where Lower fence = Q1 − 1.5 × IQR and Upper fence = Q3 + 1.5 × IQR.
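A sketch of the 1.5·IQR rule, with the fences taken as Q1 − 1.5 × IQR and Q3 + 1.5 × IQR. The data is made up, and its quartiles were worked out by hand with the median-of-halves method (lower half 1, 3, 4, 5 and upper half 6, 7, 9, 30):

```python
def outlier_fences(q1, q3):
    """Return (lower fence, upper fence) for the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


data = [1, 3, 4, 5, 6, 7, 9, 30]  # made-up numbers
q1, q3 = 3.5, 8.0                 # quartiles computed by hand for this list

lower, upper = outlier_fences(q1, q3)
outliers = [x for x in data if x < lower or x > upper]
print(lower, upper, outliers)
```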

How to choose proper methods

We have different ways to describe the centre and the spread, so we need a strategy to decide which one to use in different cases.

For a Normal (symmetric) Distribution: we use Mean as centre, Standard Deviation as spread
For a Skewed Distribution: we use Median as centre, IQR as spread

solomonxie commented 6 years ago

❖ 「Clusters」, 「Outliers」, 「Gaps」, 「Peaks」

Khan lecture: Shape for distributions.
Khan lecture 2: Clusters, gaps, peaks & outliers.

Cluster: A group of values that stick together, away from other groups.
Outliers: A few values (the Minority) far away from the crowd (the Majority).
Peaks: The value(s) where the distribution is highest, i.e. the most frequent ones.
Gaps: The "large" open spaces between data points.

solomonxie commented 6 years ago

Sample Variance

It's also called the Unbiased estimate of population variance.

Refer to Khan academy: Sample variance

For a large population, it's often impossible to get all the data. So we take a number of samples and calculate their variance.

The formula for Sample Variance is a slight twist on the population variance: subtract 1 from the divisor (n-1 instead of n), so that the variance comes out slightly bigger.

It seems like voodoo, but it's reasonable. If we use the population variance formula on sample data, it tends to underestimate the true variance. That's why the sample variance formula is adjusted.

Why we divide by n-1 for the Unbiased Sample Variance

Refer to Khan academy: Review and intuition why we divide by n-1 for the unbiased sample variance
Refer to Khan academy: Why we divide by n-1 in variance
Refer to Khan academy: Simulation showing bias in sample variance
Refer to Khan academy simulation: Unbiased Estimate of Population Variance

Simulation for different variance formulas with true variance:
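The original simulation isn't reproduced here. As a rough stand-in, this sketch builds a made-up population, draws many small samples, and averages both variance formulas. The population parameters, sample size, trial count and seed are all arbitrary choices:

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Made-up population: 10,000 values around 50 with SD around 10.
population = [rng.gauss(50, 10) for _ in range(10_000)]
true_variance = statistics.pvariance(population)


def average_estimates(sample_size=5, trials=5_000):
    """Average the divide-by-n and divide-by-(n-1) variances over many samples."""
    divide_by_n, divide_by_n_minus_1 = [], []
    for _ in range(trials):
        sample = rng.sample(population, sample_size)
        divide_by_n.append(statistics.pvariance(sample))
        divide_by_n_minus_1.append(statistics.variance(sample))
    return statistics.mean(divide_by_n), statistics.mean(divide_by_n_minus_1)


avg_biased, avg_unbiased = average_estimates()
print(f"true variance:        {true_variance:.1f}")
print(f"average with n:       {avg_biased:.1f}")
print(f"average with n - 1:   {avg_unbiased:.1f}")
```

On average the divide-by-n version lands clearly below the true variance, while the n-1 version lands close to it.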

solomonxie commented 6 years ago

❖ Percentiles

Before you start, you should know: explanations of percentiles are confusing, and they differ from teacher to teacher and from website to website, because there is NO standard definition of percentile.

Percentiles tell you what PERCENTAGE of the population has a value that's LOWER than yours.

▶︎ Jump over to have practice: Calculating percentiles

Refer to Khan academy: Calculating percentile
Refer to youtube: Percentiles - Introductory Statistics
Refer to youtube: Percentile
Refer to textbook [PDF]: PERCENTILES AND PERCENTILE RANKS
Refer to wikipedia: Percentile
Refer to wikipedia: Percentile rank
Refer to mathisfun: Percentiles
Refer to pbarrett: percentiles (PDF)
Refer to varsity tutors: percentiles

A percentile is the value BELOW which a given percentage of observations fall. e.g., the 20th percentile is the value below which 20% of the observations may be found.

Percentiles run from the 1st to the 100th, where the 100th percentile means the largest value in the set. According to wiki, there COULD be decimal percentiles such as the 0.13th percentile or the 2.28th percentile.

For example, if your doctor tells you your height is AT the 83rd percentile of the population, it means 83% of people are shorter than or as tall as you:

Other names of Percentiles

Quartiles:

25ᵗʰ percentile = Q1
50ᵗʰ percentile = Q2 = Median
75ᵗʰ percentile = Q3

Deciles: Deciles are percentiles divided into 10 equal sections, which correspond to the 10th, 20th, 30th,...90th percentiles.

「Percentile Rank」

Percentile rank usually comes up when you're asked which percentile a given value is at. e.g., percentile ranks are commonly used to clarify the interpretation of scores on standardized tests.

e.g., you're asked what is the percentile rank of the number 79 in a list, and the answer might be "Its rank is 90, because it's at the 90th percentile."

「Percentile」 vs. 「Percentile Rank」

Percentiles and Percentile Ranks are highly similar (and easy to confuse) statistics.

Percentiles are used to determine where to draw the line between observed values within the distribution. (e.g., a teacher wants to divide his class in half according to students' scores, so he needs to find out which score is AT the 50th percentile.)
Percentile rank is the reverse process: it is used to determine where a particular score or value fits within a broader distribution. (e.g., a student receives a score of 75 out of 100 on an exam and wishes to determine which percentile he is at compared to the rest of the class.)

Example

「Calculate Percentiles」

Calculating a percentile is really about manipulating the indexes of the sorted number list. It's like computing a pointer: find the right index and it leads you to the number, regardless of what the number is.

There're a few methods for calculating percentiles:

Interquartile method: For 25th = Q1, 50th = Q2 (or Median), 75th = Q3.
The nearest-rank method: The most often used method.
The linear interpolation between closest ranks method
The weighted percentile method

Formula

Index = ⌈ P / 100 × Amount ⌉ (round up to the next whole number)

(Index is the position of the value at the given percentile, P is the percentile, Amount is the number of values in the list.) To cut down confusion, we say index instead of the textbooks' Rank, which refers to the "ordinal rank", not the "percentile rank".

Example

Take a list of 12 numbers, {a,b,c,d,e,f,g,h,i,j,k,l}. The 80th percentile relates to 80% of the AMOUNT of the list: 80% × 12 = 9.6, so 9.6 is the index. But an index must be a whole number, and by the definition of percentile the number must be at or above 80% of all values, so we round up to 10. The 10th number in the list (j) is AT the 80th percentile, regardless of what number it is.

Example

Consider the ordered list {15, 20, 35, 40, 50}, which contains 5 data values. What are the 5th, 30th, 40th and 100th percentiles of this list using the nearest-rank method? Refer to wiki: Worked examples of the nearest-rank method

Solve:

The 5th percentile:
- Find the index of number: 5% * 5 = 0.25, which means 5% of five numbers are below a number which index is 0.25.
- But we don't have number at 0.25th index, so the nearest index ABOVE 0.25 is 1
- So the first number 15 is our answer in the case.
The 30th percentile:
- Find the index of number: 30% * 5 = 1.5, which means 30% of five numbers are below a number whose index is 1.5.
- But we don't have a number at 1.5th index, so the nearest index ABOVE 1.5 is 2
- So the second number 20 is our answer in the case.
The 40th percentile:
- Find the index of number: 40% * 5 = 2, which means 40% of five numbers are below a number whose index is 2.
- The index is already a whole number, so the second number 20 is our answer in the case.
The 100th percentile: is the LAST number in the list, which is 50.
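The nearest-rank steps above can be sketched as a tiny function: multiply the percentile by the list size, round UP, and read that position. The function name is an illustrative choice, and it assumes the list is already sorted and 0 < p ≤ 100:

```python
import math


def nearest_rank_percentile(sorted_values, p):
    """Value at the p-th percentile by the nearest-rank method (0 < p <= 100)."""
    # 1-based position, rounded up to the next whole number.
    index = math.ceil(p * len(sorted_values) / 100)
    return sorted_values[index - 1]


data = [15, 20, 35, 40, 50]
for p in (5, 30, 40, 100):
    print(f"{p}th percentile: {nearest_rank_percentile(data, p)}")

# The 12-letter list from earlier: 80% of 12 = 9.6, rounded up to the 10th item.
letters = list("abcdefghijkl")
print(nearest_rank_percentile(letters, 80))
```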

Example

Calculate 「Percentile ranks」

We use the same formula from calculating percentiles:

Instead of inputting the percentile to get the index, we input the index and get the percentile rank.

Example

If the scores of a set of students in a math test are {20 , 30 , 15, 75}. What is the percentile rank of the score 30 ?

Solve:

Reorder the dataset: {15, 20, 30, 75}
We know 30 is the third number, so its index is 3.
Let's input the index and the number of scores into the formula: 3 / 4 × 100 = 75

So the Percentile rank for number 30 is 75, which means it's at 75th percentile.
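The same calculation as a sketch. It follows this note's convention (index ÷ amount × 100); other textbooks define percentile rank slightly differently, so treat it as one possible definition. The function name is illustrative:

```python
def percentile_rank(values, score):
    """Percentile rank as (1-based position in sorted list) / amount * 100."""
    ordered = sorted(values)
    # index() raises ValueError if the score is not in the list.
    index = ordered.index(score) + 1
    return index / len(ordered) * 100


scores = [20, 30, 15, 75]
print(percentile_rank(scores, 30))  # 3 / 4 * 100
```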

「Cumulative Relative Frequency Graph」

Refer to Khan academy: Analyzing a cumulative relative frequency graph

solomonxie commented 6 years ago

❖ 「Z-score」 (Standard Score / Normal Score)

Z refers to the Standard Normal Distribution. It's fairly important in real life: Japan uses Z-scores on exams to estimate each student's study skills.

Z-score is the essential concept of Z-Statistics.

▶︎ Jump over to have practice: Comparing with z-scores

Refer to Wiki: Standard score
Refer to Khan academy: Z-score introduction
Refer to youtube: Why Do We Need z Scores
Refer to youtube: Statistics 101: Understanding Z-scores
Refer to Crash Course: Z-Scores and Percentiles: Crash Course Statistics #18
Refer to youtube: z-score Calculations & Percentiles in a Normal Distribution

Z-score is all about comparison: comparing values from different kinds of data sets. In other words, a Z-score indicates how many standard deviations a given point is away from the mean (above or below).

Why do we need Z-scores

"Z-scores in general allow us to compare things that are NOT in the same scale, as long as they are NORMALLY distributed." - CrashCourse

For example, even if we know everyone's score, just looking at the scores makes it hard to tell how good or bad one person is compared to anyone else in the dataset. e.g., if most of the students score above 90, can we say someone who scores 90 is good?

So Z-score gives a solution for this: compare the score to the "average".

Z-score is especially good for comparing different types of data, e.g., a 100-point exam & a 150-point exam, IELTS & TOEFL, apples & oranges, a baseball player & a football player...

All in all, Z-score is a process of Normalization, which "normalizes" different sets of data to the same standard so they can be compared.

Compares the various grading methods in a normal distribution:

How to understand the formula?

By comparing each score with the mean, x - μ, we get its deviation.

But at this point we still don't know whether each deviation is big or small. We need a "standard" to compare each deviation against. Just like the mean is the average of all scores, the standard deviation is (roughly) the typical deviation of all scores, which tells us whether a given deviation is large or not. So we compare each deviation with the Standard deviation: deviation ÷ 𝜎

And we get the whole picture: Standard Score = (𝓍 - μ) / 𝜎

How to understand the Number of Standard Deviations?

Assume the standard deviation is 𝜎 (sigma); the number in front of it just says how much it is scaled. e.g., 2𝜎 means twice the standard deviation, and 1.5𝜎 means 1.5 times the SD. If your Z-score is 2, your score is 2𝜎 (two standard deviations) away from the mean.

Example

There's some exam data of a class:

Here's their z-scores:

Example

Solve:

Isabella's z-score is: (20-22)/5 = -0.4
Hannah's z-score is: (33-38)/12.5 = -0.4
So they're equally young in their degree level.
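The two calculations above as a sketch, using the formula (x - μ) / 𝜎 with the numbers from the solution:

```python
def z_score(x, mean, sd):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


isabella = z_score(20, 22, 5)   # (20 - 22) / 5
hannah = z_score(33, 38, 12.5)  # (33 - 38) / 12.5

print(isabella, hannah)
```

Both come out at -0.4, so each sits 0.4 standard deviations below her own group's mean.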

「Z-table」 Convert Z-score to Percentile

This ONLY applies to Normal Distribution

Refer to Khan academy: Standard normal table for proportion below

If you know someone's z-score, you can easily get their percentile from the Z-table. Vice versa, if you know the percentile, you can get the z-score as well.

How to use? The 1st Column (the row labels) gives the z-score up to the tenths digit, and the 1st Row (the column headers) gives the hundredths digit. For a given z-score, find its row and column; the cell where they intersect is the percentile.

e.g., someone's z-score is "0.57", and you want to know what percentile he's at, or what proportion is below his score. Go to the z-table, first find the row for 0.5, then the column for 0.07; the intersection is his percentile, which is "0.7157" or "71.57%" in this case.
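Instead of a printed table, the same lookup can be sketched with `statistics.NormalDist` from Python's standard library (Python 3.8+). Its `cdf` method gives the proportion below a value:

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

proportion_below = standard_normal.cdf(0.57)
print(f"{proportion_below:.4f}")  # matches the table cell for z = 0.57
```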

Common values:

Explicit Z-table:

Example

Solve:

Get the z-score of student Faisal: (103.1-105)/10 = -0.19.
Refer to the Z-table we'll get the corresponding percentile rank: 0.4247.
The answer is 0.4247 (42.47%) of students are shorter than Faisal.

Example

Solve:

Get two z-scores: (82-83.2)/8 = -0.15, (89.2-83.2)/8 = 0.75
Get both points' corresponding percentiles: 0.4404 & 0.7734
Subtract to get the area in between: 0.7734 - 0.4404 = 0.333
So the answer is 0.333 or 33.3%.
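A sketch of the same "area between two points" calculation with `statistics.NormalDist`, using the mean 83.2 and SD 8 from the solution above, and assuming the data is normally distributed:

```python
from statistics import NormalDist

scores = NormalDist(mu=83.2, sigma=8)

below_low = scores.cdf(82)     # proportion below the lower point
below_high = scores.cdf(89.2)  # proportion below the upper point
between = below_high - below_low

print(f"{below_low:.4f} {below_high:.4f} {between:.3f}")
```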

「Z-table」Convert Percentile to Z-score

Refer to Khan academy: Finding z-score for a percentile

Just go the other way around: look for the cell holding the given percentile, then read off the corresponding row & column to get the z-score.

Example

Solve:

"Top 5%" means the minimum percentile rank is at 95, which is 0.95 in percentage.
Find out the corresponding z-score according to the percentile:
- There's no "0.95" in z-table but "0.9495" & "0.9505"
- Since the "minimum percentile" is 0.95, "0.9505" is the one
- "0.9505" corresponds to the z-score "1.65"
Take the z-score back to z-score formula: 1.65 = (x-66000)/21000
Get the x=100650 which is the minimum annual profit.
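The reverse lookup can be sketched with `NormalDist.inv_cdf`. Note it returns a more precise z (about 1.645) than the table's 1.65, so the exact cutoff comes out a little lower than the table-based 100650:

```python
from statistics import NormalDist

mean, sd = 66_000, 21_000  # values from the solution above

z_exact = NormalDist().inv_cdf(0.95)  # precise z for the 95th percentile
z_table = 1.65                        # what the printed table gives

cutoff_exact = mean + z_exact * sd
cutoff_table = mean + z_table * sd

print(f"z = {z_exact:.4f}, cutoff = {cutoff_exact:.0f}")
print(f"z = {z_table}, cutoff = {cutoff_table:.0f}")
```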

solomonxie commented 6 years ago

❖ Population Parameters [DRAFT]

「Population」

In statistics, the Population is the collection of all people, items, or objects that are required for a specific study.

「Parameter」

It's also called the Population parameter.

The word parameter in Statistics means something different than in Mathematics. It is a number that describes the population. It is usually estimated from a statistic, which is calculated from a randomly selected sample of the given population.

Common population parameters:

「parameter」 vs. 「statistic」

The word parameter refers to a Population measure, e.g., population mean, population SD.

The word statistic generally refers to a fact about the data, but it most often refers to a Sample statistic, e.g., sample mean, sample proportion.

「Central Tendencies」

(To be written...)

How parameters change as data is shifted and scaled

Refer to Khan academy: How parameters change as data is shifted and scaled

We see that:

Adding a number to each value: The Central tendencies (Mean/Median) will INCREASE by that same amount. And the Spread (Standard deviation/IQR) will NOT change.
Multiplying each value by a number: Both Central tendencies and Spread will be scaled by that same factor.

Example

Solve:

The mean will be affected by both shifting and scaling, so the new mean is 5/9 * (104-32) = 40
The standard deviation will only be affected by scaling, so it will then be: 5/9 * 2 = 1.11
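
To see the shift/scale rule on actual numbers, here is a minimal sketch using Python's standard `statistics` module. The Fahrenheit readings are hypothetical, chosen only so that their mean is 104; the conversion is the same 5/9 * (F - 32) used above.

```python
from statistics import mean, stdev

# Hypothetical temperature readings in Fahrenheit (mean = 104)
fahrenheit = [100, 102, 104, 106, 108]

# Convert every reading: shift by -32, then scale by 5/9
celsius = [5 / 9 * (f - 32) for f in fahrenheit]

# The mean follows both the shift and the scale
mean_c = mean(celsius)

# The standard deviation only follows the scale
sd_f = stdev(fahrenheit)
sd_c = stdev(celsius)

print(mean_c, sd_f, sd_c)
```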

solomonxie commented 6 years ago

Density Curves

Sometimes histograms aren't good enough to visualize a large dataset. A Density Curve solves the problem: the data can take on any value in a continuum, instead of being thrown into a few buckets.

Axes (Tricky):

X-axis represents the values of data points
Y-axis represents the density, i.e., the proportion of data per unit of x. It's the AREA over an interval, not the height, that gives a proportion between 0 and 1 (or 100%).

Area

The entire AREA under the curve is 100%, which represent all the data points.

The percentage of data points in an interval is the AREA under the curve over that interval, NOT the height of a point.

Parameters of Density Curves

Median

For Symmetric distribution, the Median is right at the middle, which is at 50th percentile.

For Skewed distribution, the [Left side area] = [Right side area] = 50%.

Mean

For Symmetric distribution, the Mean is right at the middle:

For Skewed distribution, the Mean is at the right or left of the Median:

Example

What is the height of median?

Solve:

The median is at the 50th percentile, where the left & right areas are both 50%.
But we CAN'T know exactly where the median is.
By eyeballing it, the point that divides the shape into TWO EQUAL areas is around 5.5,
But without further information we can't tell the height.

Example

Solve:

The whole area under the curve is 100%
The base is 4, so:

Example

Solve:

The area under the density curve when x>3 is the whole area
Hence the area is 100%

Example

Solve:

Area over interval [3,5] is a triangle.
Apply the area formula: 1/2 * (5-3)*0.6 = 0.6 = 60%

solomonxie commented 6 years ago

Empirical Rule 「68-95-99.7 Rule」

This rule ONLY applies to Normal Distribution.

It's also called the 68-95-99.7% rule, because for a normal distribution:

≈68% of the data falls within 1 standard deviation of the mean
≈95% of the data falls within 2 standard deviations of the mean
≈99.7% of the data falls within 3 standard deviations of the mean

Example

Solve:

The z-score of the point 32.2 is (32.2-20.5)/3.9 = 3, which is 3 standard deviations above the mean.
By the empirical rule, ≈99.7% of the data falls within 3𝜎 of the mean, leaving ≈0.3% outside, split evenly between the two tails.
Hence the percentage above that value is 0.15%

Example

Solve:

According to the z-score formula, we get the two points' z-score are: -1 & 2
By looking at the empirical rule graph, -1𝜎 & 2𝜎 correspond to the 16th percentile & the 97.5th percentile.
So subtract the smaller area from the larger one and we'll get 97.5% - 16% = 81.5%
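
The 68-95-99.7 numbers are rounded, which is easy to check against the exact normal curve. A small sketch assuming Python 3.8+ (`statistics.NormalDist`); the variable names are just for illustration.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

# Area within 1, 2 and 3 standard deviations of the mean
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

# First example: area more than 3 SDs above the mean
above_3sd = 1 - z.cdf(3)

# Second example: area between -1 SD and +2 SD
between = z.cdf(2) - z.cdf(-1)

for k, area in within.items():
    print(f"within {k} SD: {area:.4f}")
print(f"above 3 SD: {above_3sd:.5f}, between -1 and 2 SD: {between:.4f}")
```

The exact areas differ a bit from the rule-of-thumb answers (0.15% and 81.5%), which is fine: the empirical rule is meant for quick estimates.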

solomonxie commented 6 years ago

Scatter Plot

Just to plot many dots on the X-Y plane.

Linear & Non-linear Relationship

If you can fit a LINE through those points, it's a linear relationship. If not, then it's non-linear.

Positive & Negative Relationship

If the scatterplot has a linear relationship:

If the slope is positive, then it's a positive relationship
If the slope is negative, then it's a negative relationship

「Bivariate relationship」Linearity, Strength and Direction

Bivariate is just a fancy way to say that each data point has TWO variables, x & y. When analyzing points in the X-Y plane, we can also look at x & y SEPARATELY. E.g., the point (2,3) has x-position 2 and y-position 3, so we analyze the x-values of all data points, and then the y-values of all data points.

Refer to Khan academy: Bivariate relationship linearity, strength and direction

「Correlation Coefficient」

The Correlation Coefficient describes how well a line fits the data. It's also kind of like a "Unit SLOPE" of the estimated Regression Line: it is the slope you get when both x and y are measured in standard deviations (z-scores).

Refer to youtube: The Correlation Coefficient - Explained in Three Steps Refer to Khan academy: Correlation coefficient intuition Refer to Khan academy: Calculating correlation coefficient r

Correlation Coefficient is represented as letter r. The interval of r is -1 ≤ r ≤ 1:

r=1 or r=-1 when the line fits ALL data points. The better the line fits the data points, the closer r is to 1 or -1.
r=0 when there's NO linear relationship. The "worse" the line fits the data, the closer r is to 0.

Find out the Correlation Coefficient

Formula:
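
One common way to write the sample correlation coefficient is as the average product of z-scores: r = 1/(n-1) · Σ (x - x̄)/Sx · (y - ȳ)/Sy. Below is a minimal sketch of that formula in plain Python; the function name `correlation` and the four data points are made up for illustration.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r: sum of z-score products divided by (n - 1)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    products = ((x - x_bar) / s_x * (y - y_bar) / s_y for x, y in zip(xs, ys))
    return sum(products) / (n - 1)

# Hypothetical data points (1,1), (2,2), (2,3), (3,6)
r = correlation([1, 2, 2, 3], [1, 2, 3, 6])

# Points lying exactly on a rising line should give r = 1
r_perfect = correlation([1, 2, 3], [2, 4, 6])

print(round(r, 3), round(r_perfect, 3))
```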

solomonxie commented 6 years ago

❖ Least-square Regression

Least-squares Regression is one way of calculating a Linear Regression. Most regression calculations are done by computer, but we want to do it by hand to get a better understanding.

What is Linear Regression? Trying to fit a line as closely as possible to as many of the points as possible is called "Linear Regression".

Refer to Khan academy: Introduction to residuals and least-squares regression

「Residuals」

Residuals are errors. More specifically, they are the differences between the actual value of the response variable and the value predicted by the least squares regression line.

At a certain X-position, the value of residual is the VERTICAL DISTANCE from the actual value to the Regression Line.

When the residual is positive, the actual point is ABOVE the regression line,
When the residual is negative, the actual point is BELOW the regression line.

The way we calculate the Regression Line with the Least Squares method is to MINIMIZE the sum of the squared residuals.

Example

Solve:

This dish's actual taste rating was 4 points higher than predicted based on its appearance

Example

Solve:

Recognize the VARIABLES: Y -> mass, X -> breadth
So the expected mass is -47 + 2*40 = 33
Since the observed mass is 29,
So residual = observed - expected = 29 - 33 = -4
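
The same residual calculation as a tiny sketch. The line -47 + 2x and the observed point (breadth 40, mass 29) come from the solution above; the function name `predicted_mass` is hypothetical.

```python
def predicted_mass(breadth):
    """Mass predicted by the regression line from the example: -47 + 2 * breadth."""
    return -47 + 2 * breadth

observed = 29
expected = predicted_mass(40)

# residual = observed - expected; negative means the point is BELOW the line
residual = observed - expected
print(expected, residual)
```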

Calculate the equation of 「Least-square line」

▶ Practice at Khan academy: Calculating the equation of the least-squares line

Refer to Khan academy: Calculating the equation of a regression line

Formula of Regression line:

As we said, the Correlation Coefficient r is kind of like a Unit Slope between -1 and 1, so to get the slope in real units we multiply r by the ratio of the Standard Deviations of y & x, which is Sy/Sx.
A "must go through" point is the MEAN of the dataset, which is (x̄, ȳ). At x̄, the line predicts exactly ȳ.

With the two pieces of information above, we can easily calculate the estimated Regression Line.
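
Those two facts translate to slope = r · Sy/Sx and intercept = ȳ - slope · x̄. Here is a minimal sketch in plain Python; the function name `least_squares_line` and the data points are assumptions for illustration, not taken from the original example.

```python
from statistics import mean, stdev

def least_squares_line(xs, ys):
    """Return (slope, intercept) using slope = r * Sy/Sx and the point (x̄, ȳ)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    # Correlation coefficient r
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    slope = r * s_y / s_x
    # The line must pass through (x̄, ȳ)
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical data points (1,1), (2,2), (2,3), (3,6)
slope, intercept = least_squares_line([1, 2, 2, 3], [1, 2, 3, 6])
print(slope, intercept)
```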

「Slope」 of Regression line

「Intercept」 of Regression line

Example

Solve:

solomonxie commented 6 years ago

Residual Plot

Refer to Khan academy: Residual plots

Linear model Residual plot:

Non-linear model Residual plot:

solomonxie commented 6 years ago

❖ 「R²」 Coefficient of Determination

R-squared is built from Squared Residuals, i.e. the SE (Squared Error): it compares the squared error from the line with the total squared error from the mean of y.

R squared is ALWAYS between 0 and 1, and the higher your R squared, the better.

Refer to Khan academy: R-squared or coefficient of determination

R squared is the variation of y that is explained by your linear model.

R-squared = Explained variation / Total variation

Formula

(SE_line is the Squared Error from the line: the sum of the squared residuals)

If SE (Squared Error) from the line is small -> r² close to 1 -> The line is a good fit.
If SE (Squared Error) from the line is large -> r² close to 0 -> The line is not a good fit
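
Written out, the usual formula is r² = 1 - SE_line / SE_ȳ, where SE_ȳ is the total squared error of y around its mean. A minimal sketch in plain Python, with a hypothetical function name and made-up data points:

```python
from statistics import mean

def r_squared(xs, ys):
    """R² = 1 - (squared error from the line) / (squared error from the mean of y)."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Least-squares slope and intercept
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    # Unexplained variation: squared residuals around the line
    se_line = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    # Total variation: squared differences from the mean of y
    se_mean = sum((y - y_bar) ** 2 for y in ys)
    return 1 - se_line / se_mean

# Hypothetical data points (1,1), (2,2), (2,3), (3,6)
r2 = r_squared([1, 2, 2, 3], [1, 2, 3, 6])
print(round(r2, 4))
```

For these points roughly 89% of the variation in y is explained by the line.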

Understanding 「R-squared」

Refer to youtube: 3.2: Linear Regression with Ordinary Least Squares Part 1 - Intelligence and Learning

Why do we square "Residuals"?

It's a way to keep those residuals (differences from the regression line) positive, so that points above and below the line don't cancel each other out. Squaring also penalizes large residuals more heavily than small ones. Either way, the goal is to MINIMIZE the total.

Why do we square 「Correlation Coefficient」?

(To do...)

Why do we add them together

By adding them we will get the TOTAL ERRORS, which is the one we're going to minimize.

「Root Mean Square Error」RMSE

It's also called the Root Mean Square Deviation (RMSD), or Standard Deviation of the Residuals.

It measures how well the Regression Line fits the data.

Refer to Khan academy: Standard deviation of residuals or root mean square deviation (RMSD)

solomonxie commented 6 years ago

Output of Least-square Regression

Refer to Khan academy: Using least squares regression output Refer to Khan academy: Confidence interval for the slope of a regression line

Prerequisite:

Sampling Distribution
Confidence Interval
Hypothesis Test

solomonxie commented 6 years ago

❖ Study design (Stats)

For different purposes, we're to use different methods of study.

Refer to Khan academy: Types of statistical studies

Types of Statistical Studies:

Sample Study: Take a portion of a LARGE POPULATION and study it.
Observational Study: WITHOUT affecting them, closely observe a whole (small) population.
Experiments: RANDOMLY divide the samples into a Control Group and a Treatment Group, then compare the two groups, one of which is AFFECTED (treated) and the other NOT AFFECTED.

「Explanatory Variable」 & 「Response Variable」

The response variable is the focus of a question in a study or experiment. An explanatory variable is one that explains changes in the response variable. It can be anything that might affect the response variable.

「Samples」 or 「Surveys」

Problem 1: Qualitative and Quantitative data
Problem 2: Representative samples
Problem 3: Biased wording in survey questions
Problem 4: Sampling methods
- Systematic sampling: Members chosen at a regular interval, e.g., every 10th member of an ordered list.
- Stratified sampling: Some members from ALL groups chosen.
- Random sampling: An adequate number of members chosen, each an equal chance of being in the sample.
- Cluster sampling: 100% members from SOME groups are chosen.

「Observational Studies」 or 「Experiments」

Observational study: Measure or survey members of a sample WITHOUT trying to affect them.
Controlled experiment: Apply some treatment to one of the groups, while the other group does not receive the treatment.

Notes

Randomized experiments are designed to suggest causation
Correlation is a WEAKER claim than causation: correlation alone does not prove causation
To answer a question about a causal relationship, we need to perform an experiment with a treatment group and a control group.
While a sample study needs only part of the relevant members, an Observational study needs ALL members.

solomonxie commented 6 years ago

❖ Random Sampling

"Humans are famously bad at truly random." - Sal Khan

Refer to Khan academy: Techniques for generating a simple random sample Refer to Khan academy: Techniques for random sampling and avoiding bias

Methods of Random sampling:

Simple Sampling
Stratified Sampling
Clustered Sampling

「Simple Sampling」

Refer to Khan academy: Techniques for generating a simple random sample Refer to Wiki: Simple random sample

Assign everyone a number, then randomly draw numbers (out of a bowl, or with a random number generator), and discard any invalid numbers.

Example

Solve:

「Stratified Sampling」

Divide the population into several groups, and take samples from EACH group.

Refer to Wiki: Stratified sampling

「Clustered Sampling」

Divide the population into several groups, and randomly pick a few whole GROUPS as the sample.

Refer to Wiki: Cluster sampling
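
The three methods differ only in WHAT gets picked at random. Here is a minimal sketch using Python's standard `random` module; the function names, the four "classes" and the member labels are all made up for illustration.

```python
import random

def simple_random_sample(population, k, rng):
    """Simple sampling: pick k members straight from the whole population."""
    return rng.sample(population, k)

def stratified_sample(groups, k_per_group, rng):
    """Stratified sampling: pick some members from EVERY group."""
    return [m for members in groups.values() for m in rng.sample(members, k_per_group)]

def cluster_sample(groups, n_groups, rng):
    """Cluster sampling: pick a few whole GROUPS and keep all of their members."""
    chosen = rng.sample(sorted(groups), n_groups)
    return [m for name in chosen for m in groups[name]]

rng = random.Random(0)  # seeded only to make the demo repeatable

# Hypothetical population: 4 classes of 10 students each
groups = {f"class_{c}": [f"{c}{i}" for i in range(10)] for c in "ABCD"}
population = [m for members in groups.values() for m in members]

srs = simple_random_sample(population, 8, rng)
strat = stratified_sample(groups, 2, rng)
clus = cluster_sample(groups, 2, rng)
print(srs, strat, clus, sep="\n")
```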

「Random Sampling」 vs. 「Random Assignment」

Refer to Khan academy: Random sampling vs. random assignment (scope of inference)

「Simple Random Sample」SRS

It means that the sample was selected in such a way where each member and set of members has an equal chance of being in the sample.

solomonxie commented 6 years ago

Quick note on: 「Non-random Sampling」 Bias

▶︎ Jump over to Khan academy for practice: Bias in samples and surveys Refer to Khan academy article: Identifying bias in samples and surveys

「Response Bias」

It occurs when people systematically give wrong answers.

「Nonresponse Bias」

It is when people chosen for the sample can't be contacted or refuse to answer.

「Convenience Bias」

Researcher chooses samples that are easiest to reach.

「Undercoverage」

It occurs when some members in the population are left out of the sampling frame.

「Voluntary Response Bias」

Researcher gives an open invitation and people decide to be in the sample or not.

「Wording Bias」

Misleading people with biased words or phrases.

solomonxie commented 6 years ago

Observational Study

WITHOUT affecting them, closely observe a whole (small) population. The key is to observe.

Refer to Khan academy: Worked example identifying observational study

An observational study DOES NOT establish a CAUSAL RELATIONSHIP; it can only tell you whether one variable is correlated with another.

solomonxie commented 6 years ago

❖ Experiment Study

RANDOMLY divide the samples into a Control Group and a Treatment Group, then compare the two groups, one of which is AFFECTED (treated) and the other NOT AFFECTED.

Refer to Khan academy: Introduction to experiment design Refer to EUPATI: Clinical trial designs

The purpose is to establish a CAUSAL RELATIONSHIP, i.e., to tell whether one event can cause another, which an observational study can't tell.

The key is to divide the two groups randomly, so that you can see what impact the treatment really has.

Two groups:

Control group is the group that does not receive the treatment.
Treatment group is the group that receives the treatment.

How to conduct a good Experiment

There are a few keys to conducting a good experiment:

Randomly divide samples into two groups, to eliminate biases.
It should be a BLIND EXPERIMENT, in which the participants don't know which group they're in.

「Placebo Effect」

A placebo is a "fake medicine", typically a sugar pill. In drug testing and medical research, it's a very common way to measure how much the patient's mentality alone affects the outcome.

For conducting a medicine experiment, we randomly separate people to two groups:

Control group: people will receive placebo.
Treatment group: people will receive real medicine.

「Blind Experiment」 & 「Double Blind Experiment」

Blind experiment: The participants don't know which group they're in.
Double Blind Experiment: Not only the participants, but even the conductors/administrators don't know which is which.
Triple Blind Experiment: Even the people who analyze the data don't know which group they're analyzing.

It's a great way to avoid BIAS.

Improving 「Randomly Grouping」

Sometimes complete randomness makes the groups uneven, which introduces bias into the experiment. E.g., if there are more women in one group than in the other, or more young people in one group, that can heavily affect the result.

To handle this situation, we can use some improved designs for grouping:

Block Design
Cross Over Design
Matched Pairs Design: It is a special case of a randomized block design.

Randomized 「Block Design」

With a randomized block design, the experimenter divides subjects into subgroups called blocks, such that the variability within blocks is less than the variability between blocks. Then, subjects within each block are randomly assigned to treatment conditions. Compared to a completely randomized design, this design reduces variability within treatment conditions and potential confounding, producing a better estimate of treatment effects.

「Cross Over」 Design

It simply means "switching groups": some time after the first experiment, a second experiment is run in which the people from the Control Group switch to the Treatment Group, and vice versa.

Khan academy made the wrong video named "matched pairs design" which is actually "Crossover Design". Refer to Khan academy: Crossover Design ~(Matched pairs experiment design)~

「Matched Pairs」 Design

In the matched-pair design, participants are first matched in pairs according to certain characteristics. Then, each member of a pair is randomly assigned to one of the two different study subgroups. This allows comparison between similar study participants who undergo different study procedures.

「Replication」

"A very important idea, in science in general... Other people should be able to replicate and reinforce this experiment and hopefully get the consistent result" - Sal Khan

solomonxie commented 6 years ago

❖ Probability [DRAFT]

「Theoretical Probability」 vs. 「Experimental Probability」

The experimental probability should get closer and closer to the theoretical probability after trying more and more times.

Theoretical Probability: It's what's expected to happen based on the possible outcomes, assuming equally likely events.
Experimental Probability: It's the result of an experiment or simulation after a large number of times.

solomonxie commented 6 years ago

Experimental Probability [DRAFT]

Random numbers for experimental probability

Statistical significance of experiment

The threshold: if the probability of getting a result this extreme by chance alone is less than 5%, then the result is called statistically significant.

solomonxie commented 6 years ago

❖ Probability Rules

「Multiplication Rule」 A and B

The probability of multiple INDEPENDENT events all occurring is the product of their probabilities: P(A and B) = P(A) · P(B). For dependent events, use P(A and B) = P(A) · P(B|A).

「Addition Rule」A or B

The A or B probability is both of their favourable outcomes minus the OVERLAPS (common outcomes), which is (A + B - C). The formula is: P(A or B) = P(A) + P(B) - P(A and B).

Refer to Khan academy: Addition rule for probability

Example

Solve:

「Mutually Exclusive events」

Mutually exclusive events cannot happen at the same time.

Example

Solve:

Females=22, Chocolate=33, Overlaps=15
The probability is: (22+33-15)/50
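
The same addition rule with exact fractions, as a small sketch. The counts (22, 33, 15 out of 50) are the ones from the solution above; the function name is made up.

```python
from fractions import Fraction

def p_a_or_b(p_a, p_b, p_a_and_b):
    """Addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b

total = 50
p_female = Fraction(22, total)
p_chocolate = Fraction(33, total)
p_both = Fraction(15, total)  # the overlap, counted twice if not subtracted

p_either = p_a_or_b(p_female, p_chocolate, p_both)
print(p_either, float(p_either))
```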

「Compound probability」A then B

The problem "What's the probability of flipping a coin and getting 3 heads in a row?" is a typical Compound probability of independent events problem.

The formula for Compound probability of independent events is the same as the multiplication rule.

Example: Flipping a coin three times, what's the probability of getting a tail, head and tail ?
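
For this coin example the multiplication rule can be checked by brute force, listing every possible sequence. A minimal sketch with Python's standard library, assuming a fair coin:

```python
from fractions import Fraction
from itertools import product

# All equally likely sequences of three flips: HHH, HHT, ..., TTT
outcomes = list(product("HT", repeat=3))

# Counting: how many sequences are exactly tail, head, tail?
favourable = sum(o == ("T", "H", "T") for o in outcomes)
p_counted = Fraction(favourable, len(outcomes))

# Multiplication rule for independent events: 1/2 * 1/2 * 1/2
p_rule = Fraction(1, 2) ** 3

print(len(outcomes), p_counted, p_rule)
```

Both routes give 1/8, the same as for any other specific sequence such as 3 heads in a row.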

「At least one」 probability

solomonxie commented 6 years ago

Dependent Probability

Dependent probability means the result of second event will change because of what happened first.

Refer to Khan academy: Dependent probability introduction

Two events are INDEPENDENT of each other when: P(A and B) = P(A) · P(B)

Furthermore, with the concept of Conditional Probability, two events are INDEPENDENT when: P(A | B) = P(A) and P(B | A) = P(B)

`Dependent Probability` & `Independent Probability`

▶ Practice on Khan academy: Dependent and independent events

Example

Solve:

P(A) = 4/24
P(B) = 4/24
P(A & B) = 1/24
They're not independent because:
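
The independence check for these numbers, as a small sketch with exact fractions (the variable names are illustrative): independent events would need P(A & B) to equal P(A) · P(B).

```python
from fractions import Fraction

p_a = Fraction(4, 24)
p_b = Fraction(4, 24)
p_a_and_b = Fraction(1, 24)

# What P(A & B) would have to be if A and B were independent
product_if_independent = p_a * p_b

# Equivalent check with conditional probability: is P(A | B) equal to P(A)?
p_a_given_b = p_a_and_b / p_b

independent = (p_a_and_b == product_if_independent)
print(product_if_independent, p_a_given_b, independent)
```

Here P(A) · P(B) = 1/36 while P(A & B) = 1/24, and P(A | B) = 1/4 while P(A) = 1/6, so the events are not independent.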

Example

Solve:

solomonxie commented 6 years ago

❖ Conditional Probability

It's NOT just that both events happened; it's asking for the probability of one event GIVEN that another event has already happened. It's based on an event that has happened, which is why you divide by the probability of that event.

Notation

Probability of B given A (B after A, or B in condition of A), or Probability of A given B:

Formula

Understanding 「Conditional Probability」

Refer to youtube: Probability Part 2: Updating Your Beliefs with Bayes: Crash Course Statistics #14

It's based on the concept of Sets, and it's a lot more intuitive with a Venn diagram.

P(A | B) = P(A & B) / P(B)

The circle of B is given for sure, so any occurrence of A must be INSIDE the circle of B, which is P(A & B). Dividing by P(B) gives the proportion of B's circle that is taken up by A.

「The Parallel World」

Imagine there are many "parallel worlds", say A-world & B-world which are the worlds A & B occur. Of course they're parallel and happening at the same time, yet there could be chance of intersection, that A occurs in B's world, or B occurs in A's world.

And the chance of one event occurs in "another world" is the Conditional probability.

In the context of the Anime Steins;Gate, the conditional probability is chance of Mayuri being killed in Alpha-worldline.

Steins;Gate Worldline:

「A GIVEN B」

It means "A after B", or "A after B has happened". Instead of happening at the same time P(A and B), the probability won't be the same if one has already happened.

Divide by B's probability

It shows how much of the already-happened event B is covered by "A and B".

Example

Solve:

Based on the conditional probability formula: P(A|B) = P(A and B) / P(B)
The P(south Asia ∣ high) = P(south Asia and High) / P(high) = 7/188 ÷ 87/188 = 7/87
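
The same division as a tiny sketch with exact fractions. The counts (7, 87, 188) come from the solution above; the function name `conditional` is hypothetical.

```python
from fractions import Fraction

def conditional(p_a_and_b, p_b):
    """P(A | B) = P(A and B) / P(B)."""
    return p_a_and_b / p_b

p_south_asia_and_high = Fraction(7, 188)
p_high = Fraction(87, 188)

p_south_asia_given_high = conditional(p_south_asia_and_high, p_high)
print(p_south_asia_given_high)
```

Notice that the total of 188 cancels out: given "high", only the 87 "high" cases matter.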

solomonxie commented 6 years ago

❖ Bayes' Theorem (Basics)

Bayes' Theorem is a revolutionary take on conditional probability.

It is not intended as a one-off calculation, but as a process of improvement: each time you gain a little more confidence.

Refer to youtube: Bayes' Theorem - The Simplest Case Refer to youtube: The Bayesian Trap

The formula of Bayes' Theorem is just a slight extension of Conditional Probability.

▲ The probabilities of A given B and of B given A have the same numerator, P(A and B). That means we can easily compute a conditional probability from its reversed version.

Understanding the formula

How does it make sense?

In real life, sometimes A given B is easy to get, sometimes B given A is easier to get. So whenever we encounter some difficulties of computing A given B, we can always use probability of B given A to compute.
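
In formula form that is P(A | B) = P(B | A) · P(A) / P(B). Below is a minimal sketch of a medical-test style calculation; all the numbers (1% prevalence, 90% sensitivity, 9% false positive rate) are hypothetical and chosen only to show the mechanics.

```python
from fractions import Fraction

def bayes(p_b_given_a, p_a, p_b):
    """Bayes' Theorem: P(A | B) = P(B | A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Hypothetical numbers, for illustration only
prevalence = Fraction(1, 100)           # P(disease)
sensitivity = Fraction(9, 10)           # P(positive | disease)
false_positive_rate = Fraction(9, 100)  # P(positive | no disease)

# P(positive): positives from sick people plus positives from healthy people
p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)

p_disease_given_positive = bayes(sensitivity, prevalence, p_positive)
print(p_disease_given_positive, float(p_disease_given_positive))
```

With these made-up inputs a positive result still means only about a 9% chance of having the disease, because the healthy group is so much larger.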

Example: Spam emails

Example: Disjoint Union

Refer to youtube: Bayes' Theorem - Example: A disjoint union

Example: False Positives

Refer to youtube: Bayes' Theorem Example: Surprising False Positives

solomonxie commented 6 years ago

❖ Bayesian Statistics [DRAFT]

Refer to youtube: You know I’m all about that Bayes: Crash Course Statistics #24 Refer to youtube: Bayes in science and everyday life: Crash Course Statistics #25

solomonxie commented 6 years ago

❖ What is 「Random Variable」

Instead of analyzing a measured distribution with explicit data, we're to abstract those analysis methods with uncertain data. It's like abstracting arithmetic to algebra.

Refer to Khan academy: Random variables ▶︎ Jump over to Khan academy for practice: Constructing probability distributions

Random Variables are just like the unknowns in algebra. Except it's slightly different in Statistics.

Remember that: Studying Random Variables is just like studying Algebra over Arithmetic.

More precisely, Random variables are neither random nor variables. (Try to google that)

Random Variables are denoted by capital letters: X, Y, Z

Example

Solve:

Example

Solve:

Types of 「Random Variable」

Refer to Khan academy: Discrete and continuous random variables

Discrete Random Variable
- Binomial Random Variable
- Geometric Random Variable
Continuous Random Variable

"Discrete" literally means "Distinct" or "Separate" values.

The most useful one for real life is the Discrete Distribution, and we're gonna talk about it mostly.

solomonxie commented 6 years ago

Probability Distribution [DRAFT]

Refer to Wiki: Probability Distribution

It takes each of a Random Variable's values as an input, to form a distribution. E.g., the values of a Discrete Random Variable form a Discrete Distribution.

Types of Probability Distribution

Discrete distribution
Uniform distribution
Bernoulli distribution
Normal distribution
Poisson distribution

A valid Probability Distribution

All probabilities add up to 100%.
All probabilities must be non-negative.

solomonxie commented 6 years ago

Discrete Random Variable [DRAFT]

Mean (Expected Value)

In the case of a discrete random variable, expected value or mean — denoted as E(X) or μx is the long-run average outcome. To find expected value, take each value, multiply it by its respective probability, and add up all the products.

(Where the sum is taken over all possible values of x)

Example

Solve:

Note that it's asking for the mean of M, not of X.
Apply the formula of Expected Value of Discrete Random Variable
μM = (-10*0.81) + (40*0.18) + (90*0.01) = 0
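
The "multiply each value by its probability and add up" recipe, as a minimal sketch. The values and probabilities are the ones from the solution above; the function name `expected_value` is made up.

```python
def expected_value(distribution):
    """E(X) = sum of value * probability over all possible values."""
    return sum(value * prob for value, prob in distribution.items())

# value -> probability, taken from the example
m = {-10: 0.81, 40: 0.18, 90: 0.01}

# A valid distribution's probabilities must add up to 1
total_probability = sum(m.values())

mu_m = expected_value(m)
print(round(total_probability, 10), round(mu_m, 10))
```

The result is rounded before printing because floating point arithmetic can leave a tiny leftover instead of an exact 0.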

Variance (Deviation)

Variance (σ²):

Standard Deviation (σ):

Example

Solve:

Apply the formula for variance.
σ² = (1-1.75)^2*0.5 + (2-1.75)^2*0.25 + (3-1.75)^2*0.25 = 0.6875
σ = √0.6875 ≈ 0.83
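
Here is the variance recipe as a small sketch: σ² = Σ (x - μ)² · P(x). The values 1, 2, 3 with probabilities 0.5, 0.25, 0.25 are read off the solution above; the function name is hypothetical. The three terms are 0.28125 + 0.015625 + 0.390625, which add up to exactly 0.6875.

```python
from math import sqrt

def mean_and_variance(distribution):
    """Return (mean, variance) of a discrete random variable given value -> probability."""
    mu = sum(value * prob for value, prob in distribution.items())
    variance = sum((value - mu) ** 2 * prob for value, prob in distribution.items())
    return mu, variance

x = {1: 0.5, 2: 0.25, 3: 0.25}
mu, variance = mean_and_variance(x)
sd = sqrt(variance)
print(mu, variance, round(sd, 2))
```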

solomonxie commented 6 years ago

❖ Probabilities from 「Density Curves」

Discrete distribution: it can only take separate, countable values.
Continuous distribution: it can take any of the infinitely many values in an interval.

That's why a Discrete Distribution uses a histogram, and a Continuous Distribution uses a density curve.

Probabilities over an 「Interval」

Example

Solve:

Notice that the probability of x>3 is the area of the triangle between 3 and 5.
So P(x>3) = Area(3 to 5) = 1/2 * (5-3) * 0.6 = 0.6

Probability in 「Normal Distribution」

How to use Calculator for Probability in normal distribution

We can use any graphing calculator or online calculator: input the Mean, Standard Deviation, Lower bound, and Upper bound.

▶︎ Online Normal Distribution Calculator.

E.g., we know the mean = 70 and SD = 6, and are asked to calculate the probability of a value greater than 61. By inputting these values we'll get the answer:
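
If no calculator is at hand, the same probability can be computed in a few lines. A minimal sketch, assuming Python 3.8+ and its standard-library `statistics.NormalDist`:

```python
from statistics import NormalDist

# Normal distribution with mean 70 and SD 6
scores = NormalDist(mu=70, sigma=6)

# P(X > 61) is everything NOT below 61, so 1 minus the area to the left
p_greater_than_61 = 1 - scores.cdf(61)

# The matching z-score: (61 - 70) / 6
z = (61 - scores.mean) / scores.stdev

print(round(z, 2), round(p_greater_than_61, 4))
```

A z-score of -1.5 leaves about 93% of the area to its right.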

Example

Solve:

We know the Probability Distribution is a Normal Distribution.
For the probability of X<1, we need the area under the curve where X<1.
So the Z-score at X=1 is necessary for the calculation.
Z-score = (X-μ)/σ = (1 - 1)/0.05 = 0
Looking at the Z-score table, we see that the area below a z-score of 0 is 0.5.

solomonxie commented 6 years ago

❖ Operations of 「Random Variables」

Some basic "algebraic" operations, like adding/multiplying a number, or combining different R.V.s

「Shift」

Adding a constant to (or subtracting it from) a Random Variable X has these effects:

Mean: Shifts by the same amount as X.
Variance: Stays the same.

「Scale」

Scaling a Random Variable X by a constant has these effects:

Mean: Scales by the same factor as X.
Variance: Scales by the SQUARE of the factor (so the Standard Deviation scales by the factor itself).

Example

Solve:

Effect on mean(μ): μY = 10(μX) + 5 = 24.5, because the mean is affected by both shift & scale.
Effect on Standard deviation(σ): σY = 10σX = 8, because σ is only affected by scale.
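
The shift & scale rules for Y = aX + b fit in one small function. In this sketch the inputs μX = 1.95 and σX = 0.8 are inferred from the results 24.5 and 8 in the solution (the original problem statement isn't shown), and the function name is made up.

```python
def linear_transform(mean_x, sd_x, scale, shift):
    """Mean, SD and variance of Y = scale * X + shift."""
    mean_y = scale * mean_x + shift   # mean follows both scale and shift
    sd_y = abs(scale) * sd_x          # SD follows the scale only
    var_y = scale ** 2 * sd_x ** 2    # variance follows the SQUARE of the scale
    return mean_y, sd_y, var_y

mean_y, sd_y, var_y = linear_transform(mean_x=1.95, sd_x=0.8, scale=10, shift=5)
print(mean_y, sd_y, var_y)
```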

「Combine」 Random Variables

Refer to wiki: Algebra of random variables Refer to article on Khan academy: Combining random variables

Important facts about combining variances:

The variables must be independent of each other.
We can find the standard deviation by taking the square root √ of the combined variance.
The variance increases even when we subtract random variables: Var(X ± Y) = Var(X) + Var(Y).
If both Random Variables are normally distributed, then their Difference (and their Sum) will also be normally distributed.
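
Putting these facts together for a difference D = X - Y of two independent normal variables: the means subtract, the variances ADD. A minimal sketch assuming Python 3.8+; the function name and the numbers (X with mean 50 & SD 3, Y with mean 45 & SD 4) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def difference_of_independent(mean_x, sd_x, mean_y, sd_y):
    """Distribution of D = X - Y for independent, normally distributed X and Y."""
    mean_d = mean_x - mean_y
    sd_d = sqrt(sd_x ** 2 + sd_y ** 2)  # variances add, even for a difference
    return NormalDist(mean_d, sd_d)

# Hypothetical inputs, for illustration only
d = difference_of_independent(mean_x=50, sd_x=3, mean_y=45, sd_y=4)

# Probability that the difference lies between -10 and 10
p_within_10 = d.cdf(10) - d.cdf(-10)
print(d.mean, d.stdev, round(p_within_10, 4))
```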

Example

Solve:

Probability of 「Combined Normal Random Variables」

Remember: If both Random Variables are normally distributed, then the Difference of them will also be normally distributed.

Example

Solve:

Let D be the new Random Variable which D = X - Y
To calculate the probability for a normally distributed random variable, we need to know the mean, standard deviation, and boundaries.
Get the basic stats of D:
According to the condition, the boundary is -10 < D < 10
Input the required information to a calculator:
The answer is 0.57.

solomonxie commented 6 years ago

❖ 「Expected Value」 of a Random Variable

It's also known as the Expectation, Mathematical Expectation, EV, Average, Mean Value, Mean, or First Moment.

Refer to wiki: Expected value

"In probability theory, the expected value of a random variable, intuitively, is the long-run average value of repetitions of the experiment it represents."

Calculate 「Expected Value」

Example

Solve:

-1.4 -0.3 +1.2 +2.1 = 1.6

Example

Solve:

Example

Solve:

solomonxie commented 6 years ago

❖ 「Relative Frequency」 for Expected Value

▶︎ Practice at Khan academy: Making decisions with expected values

⦿ Mindset: It's better to count 1 by 1 rather than trying to apply formulas.

There're 2 ways to calculate the Expected Value:

E(X) = ∑ Probability · value
E(X) = ∑ Relative Frequency · value

Relative Frequency means how often something happens divided by the total number of outcomes.

Depending on the case we're analyzing, we can choose either way to calculate the expected value.

All the Relative Frequencies add up to 1.

「Absolute Frequency」

▶︎ Jump back to previous note on: Permutation & Combination

It also means the Relevant Outcomes, which is calculation of "n choose k" combinations.

「Sample Size」

▶︎ Jump back to previous note on: Intro to Probability

It literally means the Total Outcomes.

For a Flip Coin problem (Yes-No problem), the total number of outcomes is 2^trials, which means 2 * 2 * 2 ... E.g., the total number of outcomes of "flipping a coin 5 times" is 2⁵ = 32.
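
Absolute frequency, sample size and relative frequency fit together like this for "number of heads in 5 flips". A minimal sketch with Python's standard library (`math.comb` needs Python 3.8+), assuming a fair coin:

```python
from fractions import Fraction
from math import comb

flips = 5
total_outcomes = 2 ** flips  # sample size: 2 * 2 * 2 * 2 * 2

# Relative frequency of k heads = "5 choose k" relevant outcomes / total outcomes
relative_frequency = {
    k: Fraction(comb(flips, k), total_outcomes) for k in range(flips + 1)
}

# E(X) = sum of relative frequency * value
expected_heads = sum(k * freq for k, freq in relative_frequency.items())

print(total_outcomes, sum(relative_frequency.values()), expected_heads)
```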

Calculate 「Probabilities for Expected Value」

▶︎ Practice at Khan academy: Expected value with calculated probabilities

Example

Solve:

This is a typical Flipping Coin problem.

Example

Solve:

The trickiest part is how to calculate the probability of each of his positions.

Example

Solve:

Example

Solve:

Getting data from 「Expected Value」

Refer to Khan academy: Getting data from expected value ▶︎ Practice at Khan academy: Expected value with empirical probabilities

Example

Solve:

Example

Solve:

solomonxie commented 6 years ago

Bernoulli Distribution

It is the discrete probability distribution of a random variable which takes the value 1 with probability p and the value 0 with probability q=1-p, that is, the probability distribution of any single experiment that asks a yes–no question; the question results in a boolean-valued outcome, a single bit of information whose value is success/yes/true/one with probability p and failure/no/false/zero with probability q.

Refer to wiki: Bernoulli Distribution Refer to Khan academy: Bernoulli distribution mean and variance formulas

「Mean & Variance」of Bernoulli Distribution
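
For a Bernoulli variable with success probability p, the mean is p and the variance is p(1-p) = pq. Here is a small sketch that derives both from the general discrete formulas instead of just trusting the shortcut; the function name and p = 3/10 are arbitrary choices for illustration.

```python
from fractions import Fraction

def bernoulli_mean_variance(p):
    """Mean and variance of a variable that is 1 with probability p, else 0."""
    q = 1 - p
    distribution = {1: p, 0: q}
    mean = sum(value * prob for value, prob in distribution.items())
    variance = sum((value - mean) ** 2 * prob for value, prob in distribution.items())
    return mean, variance

p = Fraction(3, 10)
mean, variance = bernoulli_mean_variance(p)
print(mean, variance)
```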

「Bernoulli Distribution」 vs. 「Binomial Distribution」

Refer to stackexchange: What is the difference and relationship between the binomial and Bernoulli distributions?

All Bernoulli distributions are binomial distributions, but most binomial distributions are not Bernoulli distributions.

solomonxie commented 6 years ago

❖ Binomial Random Variables

Binomial Distribution is one of the Discrete Distributions.

Binomial means "Two terms". The Binomial Random Variable is a Random Variable described by TWO Parameters:

n: A fixed number of trials.
p: A fixed & constant probability of each trial being a success.

Refer to wiki: Binomial distribution Refer to article: Binomial Random Variables Refer to Khan academy: Binomial variables

Requirements

The requirements for a random experiment to be a binomial experiment are:

n: There is a fixed total number of trials.
p: The probability of success is fixed & constant for each trial.
Yes-no question: Each trial's outcome is either success or failure.
Independent: Each trial is independent of the others.

It can be simplified as: N, P, YES-NO, INDEPENDENT

Identifying 「Binomial Variables」

Examples of binomials

A fair coin is flipped 20 times; X represents the number of heads. X is binomial with n = 20 and p = 0.5.
You roll a fair die 50 times; X is the number of times you get a six. X is binomial with n = 50 and p = 1/6.
The probability of having blood type B is 0.1. Choose 4 people at random; X is the number with blood type B. X is binomial with n = 4 and p = 0.1.
Draw 3 cards at random, one after the other, with replacement, from a set of 4 cards consisting of one club, one diamond, one heart, and one spade; X is the number of diamonds selected. Sampling with replacement ensures independence. X is binomial with n = 3 and p = 1/4
Approximately 1 in every 20 children has a certain disease. Let X be the number of children with the disease out of a random sample of 100 children. Although the children are sampled without replacement, it is assumed that we are sampling from such a vast population that the selections are virtually independent. X is binomial with n = 100 and p = 1/20 = 0.05.

Examples of non-binomials

Roll a fair die repeatedly; X is the number of rolls it takes to get a six. X is not binomial, because the number of trials is not fixed.
A student answers 10 quiz questions completely at random; the first five are true/false, the second five are multiple choice, with four options each. X represents the number of correct answers. X is not binomial, because p changes from 1/2 to 1/4.
Draw 3 cards at random, one after the other, without replacement, from a set of 4 cards consisting of one club, one diamond, one heart, and one spade; X is the number of diamonds selected. X is not binomial, because the selections are not independent. (The probability (p) of success is not constant, because it is affected by previous selections.)
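The two card examples (with and without replacement) can be checked by brute force. The sketch below is only an illustration, not part of the original notes: it enumerates every equally likely sequence of 3 draws and tabulates X, the number of diamonds. The helper name `diamond_distribution` is made up for this example.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations, product

CARDS = ["club", "diamond", "heart", "spade"]

def diamond_distribution(draws):
    """Exact distribution of X = number of diamonds over equally likely draw sequences."""
    draws = list(draws)
    counts = Counter(seq.count("diamond") for seq in draws)
    return {x: Fraction(c, len(draws)) for x, c in sorted(counts.items())}

# With replacement: every ordered triple is possible (4**3 = 64 sequences).
with_replacement = diamond_distribution(product(CARDS, repeat=3))

# Without replacement: only triples of distinct cards (4*3*2 = 24 sequences).
without_replacement = diamond_distribution(permutations(CARDS, 3))

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```

With replacement, X can be 0, 1, 2 or 3 and follows the binomial pattern with n = 3 and p = 1/4. Without replacement, X can only be 0 or 1, so it cannot be binomial.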

「Independence Assumption」 10% Rule

To identify a binomial random variable, we also need to check that the trials are independent. When the population is large, we can't examine every member, so we draw a smaller sample instead.

When sampling without replacement, the trials are NOT strictly independent, because each item you take out changes the probabilities for the items that remain. But the good thing is that if the population is large enough, removing a few items barely affects the result.

That's the reason for the 10% Rule: if the sample size is less than 10% of the population, we can treat the trials as independent, because the portion removed is too small to noticeably change the probabilities.

「Simple Random Sample」SRS

It means that the sample was selected in such a way that each member, and each set of members of a given size, has an equal chance of being in the sample.

Replacement & Non-replacement

Sampling without replacement means that every time you take out an item, you don't put it back, so the total number decreases, which affects the probabilities for the remaining items. E.g., there are 10 balls of different colours; if you take out a red ball, then the probability of getting another red ball from the remaining 9 balls will decrease.

Sampling with replacement means that each time you take out an item, you put it back before the next draw, so the probabilities stay the same.

「10% Rule」

The 10% Rule is a rule of thumb for assuming independence between trials.

If the sample size is less than 10% of the population, then we can treat the trials as independent, because the portion removed is too small to noticeably change the probabilities.
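To get a feel for why the 10% Rule works, here is a rough sketch. It compares the exact probabilities for sampling without replacement (the hypergeometric distribution) with the binomial approximation. The population sizes, the sample size of 20 and the success rate of 0.25 are arbitrary values chosen for illustration, and the function names are hypothetical.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) assuming n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def hypergeom_pmf(k, population, successes, n):
    """P(X = k) when drawing n items WITHOUT replacement from a finite population."""
    return comb(successes, k) * comb(population - successes, n - k) / comb(population, n)

def max_gap(population, n, p=0.25):
    """Largest difference between the exact and the binomial probability over all k."""
    successes = round(population * p)
    return max(
        abs(binomial_pmf(k, n, p) - hypergeom_pmf(k, population, successes, n))
        for k in range(n + 1)
    )

small_sample = max_gap(population=1000, n=20)  # sample is 2% of the population
large_sample = max_gap(population=40, n=20)    # sample is 50% of the population

print(f"sample = 2% of population:  max gap = {small_sample:.4f}")
print(f"sample = 50% of population: max gap = {large_sample:.4f}")
```

When the sample is a small fraction of the population, the gap is tiny, so treating the trials as independent costs very little. When the sample is half the population, the binomial numbers are noticeably off.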

solomonxie commented 6 years ago

Binomial Probability

Refer to article on Khan academy: Binomial probability (basic)
Refer to Khan academy: Generalizing k scores in n attempts

▶︎ Online Binomial Probability Calculator

Formula of 「Binomial Probability」

We could put it in words as:

P(X=r) = Combinations × P(yes)^r × P(no)^(n-r)

where n is the number of trials, r is the number of successes, and P(no) = 1 - P(yes).

For the combinations, here's the formula:

Or use the ▶︎ Online Combination Calculator.
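As a minimal sketch (assuming Python 3.8+ for `math.comb`), the binomial probability P(X = r) = C(n, r) · p^r · (1 − p)^(n − r) can be computed directly. The function name `binomial_pmf` is just a label for this example. It is applied to the blood type example from earlier (n = 4, p = 0.1).

```python
from math import comb

def binomial_pmf(r, n, p):
    """P(X = r): exactly r successes in n independent trials with success probability p."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

# Blood type B example: 4 people chosen at random, P(type B) = 0.1.
n, p = 4, 0.1
probabilities = [binomial_pmf(r, n, p) for r in range(n + 1)]

for r, prob in enumerate(probabilities):
    print(f"P(X = {r}) = {prob:.4f}")
```

Here `comb(n, r)` counts the number of ways to choose which r of the n trials are successes, i.e. n! / (r! (n − r)!). The probabilities over r = 0..n should add up to 1.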

Example:

Example

Solve:

Apply the Binomial Probability Formula, the answer is:

「Mean & Variance」 of Binomial R.V.

Formula

Expected Value = Mean = μx = np
Variance = σx² = np(1-p)
Standard Deviation = σx = √(np(1-p))

Example

Solve:

Example

Solve:

Mean = μx = np = 100 * 0.25 = 25
SD = σx = √(np(1-p)) = √(25*0.75) = 4.33
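The shortcut formulas can be sanity-checked against the definition of mean and variance. The sketch below, using the same n = 100 and p = 0.25 as the example above, computes both ways. It is only an illustrative check, not a derivation.

```python
from math import comb, sqrt

n, p = 100, 0.25

# Shortcut formulas for a binomial random variable.
mean = n * p
sd = sqrt(n * p * (1 - p))

# Check against the definitions, summing over the whole distribution.
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
mean_check = sum(k * pk for k, pk in enumerate(pmf))
var_check = sum((k - mean_check) ** 2 * pk for k, pk in enumerate(pmf))

print(f"formula:    mean = {mean:.2f}, sd = {sd:.2f}")
print(f"definition: mean = {mean_check:.2f}, sd = {sqrt(var_check):.2f}")
```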

「Cumulative Binomial Probability」

Example

Solve:

One way:
Another way:

Example

Solve:

We can see it as P(X > 3) = P(4) + P(5), or as the complement P(X > 3) = 1 - (P(0) + P(1) + P(2) + P(3)). We'll use the first one in this case.
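Both routes to a cumulative probability give the same number. In the sketch below, n = 5 follows from the example, but p = 0.5 is an assumed value used only for illustration, since the original figures are not shown here.

```python
from math import comb

def binomial_pmf(r, n, p):
    """P(X = r) for a binomial random variable."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 5, 0.5  # p is a hypothetical value for this illustration

# Way 1: add up the probabilities of the outcomes we want.
direct = sum(binomial_pmf(r, n, p) for r in range(4, n + 1))

# Way 2: subtract the outcomes we don't want (including X = 0) from 1.
complement = 1 - sum(binomial_pmf(r, n, p) for r in range(0, 4))

print(f"P(X > 3) direct:     {direct:.4f}")
print(f"P(X > 3) complement: {complement:.4f}")
```

The direct sum is shorter when only a few outcomes qualify. The complement is handy when most outcomes qualify.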
