hon9g opened 5 years ago
- Efficient procedures for solving large-scale problems.
- Scalability
- Classic data structures & classical algorithms
- Real implementations in Python
`b` is a peak if and only if `b >= a` and `b >= c` (its neighbors).

 1 | 2 | 3 | ... | n/2 | ... | n-1 | n |
---|---|---|-----|-----|-----|-----|---|
 a | b | c | ... | e   | ... | h   | i |
A straightforward algorithm scans from one end:
- If the peak is in the middle, it looks at n/2 elements; either way, the worst-case complexity is Θ(n).
- asymptotic analysis: Θ(n)

If you use binary search (look at the middle element and recurse into the half with a larger neighbor):
- asymptotic analysis: Θ(log n), from T(n) = T(n/2) + Θ(1)
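The binary-search version can be sketched in Python as follows (a minimal sketch; the function name is mine, not course code):

```python
def find_peak_1d(a):
    """Return the index of a 1D peak: a[i] >= both of its neighbors.

    Binary search: look at the middle element; if it has a larger
    neighbor, a peak must exist on that side.  Theta(log n) time.
    """
    lo, hi = 0, len(a) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] < a[mid + 1]:
            lo = mid + 1   # rising to the right -> a peak lies right
        else:
            hi = mid       # a[mid] >= a[mid+1] -> peak at mid or left
    return lo

i = find_peak_1d([1, 3, 4, 3, 5, 1, 3])   # index 6: 3 >= 1 at the edge
```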
`a` is a 2D-peak if `a >= b`, `a >= d`, `a >= c`, and `a >= e` (all four neighbors).

    | 1 | 2 | ... | m-1 | m |
 ---|---|---|-----|-----|---|
  1 |   | c |     |     |   |
  2 | b | a | d   |     |   |
 ...|   | e |     |     |   |
 n-1|   |   |     |     |   |
  n |   |   |     |     |   |
.. | .. | .. | .. |
---|---|---|---|
14 | 13 | 12 | .. |
15 | 9 | 11 | 17 |
16 | 17 | 19 | 20 |
Greedy Ascent ("Gradient Descent") Algorithm
- You must decide where to start.
- Move to a higher value nearby.
- e.g. `12 -> 13 -> 14 -> 15 -> 16 -> 17 -> 19 -> 20 (the peak! end)`
- It takes `Θ(nm)` time, which equals `Θ(n^2)` if m = n.
Binary search version (attempt #1):
- Pick the middle column j = m/2.
- Find a 1D-peak at (i, j).
- Use (i, j) as a start to find a 1D-peak on row i.

It looks like an efficient algorithm, but it doesn't work: the row peak need not be a 2D-peak. A less efficient but correct algorithm is better than an incorrect one.

Attempt #2: pick the middle column j = m/2 and find the **global maximum** of that column, at (i, j). If ( i, j ) >= ( i, j-1 ) and ( i, j ) >= ( i, j+1 ), then ( i, j ) is a peak. Done. Otherwise, recurse on the half of the columns containing the larger neighbor. T(n, m) = T(n, m/2) + Θ(n) = Θ(n log m).
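Attempt #2 can be sketched as follows, run on the example grid from these notes (the `..` cells are filled with arbitrary smaller values, and the function name is mine):

```python
def find_peak_2d(grid):
    """Return (i, j) of a 2D peak: grid[i][j] >= its 4 neighbors.

    Divide and conquer on columns: take the middle column's global
    maximum; if a horizontal neighbor is larger, recurse into that
    half.  Theta(n log m) for an n x m grid.
    """
    n = len(grid)
    lo, hi = 0, len(grid[0]) - 1
    while True:
        j = (lo + hi) // 2
        i = max(range(n), key=lambda r: grid[r][j])  # column max
        if j > lo and grid[i][j - 1] > grid[i][j]:
            hi = j - 1    # larger value to the left
        elif j < hi and grid[i][j + 1] > grid[i][j]:
            lo = j + 1    # larger value to the right
        else:
            return i, j   # column max beats left/right -> 2D peak

grid = [[14, 13, 12, 10],
        [15,  9, 11, 17],
        [16, 17, 19, 20]]
i, j = find_peak_2d(grid)   # 20 at (2, 3) is a 2D peak
```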
program | algorithm |
---|---|
programming language | pseudocode / structured english |
computer | model of computation |
The RAM model contains instructions commonly found in real computers:
word: w bits
`list()` = array:
- `L[i] = L[j] + 5` -> O(1) time
- `L.append(x)` -> O(1) time via table doubling
- `L = L1 + L2` -> O(|L1| + |L2|) time
- `L.sort()` -> O(|L| log |L|) time (comparison sort)
- `x = x.next` -> O(1) time

`dict()` = hash table:
- `D[key] = val` -> O(1) time with high probability
- `key in D` -> O(1) time with high probability

`long` (arbitrary-precision integers):
- `x + y` -> O(|x| + |y|) time
- `x * y` -> O((|x| + |y|)^(log_2 3)) time (Karatsuba multiplication)

`heapq` -> a heap module; more about heaps later.
Document distance algorithm:
1. split doc into words
2. compute word frequencies
3. dot product
in python:
```python
for word in doc:   # O(|doc|)
    count[word] += 1
```
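The three steps can be sketched end to end (function names are mine; the score here is the angle between word-frequency vectors, i.e. a normalized dot product, so identical documents get angle 0):

```python
import math
from collections import Counter

def word_frequencies(doc):
    """Steps 1 & 2: split the document into words, count each word."""
    return Counter(doc.lower().split())

def vector_angle(d1, d2):
    """Step 3: arccos( D1.D2 / (|D1| |D2|) ) -- 0 means identical
    word distributions, pi/2 means no words in common."""
    c1, c2 = word_frequencies(d1), word_frequencies(d2)
    dot = sum(c1[w] * c2[w] for w in c1)   # only shared words count
    norm = (math.sqrt(sum(v * v for v in c1.values())) *
            math.sqrt(sum(v * v for v in c2.values())))
    return math.acos(dot / norm)
```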
`re` (regular expression matching) can take exponential time in the worst case, so don't use it here!

If you want to find an item in an unsorted array, it takes O(n) time.
If you want to find an item in a sorted array, it takes O(log n) time (binary search).
Concept: insert each value at the right position within the already-sorted prefix.

Trace on `[5, 2, 4, 6, 1, 3]` (sorted prefix in bold):

step | array
---|---
start | 5 2 4 6 1 3
insert 2 | **2 5** 4 6 1 3
insert 4 | **2 4 5** 6 1 3
insert 6 | **2 4 5 6** 1 3
insert 1 | **1 2 4 5 6** 3
insert 3 | **1 2 3 4 5 6**
Space Complexity: in-place
Time Complexity: O(n^2)
- Θ(n) steps (key positions).
- Each step is Θ(n) swaps & comparisons in the worst case.

Binary insertion sort: binary-search the already-sorted prefix `A[0:key-1]` for the insertion point in Θ(log n) time.
- Θ(n log n) compares.
- Shifting still takes Θ(n) per step, so Θ(n^2) swaps.

Implementation;
```
for j = 2 to A.length
    key = A[j]
    # insert A[j] into the sorted sequence A[1:j-1]
    i = j - 1
    while i > 0 and A[i] > key
        A[i+1] = A[i]
        i = i - 1
    A[i+1] = key
```
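The same pseudocode in runnable, 0-indexed Python:

```python
def insertion_sort(A):
    """Sort A in place: O(n^2) time, O(1) extra space."""
    for j in range(1, len(A)):
        key = A[j]
        i = j - 1
        # shift larger elements of the sorted prefix one slot right
        while i >= 0 and A[i] > key:
            A[i + 1] = A[i]
            i -= 1
        A[i + 1] = key
    return A
```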
Concept: divide the array into two halves, sort each half recursively, then merge the two sorted halves.
Time Complexity: O(n*log(n)), from the recurrence T(n) = 2T(n/2) + Θ(n).
Implementation; with recursion.
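A sketch of that recurrence in Python (the merge is the Θ(n) two-finger pass):

```python
def merge_sort(A):
    """O(n log n): T(n) = 2 T(n/2) + Theta(n) for the merge."""
    if len(A) <= 1:
        return A
    mid = len(A) // 2
    left, right = merge_sort(A[:mid]), merge_sort(A[mid:])
    # merge two sorted halves with two fingers
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]   # one side is already empty
```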
Concept: Convert `A[1:n]` into a max-heap, then repeatedly extract the maximum.
Time Complexity: O(n*log(n))
Implementation;

```
build_max_heap(A):
    for i = n/2 down to 1:
        max_heapify(A, i)
```
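A runnable 0-indexed sketch (so the child formulas become 2i+1 and 2i+2; function names follow the pseudocode):

```python
def max_heapify(A, i, n):
    """Fix a single violation at node i, assuming the subtrees
    rooted at its children are already max-heaps.  O(log n)."""
    l, r = 2 * i + 1, 2 * i + 2          # 0-indexed children
    largest = i
    if l < n and A[l] > A[largest]:
        largest = l
    if r < n and A[r] > A[largest]:
        largest = r
    if largest != i:
        A[i], A[largest] = A[largest], A[i]
        max_heapify(A, largest, n)

def build_max_heap(A):
    """Bottom-up: leaves are trivial heaps, so start at n//2 - 1."""
    for i in range(len(A) // 2 - 1, -1, -1):
        max_heapify(A, i, len(A))

def heap_sort(A):
    """O(n log n): build the heap, repeatedly swap the max to the end."""
    build_max_heap(A)
    for end in range(len(A) - 1, 0, -1):
        A[0], A[end] = A[end], A[0]      # move current max into place
        max_heapify(A, 0, end)           # restore heap on the prefix
    return A
```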
Graph G = (V, E)
- V = set of vertices.
- E = set of edges.
  - undirected: e = {v, w} (unordered pair)
  - directed: e = (v, w) (ordered pair)
Applications:
- example; the configuration graph of the 2x2x2 Rubik's cube (Pocket Cube):
  - vertices = 8! * 3^8 = 264,539,520
  - diameter:
implementation; adjacency lists: for each vertex `u in V`, `Adj[u]` stores u's neighbors.

Adj[b] = {a, c}
Adj[a] = {c}
Adj[c] = {b}

Space Complexity: Θ(V+E)

In a more object-oriented fashion (you couldn't do this with multiple graphs over the same vertices): `v.neighbors = Adj[v]`
Breadth-First Search (BFS); explore a graph level by level from a source s.
Implementation:

```python
def BFS(V, Adj, s):
    level = {s: 0}
    parent = {s: None}
    i = 1
    frontier = [s]              # vertices at level i - 1
    while frontier:
        next = []               # vertices at level i
        for u in frontier:
            for v in Adj[u]:
                if v not in level:      # not visited yet
                    level[v] = i
                    parent[v] = u
                    next.append(v)
        frontier = next
        i += 1
    return level, parent
```
Analysis: O(V+E) — each vertex joins the frontier at most once, and each adjacency list is scanned once.
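As a self-contained check, here is a compact version of the routine run on the three-vertex example graph from the adjacency-list section:

```python
def bfs(Adj, s):
    """Compact BFS: returns {vertex: level} and the parent pointers."""
    level, parent = {s: 0}, {s: None}
    frontier, i = [s], 1
    while frontier:
        nxt = []
        for u in frontier:
            for v in Adj[u]:
                if v not in level:       # first time we see v
                    level[v], parent[v] = i, u
                    nxt.append(v)
        frontier, i = nxt, i + 1
    return level, parent

Adj = {'a': ['c'], 'b': ['a', 'c'], 'c': ['b']}
level, parent = bfs(Adj, 'b')   # a and c are both one hop from b
```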
Implementation: it can be implemented simply with recursion.

```python
parent = {s: None}   # s is the chosen start vertex

def DFS_Visit(V, Adj, s):
    for v in Adj[s]:
        if v not in parent:
            parent[v] = s
            DFS_Visit(V, Adj, v)
```
For a full DFS we don't really pick a start vertex; instead, try every vertex:

```python
parent = {}
for s in V:
    if s not in parent:
        parent[s] = None
        DFS_Visit(V, Adj, s)
```

Analysis: Θ(V+E) (linear time)
- visit each vertex once
- `DFS_Visit(.., .., v)` is called once per vertex v
- pay |Adj[v]| per vertex => O(sum of all |Adj[v]|) == O(E)
dictionary: an Abstract Data Type (ADT) that maintains a set of items, each with a key.
- `insert(item)`: add item to the set
- `delete(item)`: remove item from the set
- `search(key)`: return the item with that key, if it exists

O(log n) via AVL tree -> our goal is O(1).

Python `dict`:
- `D[key]`: search
- `D[key] = val`: insert
- `del D[key]`: delete

`x == y` => `hash(x) == hash(y)` (equal objects must hash equally)

A hash function h maps the universe U of all keys (say, integers) down to a reasonable size m for the table. Keys k_i, k_j ∈ K collide if h(k_i) = h(k_j). Chaining is today's solution: each table slot stores a list of the items hashed to it.

An assumption (cheating), called simple uniform hashing: each key is equally likely to be hashed to any slot of the table, independent of where other keys are hashed.
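Under that assumption, a minimal chaining table looks like this (class and method names are mine; a real implementation would also grow the table to keep the load factor n/m constant):

```python
class ChainedHashTable:
    """Hashing with chaining: slot i holds the (key, value) pairs
    whose keys hash to i.  With simple uniform hashing the expected
    chain length is the load factor n/m, so operations take
    expected O(1 + n/m) time."""

    def __init__(self, m=8):
        self.slots = [[] for _ in range(m)]

    def _chain(self, key):
        return self.slots[hash(key) % len(self.slots)]

    def insert(self, key, val):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, val)   # overwrite an existing key
                return
        chain.append((key, val))

    def search(self, key):
        for k, v in self._chain(key):
            if k == key:
                return v
        return None

    def delete(self, key):
        idx = hash(key) % len(self.slots)
        self.slots[idx] = [(k, v) for (k, v) in self.slots[idx]
                           if k != key]
```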
A priority queue implements a set S of elements, each associated with a key.
- `insert(S, x)`: insert element `x` into set `S`
- `max(S)`: return the element of `S` with the largest key
- `extract_max(S)`: return the element of `S` with the largest key and remove it from `S`
- `increase_key(S, x, k)`: increase the value of `x`'s key to the new value `k`
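Three of these four operations map directly onto Python's `heapq` module (a min-heap, so keys are negated here to simulate a max-priority queue; `increase_key` has no direct `heapq` equivalent):

```python
import heapq

# heapq is a MIN-heap, so push negated keys to get max-heap behavior
S = []
for key in [16, 4, 10, 14, 7, 9]:
    heapq.heappush(S, -key)        # insert(S, x)

largest = -S[0]                    # max(S): peek at the root -> 16
popped = -heapq.heappop(S)         # extract_max(S) -> 16
```

A common workaround for the missing `increase_key` is lazy deletion: push a fresh entry with the new key and skip stale entries when popping.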
A max-heap as an array, visualized as a nearly complete binary tree:

16 | 14 | 10 | 8 | 7 | 9 | 3 | 2 | 4 | 1 |
---|---|---|---|---|---|---|---|---|---|

- root: `i = 1`
- parent: `i = child_idx / 2`
- left child: `i = 2 * parent_idx`
- right child: `i = 2 * parent_idx + 1`
The invariant is the max-heap property: the key of a node is >= the keys of its children.

- `build_max_heap`: produce a max-heap from an unordered array.
- `max_heapify(A, i)`: correct a single violation of the heap property at a subtree's root.
  - Precondition: the trees rooted at `left(i)` and `right(i)` are max-heaps, with at most a single violation at `i`.
  - Time Complexity: O(log n), since it walks down at most the height of the tree, log(n).

Perhaps surprisingly, `build_max_heap` takes only O(n) time.
Concept: Convert `A[1:n]` into a max-heap:

```
build_max_heap(A):
    for i = n/2 down to 1:
        max_heapify(A, i)
```

Heap sort implementation;
1. `build_max_heap` from the unordered array.
2. Find the maximum element, `A[1]`.
3. Swap `A[1]` with `A[n]`; the maximum is now at the end of the array.
4. Discard node `n` from the heap by decrementing the heap size.
5. The new root may violate the max-heap property, but its children are max-heaps, so run `max_heapify` to fix it (go back to step 2).

Time Complexity: O(n*log(n))
Why `build_max_heap` is only O(n):
- `max_heapify` takes O(1) time for nodes that are one level above the leaves, and in general O(l) time for nodes that are l levels above the leaves.
- There are n/4 nodes at level 1, n/8 at level 2, ..., and 1 node at the log(n) level (the root).
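Summing those per-level costs (with c the constant behind `max_heapify`'s O(l) work, and n/2^(l+1) nodes l levels above the leaves) gives the linear bound:

```latex
\sum_{l=1}^{\log n} \frac{n}{2^{\,l+1}}\, c\, l
  \;=\; \frac{cn}{4}\sum_{l=1}^{\log n} \frac{l}{2^{\,l-1}}
  \;\le\; \frac{cn}{4}\cdot 4 \;=\; cn \;=\; O(n)
```

The sum converges because each level has half the nodes but costs only one step more.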
TO-DO
Unit 1: Introduction
Unit 2: Sorting and Trees
; Event Simulation
Unit 3: Hashing
; Gene Comparison
Unit 4: Numerics
; RSA Encryption
Unit 5: Graphs
; Rubik's cube
Unit 6: Shortest Paths
; Caltech to MIT
Unit 7: Dynamic Programming
; Image Compression
Unit 8: Advanced Topics
Read Textbook
INDEX
1. **Foundations**
   - [x] The Role of Algorithms in Computing
   - [ ] Getting Started
   - [x] Growth of Functions
   - [ ] Divide and Conquer
   - [ ] Probabilistic Analysis and Randomized Algorithms
2. **Sorting and Order Statistics**
   - [ ] HeapSort
   - [ ] QuickSort
   - [ ] Sorting in Linear Time
   - [ ] Medians and Order Statistics
3. **Data Structures**
   - [ ] Elementary Data Structures
   - [ ] Hash Tables
   - [ ] Binary Search Trees
   - [ ] Red-Black Trees
   - [ ] Augmenting Data Structures
4. **Advanced Design and Analysis Techniques**
   - [ ] Dynamic Programming
   - [ ] Greedy Algorithms
   - [ ] Amortized Analysis
5. **Advanced Data Structures**
   - [ ] B-Trees
   - [ ] Fibonacci Heaps
   - [ ] van Emde Boas Trees
   - [ ] Data Structures for Disjoint Sets
6. **Graph Algorithms**
   - [ ] Elementary Graph Algorithms
   - [ ] Minimum Spanning Trees
   - [ ] Single-Source Shortest Paths
   - [ ] All-Pairs Shortest Paths
   - [ ] Maximum Flow
7. **Selected Topics**
   - [ ] Multithreaded Algorithms
   - [ ] Matrix Operations
   - [ ] Linear Programming
   - [ ] Polynomials and the FFT
   - [ ] Number-Theoretic Algorithms
   - [ ] String Matching
   - [ ] Computational Geometry
   - [ ] NP-Completeness
   - [ ] Approximation Algorithms
8. **Mathematical Background**
   - [ ] Summations
   - [ ] Sets, Etc.
   - [ ] Counting and Probability
   - [ ] Matrices

Resource