solomonxie / blog-in-the-issues

A personalised tech-blog, notebook, diary, presentation and introduction.
https://solomonxie.github.io
67 stars 12 forks source link

Algorithms and Data Structure 算法与数据结构 #46

Open solomonxie opened 6 years ago

solomonxie commented 6 years ago

Covers the topics of Algorithms & Data structures.

Data Structures

Linear List Data Structure:

Tree Data Structure:

Graph Data Structure:

Algorithms

Sorting:

References: 数据结构 一文全覆盖 | InterviewMap 算法 一文全覆盖 | InterviewMap

solomonxie commented 5 years ago
solomonxie commented 5 years ago
solomonxie commented 5 years ago
solomonxie commented 5 years ago

Characteristics of Data Structures

image

solomonxie commented 5 years ago

算法效率的度量方法

时间复杂度

时间是指运行时间

语句总的执行次数:T(n) T(n)是根据规模n给出语句的总执行次数。 一般随着n增大,T(n)增长最慢的算法为最优算法。

因为T(n) = O(f(n)),所以简单用O(n)代表。 而O(n)就是算法时间复杂度函数,简称BigO函数。

推导BigO:

例如: 如果算法只需要执行3步,那么f(n) = 3,而时间复杂度是O(1),而不是O(3),为什么呢? 因为f(n)的执行次数,实际上不会随着n而变化,而始终是执行1次。所以是O(1)

要分析算法的复杂度,主要就是分析循环的运行情况。 在一个for loop中,如果运行的语句是简单的一个O(1)语句,那么因为要循环运行n次,算法复杂度是O(n)。 而如果在for loop中,运行的语句是c = c*2,那么n=2ˣ,换算过来就是x=log_2(n),那么时间复杂度就是O(logn)

常见时间复杂度:

image

时间复杂度所耗费时间对比:

image

对于一个算法,最好的情况是第一项就得到结果,也就是O(1),最坏的情况是要运行到最后一步才得到想要的结果,那么就是O(n)。而时间复杂度,一般都是基于最坏情况的。所以也成为最坏时间复杂度。

判断时间复杂度

所有程序语言基本控制流程都包含这三个:

所以看时间复杂度,就要针对这三种流程控制进行分别判断。 image

空间复杂度

空间是指存储空间

S(n) = O(f(n)),其中f(n)是根据n获得占用存储空间的函数。

solomonxie commented 5 years ago

数组 List (线性表)

参考Wiki:Linked list 参考:大话数据结构

List中,元素是有序的,一对一的衔接的,绝对不能是一对多。每个元素都有固定的index位置。所以元素间是线关系。

List的定义、结构、操作方法: image

List主要有这两种存储结构:

image

顺序存储结构 (顺序表 Sequential List / Array List)

image

image

image

image

链式存储结构的线性表 (单链表 Singly Linked List)

image

每个node节点包括两部分信息:

image

头指针与头节点: image

Linked List的读取,必须从头节点开始找,直到找到所需的内容。 一般读取List的某项元素是,是需要指定一个index序号i,然后从j=0开始遍历node节点,直到j=i位置,就读取当前node的data部分。

静态链表 Static Linked List

就是用两个数组,模拟一般单链表,一个数组放数据,一个数组放指针。 这种东西一般用来解决某些编程语言没有指针的问题。

循环链表 Circular Linked List

image

由于单链表到了尾节点的指针就指向了null,且每个节点都不知道自己的上一个节点,所以出现一个很大问题就是无法从某节点开始遍历所有节点,只能向后不能向前。

循环链表就是把尾指针指向了头指针,这样就形成了一个环。即使方向不能改变,但是仍能从任意节点开始环球一圈,遍历到所有节点。

双向链表 Doubly Linked List

双向链表解决了单向链表只能往后不能往前的问题。每个node节点都同时包含"上一个“与"下一个"节点的位置。 image

总结

Python 内置List的操作时间复杂度: image

Python内置Dictionary字典的操作时间复杂度: image

solomonxie commented 5 years ago

栈与队列 Stack & Queue

栈与队列本质上也是线性表的不同存储方式:

栈 Stack

常用操作和线性表List一样。只是在插入和删除时需要执行这两个特殊的操作:

Stack只能每次在栈顶进行操作,删除了栈顶,才能逐次操作下一层的数据。

栈也分为两种不同的存储结构:

链栈是不需要头节点的,因为栈顶自动成为了头部。

栈的应用:递归(Recursion)、多层括号匹配(Nested brackets)、基础四则算术

Basic Arithmetic Operations: 由于四种基本算术层层嵌套,电脑很难识别。所以利用了逆波兰表达法(RPN, Reverse Polish Notation)。 比如将: ((15 ÷ (7 − (1 + 1))) × 3) − (2 + (1 + 1)) 表示为↓: 15 7 1 1 + − ÷ 3 × 2 1 1 + + −

队列 Queue

队列是FIFO,只允许在队列头部删除,在队列尾部插入。

solomonxie commented 5 years ago

二叉树 Binary Tree

二叉树形态:

二叉树存储结构:

所有二叉树的存储,实际上都是以List数组存储,因为计算机只能识别线性数据。

二叉链表: image

二叉树遍历方法:

solomonxie commented 5 years ago

ADT (Abstract Data Type)

"ADT may be defined as a "class of objects whose logical behavior is defined by a set of values and a set of operations"

ADT is actually the pseudo code of the implementation of data structures.

ADT is the gate to understand all Data structures. Because having all the behaviors of an data type in mind, helps a lot to understand what actually the data type is and does.

solomonxie commented 5 years ago

Array (Data structure)

ADT DEFINITION

ADT: <ARRAY>

    It's an array of values with basic data types;
    All values stored in the memory are next to each other.
    It looks like: a1, a2, a3 ...... an.

DATA:
    - values []           # actual data stored in an array 
    - starting_address    # Starting memory address: 0x0023101131
    - maximum_size        # Storage size: like 8 bits, 32 bits
    - length              # How many elements

OPERATIONS:

    __init__(size):
        create memory space for a size for the list.

    getElementAt(index):
        (*) return the value at a certain index.

    getPositionOf(value):
        (*) return the index for the value.

    insertElementAt(value, index):
        (*) insert the value at a certain index.

    deleteElementAt(index):
        (*) delete the element at a certain index.

IMPLEMENTATION

ANALYSIS

solomonxie commented 5 years ago

Linked List (Data structure)

Linked list representation

ADT DEFINITION

Singly Linked List:

ADT: <LINKED LIST> (Singly)

    The Linked list has nothing to do with array [1,2,3,4],
    but has of multiple nodes consist of two parts: value and pointer to the next node.
    That being said, 
    the object "LinkedList" has only two properties: length and a pointer to the head.

DATA:
    - Node         # basic element, consists of two parts
        - value    # actual data
        - next     # pointer of the next node
    - LinkedList
        - length
        - head     # pointer of the head node

OPERATIONS:

    addAt(index):
        (*) insert a node at the index, 

    removeAt(index):
        (*) delete a node at the index, 

    indexOf(value):
        (*) return the index of a node has the value. It searches from head.

    elementAt(index):
        (*) return the value of a node at the index. There's a search starts from head.

Doubly Linked List:

ADT: <LINKED LIST> (Doubly)

    Doubly linked list has a pointer to the "previous node".

DATA:
    - Node         # basic element, consists of two parts
        - value    # actual data
        - prev     # pointer of the previous node
        - next     # pointer of the next node
    - LinkedList
        - length
        - head     # pointer of the head node
        - tail     # pointer of the tail node

Circular Doubly Linked List:

ADT: <LINKED LIST> (Doubly)

    Its properties are the same with Doubly linked list,
    but the operations aren't the same.

DATA:
    - Node         # basic element, consists of two parts
        - value    # actual data
        - prev     # pointer of the previous node
        - next     # pointer of the next node
    - LinkedList
        - length
        - head     # pointer of the head node
        - tail     # pointer of the tail node

IMPLEMENTATION

Refer to: Linked List BY Beau Carnes (Javascript)

ANALYSIS

image

solomonxie commented 5 years ago

Stack (Data structure)

Refer to wiki: Stack (abstract data type)

image

ADT DEFINITION

There're two implementations of Stack: Array and Linked list.

Wikipedia: "What identifies the data structure as a stack in either case is not the implementation but the interface: the user is only allowed to pop or push items onto the array or linked list, with few other helper operations. The following will demonstrate both implementations, using pseudocode."

Version of Array:

ADT: <STACK> (array)

DATA:
    - Stack
        - storage []    # an array of values
        - size    # count of elements
        - top    # pointer of top element

OPERATIONS:

    push(value):
        (*) add node on top of the stack.

    pop():
        (*) delete the top node.

Version of Linked list:

ADT: <STACK> (array)

DATA:
    - Node         # basic element, consists of two parts
        - value    # actual data
        - next     # pointer of the next node
    - Stack
        - size    # counts of all nodes
        - top    # pointer of the top node

OPERATIONS:

    push(value):
        (*) add node on top of the stack.

    pop():
        (*) delete the top node.

IMPLEMENTATION

ANALYSIS

image

solomonxie commented 5 years ago

Queue (Data structure)

Refer to wiki: Queue (abstract data type)

image

ADT DEFINITION

"There are several efficient implementations of FIFO queues. An efficient implementation is one that can perform the operations—enqueuing and dequeuing—in O(1) time."

Linked list implementation of Queue:

ADT: <QUEUE> (linked list)

DATA:
    - Node
        - value
        - next
    - Queue
        - root
        - length

OPERATIONS:

    __init__():

    enqueue(value):
        (*) add the new element to the last

    dequeue():
        (*) delete the first element

Array implementation of Queue:

ADT: <QUEUE> (array)

DATA:
    - Queue
        - elements []
        - head
        - tail

IMPLEMENTATION

ANALYSIS

image

solomonxie commented 5 years ago

Set (Data structure)

image

ADT DEFINITION

ADT: <SET>

    The set data structure stores values without any particular order and with no repeated values.
    Besides being able to add and remove elements to a set, 
    there are a few other important set functions that work with two sets at once.

DATA:
    - Set []     # An array of data without any duplicates.

OPERATIONS:

    union(anotherSet):
        (*) return a new set, with combined values from another set but no duplicates.

    intersection(anotherSet):
        (*) return a new set, with values exist in both sets.

    difference(anotherSet):
        (*) return a new set, with the values does not exist in both sets.

    subset(anotherSet):
        (*) return true, if the current set is a subset of another set.

    add(value):
        add value to the set, if it doesn't exist in the set already.

    remove(value):
        delete the value from set.

    has(value):
        return true if the set contains the value.

IMPLEMENTATION

Refer to: Sets - Beau Carnes (JS)

ANALYSIS

solomonxie commented 5 years ago

Map (Data structure)

A map is sometimes called an associative array or dictionary.

image

ADT DEFINITION

ADT: <MAP>

    A map stores data in key / value pairs where every key is unique. 
    It is often used for fast look-ups of data. 

DATA:
    - Map []
        - key
        - value

OPERATIONS:

    has(key):

    get(key):

    delete(key):

    setValue(key, value):

IMPLEMENTATION

Refer to Beau Carnes: Maps

ANALYSIS

solomonxie commented 5 years ago

Hash Tables (Data structure)

image

ADT DEFINITION

ADT: <HASH TABLE>

    It's actually a MAP data structure, that contains key / value pairs. 
    It uses a <hash function> to compute an index into an array of buckets or slots, 
    from which the desired value can be found.

    A Hash Table is a data structure that uses a hash function to 
    efficiently map keys to values (Table or Map ADT), 
    for efficient search/retrieval, insertion, and/or removals.
    Hash Table is widely used in many kinds of computer software, 
    particularly for associative arrays, database indexing, caches, and sets.

    The hash function should always give the same output number for the same input.

DATA:
    - Keys []
        - key     # a string as the key
    - Map []
        - index   # the pointer to the actual value
        - value

OPERATIONS:

    hash(key):
        (*) return the index of the actual value with respect to the key.

    add(key, value):
        (*) 

    lookup(key):
        (*) 

    remove(key):

IMPLEMENTATION

Refer to Beau Carnes: Hash Table (JS)

ANALYSIS

image

solomonxie commented 5 years ago

Binary Search Tree (Data structure)

image

A tree is a data structure composed of nodes It has the following characteristics:

A binary search tree adds these two characteristics:

ADT DEFINITION

ADT: <BINARY SEARCH TREE>

DATA:
    - Node
        - value
        - left child
        - right child
    - BST
        - root    # pointer to the root node

OPERATIONS:

    search(value):
        (*) return the node which has the value. 
            The comparison descents to left if smaller, to right if greater.

    add(value):

    remove(value):

    findMin():

    findMax():

    isPresent(value):

    findMaxHeight():

    findMinHeight():

    isBalanced():

    inOrder():

    preOrder():

    postOrder():

    levelOrder():

    reverseLevelOrder():

    invert():

IMPLEMENTATION

Refer to Beau Carnes: Binary Search Tree

ANALYSIS

Binary search trees allow fast lookup, addition and removal of items. The way that they are set up means that, on average, each comparison allows the operations to skip about half of the tree, so that each lookup, insertion or deletion takes time proportional to the logarithm of the number of items stored in the tree.

image

solomonxie commented 5 years ago

Trie (Data structure)

image

It's called the Trie Search Tree, "trie" pronounced as "try".

Trie is specifically designed for searching for a word, rather than a number. Word autocompletion is based on this algorithm.

ADT DEFINITION

ADT: <TRIE>

DATA:
    - Node
        - keys []    # a Map of key/value pairs, for 
        - isWord     # true if you can spell a word from root to this node

OPERATIONS:

IMPLEMENTATION

Refer to Beau Carnes: Trie (JS)

ANALYSIS

solomonxie commented 5 years ago

Binary Heap Tree (Data structure)

image

It's similar to Binary Tree, but different with order. Its goal is to make a complete tree, which in Max Heap the father node is always greater child node, and new node will fill up the same level regardless of the order and then reach to a deeper level.

ADT DEFINITION

ADT: <HEAPS>

DATA:

OPERATIONS:

    insert(value):
        Add the value to the binary tree. 
        If the the level is filled up, then reach to the deeper level.

    remove():

    sort():

IMPLEMENTATION

Refer to Beau Carnes: Heaps (JS)

ANALYSIS

image

solomonxie commented 5 years ago

Graph (Data structure)

Graphs are collections of nodes (also called vertices) and the connections (called edges) between them. Graphs are also known as networks.

image

There are two major types of graphs:

Two common ways to represent a graph are:

ADT DEFINITION

ADT: <GRAPH>

DATA:

OPERATIONS:

IMPLEMENTATION

Refer to Beau Carnes: Graphs (JS)

ANALYSIS

image

solomonxie commented 5 years ago

Tree Traversal 树遍历

树的遍历不用iterate这个词,而是用traversal。 线性列表的遍历简单直接,按顺序一个一个查看即可。但树形结构明显不是线性的,所以不能那么做,而且要复杂的多。 Traversal看起来难懂,实际上和Travel差不多,就是在树的各个节点上“游走“。

General Traversal Methods

树遍历一般分为两大类:

solomonxie commented 5 years ago

Tough questions on [Linked List] [DRAFT]

在双向链表存储结构中,删除p所指的结点时需要修改指针( )

image

将两个各有n个元素的有序表归并成一个有序表,其最多的比较次数是( )

image

在有n个节点的二叉链表中,值为空的链域的个数为()

image

已知表头元素为 c 的单链表在内存中的存储状态如下表所示。

image

若在线性表中采用折半查找法查找元素,该线性表应该:

image 折半查找要求直接访问到中间节点,而链式存储必须遍历整个链表

将两个各有n个元素的有序表归并成一个有序表,其最多的比较次数是()

image 最多的比较次数是当两个有序表的数据刚好是插空顺序的时候,比如:第一个序列是1,3,5,第二个序列是2,4,6,把第二个序列插入到第一个序列中,先把第二个序列中的第一个元素2和第一个序列依次比较,需要比较2次(和1,3比较),第二个元素4需要比较2次(和3,5比较,因为4比2大,2之前的元素都不用比较了),第三个元素6需要比较1次(只和5比较),所以最多需要比较5次。即2n-1次。

solomonxie commented 5 years ago

Tough questions (List) [DRAFT]

一个5*4的矩阵,有多少个长方形?(正方形也算是长方形)

答案: image

solomonxie commented 5 years ago

数据结构概述:《数据结构与算法教程》学习笔记

参考原址:数据结构与算法教程,数据结构C语言版教程!——通俗易懂、深入浅出、中文免费、完整示例、基于C语言

数据结构分类

数据结构分为两大类:

逻辑数据结构大致包含以下几种存储结构:

其中:

线性表示意图:

image

树与图的示意图:

image

复杂度计算

对于一个问题的算法来说,之所以称之为算法,首先它必须能够解决这个问题(称为准确性)。其次,通过这个算法编写的程序要求在任何情况下不能崩溃(称为健壮性)。如果准确性和健壮性都满足,接下来,就要考虑最重要的一点:通过算法编写的程序,运行的效率怎么样。

运行效率体现在两方面:

好算法的标准就是:在符合算法本身的要求的基础上,使用算法编写的程序运行的时间短,运行过程中占用的内存空间少,就可以称这个算法是“好算法”。

常用的时间复杂度的排序

image

时间换空间,用空间换时间

根据自己的资源情况而定。如果相对于数据来说,自己的计算资源较足那么就尽量减少算法的内存空间利用,而集中到CPU上;如果自己的内存资源较足,那么就减少CPU计算而尽量利用内存来存放缓存数据。

solomonxie commented 5 years ago

Hash Table 哈希表详解 [DRAFT]

参考ToolsQA:Hash Tables in Data Structures 参考:[译]C语言实现一个简易的Hash table(1) 参考:Write a hash table in C