Open katsusan opened 3 years ago
[1] Instruction Set Architectures and Compilers
An Instruction Set Architecture (ISA) is an agreement about how software will communicate with the processor. A common scenario in an ISA has the following features:
A description of the ISA for a processor answers the following questions:
What are some things not specified by the ISA?
Types of Instruction Sets
There are three main types of instruction sets:
前两者目前均成为了历史,JVM从高层次抽象角度来讲可以视作stack-based ISA; x86是基于古老的有Accumulator特征的指令集,可以将它分类为 special-purpose register machine 。
general-purpose register(GPR)架构又主要分为两种:
所谓的Reduced Instruction Set Computing(RISC)架构属于load/store architectures,比如SPARC, MIPS, Alpha AXP, PowerPC。 Complex Instruction Computing(CISC)架构通常是register-memory architectures,比如 VAX, x86, MC68000。
Memory Addressing
大多数计算机系统把内存按字节(8bit)划分,ISA决定如何把这些字节组合成更大的结构(比如32bit整数), 涉及到的有关方面如下:
Operand Types
An operand is a value that an instruction operates on. 给定指令类型和寻址模式我们就可以指定指令的操作数。 相关的操作数有:
Types of Instructions
Instruction Encoding How are instruction types, operands, addressing modes, etc. communicated to the hardware? The ISA specifies a binary encoding of instructions.
The assembler encodes programs using this encoding, and the microarchitecture reads and executes the encoded program.
以MIPS指令集为例:
一共有三种类型的指令:
___________________________________________________________________________
|_6-bit opcode_|_5-bit_rs_|_5-bit_rt_|________16-bit_immediate______________|
___________________________________________________________________________
|_6-bit opcode_|_5-bit_rs_|_5-bit_rt_|_5-bit_rd_|_5-bit_shamt_|_6-bit_funct_|
___________________________________________________________________________
|_6-bit opcode_|_____________26-bit offset added to PC______________________|
Compilers Compiler把高级程序代码翻译成机器指令,通常包含以下部分:
[2] Pipeline and ILP
Pipelined CPUs
let's consider a five-stage pipeline for a RISCV microprocessor.
Clock Frequency
Pipelining Increases the Clock Frequency
Imagine the circuitry of a simple processor. It must have gates that do all five stages of the five-stage pipeline I mentioned, even if it isn't pipelined. The clock signal must flow from the beginning of the circuit through the maximum-depth path of the circuit before the clock can tick again. If we divide the circuitry into five independent and balanced stages, the length of this path is divided by five, so the clock frequency can be multiplied by five. If we can find a way to divide the work into ten stages, then the clock can be multiplied by ten. This is the way it works ideally; in practice improvement is more modest. Some barriers to this "perfect" clock improvement are:
// 对于非pipelined处理器,电流每个周期必须通过每个stage,即使该周期某stage并未工作,故pipeline粒度越高时钟频率可提升的越高
Why Pipelining Works: Instruction-Level Parallelism
Pipelines work because many instructions executing sequentially are doing independent tasks that can be done in parallel.
This kind of parallelism is called instruction-level parallelism (ILP). In the simple pipeline we have seen, ILP can be hard to come by; however, there are many tricks people have invented for squeezing more ILP out of the instruction stream, like instruction reordering and speculation.
Obstacles to Pipelining: Hazards
We have seen a few physical limitations to pipelining. However, the three main difficulties with pipelining have to do with the nature of the instruction stream being executed. These hazards can prevent a pipeline stage from correctly carrying out its purpose.
<i>: add r1, r2, r3 // r1 := r2 + r3
<i+1>: add r4, r1, r5 // r4 := r1 + r5
There are many techniques for solving this problem. Forwarding (or bypass) is the main technique.Dependences
As an introduction to data hazards, we will see the different ways that instructions can be dependent on one another.
Note that dependences are a property of programs. Not all dependences will affect the pipeline; we are really interested in dependence in a small window of instructions for pipelines.
However, the compiler uses dependence information over a much larger region to produce more efficient code.
<i>: add r1, r2, r3 // r1 := r2 + r3
<i+1>: add r3, r4, r5 // r3 := r1 + r4
// 第二条指令写r3而第一条读r3, 因此处理器必须保证在第二条执行之前第一条指令读取到正确的值.When these dependences occur in such a way that they are exposed to the pipeline, three different types of data hazards may occur:
Solutions to Data Hazards
http://hpca23.cse.tamu.edu/taco/utsa-www/cs5513-fall07/schedule.html → program from software to hardware
appendix:
http://www.lighterra.com/papers/modernmicroprocessors/