katsusan / gowiki

0 stars 0 forks source link

Computer Architecture #17

Open katsusan opened 3 years ago

katsusan commented 3 years ago

http://hpca23.cse.tamu.edu/taco/utsa-www/cs5513-fall07/schedule.html → program from software to hardware

appendix:
http://www.lighterra.com/papers/modernmicroprocessors/

katsusan commented 3 years ago

[1] Instruction Set Architectures and Compilers

An Instruction Set Architecture (ISA) is an agreement about how software will communicate with the processor. A common scenario in an ISA has the following features:

A description of the ISA for a processor answers the following questions:

What are some things not specified by the ISA?

Types of Instruction Sets

There are three main types of instruction sets:

前两者目前均成为了历史,JVM从高层次抽象角度来讲可以视作stack-based ISA; x86是基于古老的有Accumulator特征的指令集,可以将它分类为 special-purpose register machine

general-purpose register(GPR)架构又主要分为两种:

所谓的Reduced Instruction Set Computing(RISC)架构属于load/store architectures,比如SPARC, MIPS, Alpha AXP, PowerPC。 Complex Instruction Computing(CISC)架构通常是register-memory architectures,比如 VAX, x86, MC68000。

Memory Addressing

大多数计算机系统把内存按字节(8bit)划分,ISA决定如何把这些字节组合成更大的结构(比如32bit整数), 涉及到的有关方面如下:

Operand Types

An operand is a value that an instruction operates on. 给定指令类型和寻址模式我们就可以指定指令的操作数。 相关的操作数有:

Types of Instructions

Instruction Encoding How are instruction types, operands, addressing modes, etc. communicated to the hardware? The ISA specifies a binary encoding of instructions.

The assembler encodes programs using this encoding, and the microarchitecture reads and executes the encoded program.

以MIPS指令集为例:

一共有三种类型的指令:

Compilers Compiler把高级程序代码翻译成机器指令,通常包含以下部分:

katsusan commented 2 years ago

[2] Pipeline and ILP

Pipelined CPUs

let's consider a five-stage pipeline for a RISCV microprocessor.

Clock Frequency

Pipelining Increases the Clock Frequency

Imagine the circuitry of a simple processor. It must have gates that do all five stages of the five-stage pipeline I mentioned, even if it isn't pipelined. The clock signal must flow from the beginning of the circuit through the maximum-depth path of the circuit before the clock can tick again. If we divide the circuitry into five independent and balanced stages, the length of this path is divided by five, so the clock frequency can be multiplied by five. If we can find a way to divide the work into ten stages, then the clock can be multiplied by ten. This is the way it works ideally; in practice improvement is more modest. Some barriers to this "perfect" clock improvement are:
// 对于非pipelined处理器,电流每个周期必须通过每个stage,即使该周期某stage并未工作,故pipeline粒度越高时钟频率可提升的越高

Why Pipelining Works: Instruction-Level Parallelism

Pipelines work because many instructions executing sequentially are doing independent tasks that can be done in parallel.

This kind of parallelism is called instruction-level parallelism (ILP). In the simple pipeline we have seen, ILP can be hard to come by; however, there are many tricks people have invented for squeezing more ILP out of the instruction stream, like instruction reordering and speculation.

Obstacles to Pipelining: Hazards

We have seen a few physical limitations to pipelining. However, the three main difficulties with pipelining have to do with the nature of the instruction stream being executed. These hazards can prevent a pipeline stage from correctly carrying out its purpose.

Dependences

As an introduction to data hazards, we will see the different ways that instructions can be dependent on one another.

Note that dependences are a property of programs. Not all dependences will affect the pipeline; we are really interested in dependence in a small window of instructions for pipelines.

However, the compiler uses dependence information over a much larger region to produce more efficient code.

When these dependences occur in such a way that they are exposed to the pipeline, three different types of data hazards may occur:

Solutions to Data Hazards