Open katsusan opened 3 years ago
https://preshing.com/20120913/acquire-and-release-semantics/ http://blog.forecode.com/2010/01/29/barriers-to-understanding-memory-barriers/
Types of memory barrier: LoadLoad, LoadStore, StoreLoad, StoreStore.
Acquire semantics is a property that can only apply to operations that read from shared memory, whether they are read-modify-write operations or plain loads. The operation is then considered a read-acquire. Acquire semantics prevent memory reordering of the read-acquire with any read or write operation that follows it in program order.
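A LoadLoad+LoadStore barrier placed after the read-acquire prevents that load from moving down and prevents any later load or store from moving up. Acquire semantics can therefore be implemented with a combination of LoadLoad and LoadStore.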
Release semantics is a property that can only apply to operations that write to shared memory, whether they are read-modify-write operations or plain stores. The operation is then considered a write-release. Release semantics prevent memory reordering of the write-release with any read or write operation that precedes it in program order.
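A StoreStore+LoadStore barrier placed before the write-release prevents that store from moving up and prevents any earlier load or store from moving down. Release semantics can therefore be implemented with a combination of LoadStore and StoreStore.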
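The two barrier combinations above can be sketched in C++ with standalone fences. This is only an illustration under assumptions: the names `payload`, `ready`, `producer`, `consumer`, and `run_message_passing` are hypothetical and do not come from the linked articles, and the C++ fences stand in for the abstract LoadLoad/LoadStore/StoreStore barriers.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;             // plain (non-atomic) data published through the flag
std::atomic<int> ready{0};   // hypothetical flag name, used only for this sketch

void producer() {
    payload = 42;
    // Release fence: earlier loads and stores cannot move below the store that
    // follows it (the LoadStore + StoreStore combination).
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(1, std::memory_order_relaxed);
}

int consumer() {
    while (ready.load(std::memory_order_relaxed) == 0) {
        // spin until the flag has been published
    }
    // Acquire fence: later loads and stores cannot move above the load that
    // precedes it (the LoadLoad + LoadStore combination).
    std::atomic_thread_fence(std::memory_order_acquire);
    return payload;  // ordered after the flag load, so the published value is seen
}

// Runs one producer and one consumer and returns the value the consumer read.
int run_message_passing() {
    payload = 0;
    ready.store(0, std::memory_order_relaxed);
    int seen = -1;
    std::thread t1(producer);
    std::thread t2([&seen] { seen = consumer(); });
    t1.join();
    t2.join();
    return seen;
}
```

Calling `run_message_passing()` should return 42, because the release fence in the producer synchronizes with the acquire fence in the consumer once the flag value is observed.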
note 1: // why are x86/amd64 said to have a strong memory model?
note 2: // 64-ia-32-architectures-software-developer-manual // 8.2.2 Memory Ordering in P6 and More Recent Processor Families
note 3:
P1 | P2 |
---|---|
mov [_x], 1<br>mov [_y], 1 | mov r1, [_y]<br>mov r2, [_x] |
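Initially `_x` and `_y` are both 0; in this case r1=1 and r2=0 cannot occur. For it to occur, either P1's two stores or P2's two loads would have to be reordered, which violates the memory model described above.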
P1 | P2 |
---|---|
mov r1, [_x]<br>mov [_y], 1 | mov r2, [_y]<br>mov [_x], 1 |
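Initially `_x` and `_y` are both 0; the outcome r1=1 and r2=1 cannot occur. Reason: suppose r1=1. Then P2's `mov [_x], 1` has already executed when P1 loads `_x`. In P2, `mov r2, [_y]` occurs before `mov [_x], 1`, and in P1, `mov r1, [_x]` occurs before `mov [_y], 1`. So `mov r2, [_y]` occurs before P1's `mov [_y], 1`, which means r2 cannot be 1.
P1 | P2 |
---|---|
mov [_x], 1<br>mov r1, [_y] | mov [_y], 1<br>mov r2, [_x] |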
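Initially `_x` and `_y` are both 0; the final outcome r1=0 and r2=0 is allowed. In each thread the two instructions access different addresses, so the load may be reordered ahead of the earlier store, fetching the value into r1/r2 before `_x` and `_y` have been assigned.
If the instructions were instead `mov [_x], 1` followed by `mov r1, [_x]`, then with `_x` initially 0, r1 cannot be 0, because both instructions access the same address `_x`.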
P1 | P2 |
---|---|
mov [_x], 1<br>mov r1, [_x]<br>mov r2, [_y] | mov [_y], 1<br>mov r3, [_y]<br>mov r4, [_x] |
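Initially `_x` and `_y` are both 0; the outcome r2=0 and r4=0 is possible. The memory-ordering model imposes no constraints on the order in which the two stores appear to execute to the two processors that issue them, so each processor can see its own store (forwarded from its store buffer) before it sees the other processor's.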
P1 | P2 | P3 |
---|---|---|
mov [_x], 1 | mov r1, [_x]<br>mov [_y], 1 | mov r2, [_y]<br>mov r3, [_x] |
Initially `_x` and `_y` are both 0; the outcome r1=1, r2=1 and r3=0 cannot occur.
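The three-processor case can be approximated in C++ as a litmus test. This is a sketch under assumptions: release stores and acquire loads are used as a stand-in for plain x86 `mov` instructions, and the function name `count_forbidden` is hypothetical. The C++ memory model itself rules out the outcome here, because the two synchronizes-with edges chain into a happens-before relation from P1's store to P3's second load.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs the three-thread test `iterations` times and returns how many times
// the forbidden outcome r1 == 1, r2 == 1, r3 == 0 was observed.
int count_forbidden(int iterations) {
    int forbidden = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = 0, r2 = 0, r3 = 0;

        // P1: mov [_x], 1
        std::thread p1([&] { x.store(1, std::memory_order_release); });
        // P2: mov r1, [_x] ; mov [_y], 1
        std::thread p2([&] {
            r1 = x.load(std::memory_order_acquire);
            y.store(1, std::memory_order_release);
        });
        // P3: mov r2, [_y] ; mov r3, [_x]
        std::thread p3([&] {
            r2 = y.load(std::memory_order_acquire);
            r3 = x.load(std::memory_order_acquire);
        });

        p1.join();
        p2.join();
        p3.join();
        if (r1 == 1 && r2 == 1 && r3 == 0) ++forbidden;
    }
    return forbidden;
}
```

A call such as `count_forbidden(200)` should always return 0. With relaxed orderings on a weakly ordered CPU the same test would no longer carry that guarantee.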
Stores Are Seen in a Consistent Order by Other Processors
As noted in Section 8.2.3.5, the memory-ordering model allows stores by two processors to be seen in different orders by those two processors. However, any two stores must appear to execute in the same order to all processors other than those performing the stores.
In other words, per rule 4 above, x86 allows the stores of two processors to be observed in different orders by those two processors themselves (TODO: confirm), but any two stores must be observed in the same order by every other processor.
Locked Instructions Have a Total Order
The memory-ordering model ensures that all processors agree on a single execution order of all locked instructions, including those that are larger than 8 bytes or are not naturally aligned.
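As a rough illustration of locked instructions, the sketch below uses `std::atomic<int>::fetch_add`, which GCC and Clang typically compile to a `lock xadd` on x86 (an assumption about code generation, not something stated in the manual excerpt). It only checks a weaker consequence of the rule: all threads agree on one order of the increments to a single counter, so every ticket is handed out exactly once. The function name `tickets_are_unique` is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Each thread takes `per_thread` tickets from one shared counter.
// Returns true if every ticket 0 .. threads*per_thread-1 was issued exactly once.
bool tickets_are_unique(int threads, int per_thread) {
    std::atomic<int> next{0};
    std::vector<std::vector<int>> got(threads);  // one private list per thread
    std::vector<std::thread> pool;

    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&, t] {
            for (int i = 0; i < per_thread; ++i) {
                // Atomic read-modify-write on the shared counter.
                got[t].push_back(next.fetch_add(1, std::memory_order_seq_cst));
            }
        });
    }
    for (auto& th : pool) th.join();

    // Count how many times each ticket value was handed out.
    std::vector<int> seen(threads * per_thread, 0);
    for (const auto& tickets : got)
        for (int v : tickets) ++seen[v];
    for (int c : seen)
        if (c != 1) return false;
    return true;
}
```

For example, `tickets_are_unique(4, 1000)` should return true; replacing `fetch_add` with a separate load and store would lose that guarantee.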
1. summary
When instructions refer to different memory locations, there are 4 possible reordering issues between loads (reads from memory) and stores (writes to memory):
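On x86 only the last case can occur (a load being reordered ahead of an earlier store to a different location), which is why x86 is said to have a strong memory model. For example, suppose x and y are both initially 0 and two threads each store 1 to one of the variables and then load the other into a register (eax in one thread, ebx in the other), the same shape as the third table above.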
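On x86 the possible outcomes include eax and ebx both being 0. By ordinary reasoning, at least one of the reads of x and y executes after the corresponding assignment, so a reorder must have taken place. Each thread writes and reads different addresses, so the CPU sees no dependency between the two instructions that would stop it from reordering them.
Architectures such as ARM and PowerPC allow all 4 of the cases above; this is called a weak memory model.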
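The two-thread example can be tried out with a small C++ litmus test. This is a sketch under assumptions: the function name `count_both_zero` is hypothetical, the locals `eax` and `ebx` only mirror the register names used above, and release/acquire orderings are used as an approximation of plain x86 `mov`. Callers are assumed to pass orderings that are valid for stores and loads respectively.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs the two-thread test `iterations` times with the given memory orderings
// and returns how often both registers ended up 0.
int count_both_zero(int iterations, std::memory_order store_mo,
                    std::memory_order load_mo) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int eax = -1, ebx = -1;

        // Thread 1: store to x, then load y.
        std::thread t1([&] {
            x.store(1, store_mo);
            eax = y.load(load_mo);
        });
        // Thread 2: store to y, then load x.
        std::thread t2([&] {
            y.store(1, store_mo);
            ebx = x.load(load_mo);
        });

        t1.join();
        t2.join();
        if (eax == 0 && ebx == 0) ++both_zero;
    }
    return both_zero;
}
```

With `std::memory_order_seq_cst` for both arguments the count is guaranteed to be 0. With `std::memory_order_release` and `std::memory_order_acquire` a nonzero count is permitted, including on x86, although thread start-up skew means it may show up rarely or not at all in a short run.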
2. TSO model
Another way to predict x86 processor behaviour is to use a model where each hardware thread has a FIFO buffer for writes, while reads are immediate, not buffered. When reading a memory location, the thread first looks it up in its own FIFO write buffer.
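On x86 each hardware thread has a FIFO buffer for stores, but loads are not buffered; a load first checks the store buffer. This model is called total store ordering (TSO). Mutually exclusive memory accesses require a global lock; for example, a `lock`-prefixed instruction flushes the write buffer to memory.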
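The store-buffer model can be made concrete with a tiny single-threaded simulation. This is a toy sketch, not a description of real hardware: `TsoMachine`, `run_store_buffering`, and the fixed count of two hardware threads are all hypothetical, and `flush` stands in for whatever drains the buffer (a locked instruction, for instance).

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <string>
#include <utility>

// Toy TSO machine: shared memory plus one FIFO store buffer per hardware thread.
struct TsoMachine {
    std::map<std::string, int> memory;                   // missing keys read as 0
    std::deque<std::pair<std::string, int>> buffer[2];   // per-thread store buffers

    // A store only enters the issuing thread's buffer; other threads cannot see it yet.
    void store(int tid, const std::string& addr, int value) {
        buffer[tid].emplace_back(addr, value);
    }

    // A load checks the thread's own buffer first (newest entry wins), then memory.
    int load(int tid, const std::string& addr) const {
        for (auto it = buffer[tid].rbegin(); it != buffer[tid].rend(); ++it)
            if (it->first == addr) return it->second;
        auto found = memory.find(addr);
        return found == memory.end() ? 0 : found->second;
    }

    // Drain the thread's buffer to shared memory in FIFO order.
    void flush(int tid) {
        while (!buffer[tid].empty()) {
            memory[buffer[tid].front().first] = buffer[tid].front().second;
            buffer[tid].pop_front();
        }
    }
};

// Thread 0 stores x then loads y; thread 1 stores y then loads x. Returns {r1, r2}.
std::pair<int, int> run_store_buffering(bool flush_after_store) {
    TsoMachine m;
    m.store(0, "x", 1);
    if (flush_after_store) m.flush(0);
    m.store(1, "y", 1);
    if (flush_after_store) m.flush(1);
    int r1 = m.load(0, "y");
    int r2 = m.load(1, "x");
    return {r1, r2};
}
```

In this model `run_store_buffering(false)` yields r1 = 0 and r2 = 0 because both stores are still sitting in their buffers, while `run_store_buffering(true)` yields r1 = 1 and r2 = 1. A thread that loads an address it has just stored to gets its own buffered value, which matches the same-address case described earlier.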
refer: http://bajamircea.github.io/coding/cpp/2019/10/25/cpu-memory-model.html