Miscompile of atomic rmw or on x86 (not on aarch, though)

llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.

http://llvm.org

Other

28.37k stars 11.72k forks source link

Miscompile of atomic rmw or on x86 (not on aarch, though) #69508

Open lovely-error opened 11 months ago

lovely-error commented 11 months ago

Recently I wrote one little piece of code that gave me too much headache. After investigation, I discovered that on x86 the sequence of instructions that llvm (strangely, GCC as well) produces is invalid because after applying optimisations, atomicity of load gets elided. You can witness that llvm makes up bunch of nonsense here. There you can find c code and if you compile it with recent llvm to aarch, atomic_fetch_or_explicit gets translated to a loop of a pair of special load-store instructions which is correct lowering, but if you do it for x86, you can in fact witness that generated code does not contain lock or ... instruction, which would be correct code, but instead lock cmpxchg ... which is invalid.

lovely-error commented 11 months ago

Just realised https://godbolt.org/z/GxdvMdP76 is the code I intended to share 🫢

nikic commented 11 months ago

In what way is the generated code "invalid"? A cmpxchg loop is a valid lowering for atomicrmw. The reason you get it instead of lock or is that you request the old value (prior to the or) to be returned, which lock or does not provide.

llvmbot commented 11 months ago

@llvm/issue-subscribers-backend-x86

Author: Enigma Lav (lovely-error)

Recently I wrote one little piece of code that gave me too much headache. After investigation, I discovered that on x86 the sequence of instructions that llvm (strangely, GCC as well) produces is invalid because after applying optimisations, atomicity of load gets elided. You can witness that llvm makes up bunch of nonsense [here](https://godbolt.org/z/GxdvMdP76). There you can find c code and if you compile it with recent llvm to aarch, `atomic_fetch_or_explicit` gets translated to a loop of a pair of special load-store instructions which is correct lowering, but if you do it for x86, you can in fact witness that generated code does not contain `lock or ...` instruction, which would be correct code, but instead `lock cmpxchg ...` which is invalid.