llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
29.37k stars 12.14k forks source link

[AMDGPU] Different code generated for SelectionDAG and GlobaIISel #112524

Open tgymnich opened 1 month ago

tgymnich commented 1 month ago

This example from the test suite (llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll) seems to compile to different assembly for SelectionDAG and GlobalISel:

define amdgpu_ps half @lshr_i16_vs(i16 %value, i16 inreg %amount) {
  %result = lshr i16 %value, %amount
  %cast = bitcast i16 %result to half
  ret half %cast
}

llc -mtriple=amdgcn-amd-amdpal -mcpu=tahiti

GlobalISel:

lshr_i16_vs:                            ; @lshr_i16_vs
        s_and_b32 s0, s0, 0xffff
        v_and_b32_e32 v0, 0xffff, v0
        v_lshrrev_b32_e32 v0, s0, v0

SelectionDAG:

lshr_i16_vs:                            ; @lshr_i16_vs
        v_and_b32_e32 v0, 0xffff, v0
        v_lshrrev_b32_e32 v0, s0, v0
        v_cvt_f32_f16_e32 v0, v0

One seems to perform a fpext at the end the other performs an anyext/bitcast. This difference is not present for gfx940 for example.

How is the amdgpu_ps calling convention defined?

llvmbot commented 1 month ago

@llvm/issue-subscribers-backend-amdgpu

Author: Tim Gymnich (tgymnich)

This example from the test suite (`llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll`) seems to compile to different assembly for SelectionDAG and GlobalISel: ```llvm define amdgpu_ps half @lshr_i16_vs(i16 %value, i16 inreg %amount) { %result = lshr i16 %value, %amount %cast = bitcast i16 %result to half ret half %cast } ``` `llc -mtriple=amdgcn-amd-amdpal -mcpu=tahiti` [GlobalISel](https://godbolt.org/z/MrG8o1h8P): ``` lshr_i16_vs: ; @lshr_i16_vs s_and_b32 s0, s0, 0xffff v_and_b32_e32 v0, 0xffff, v0 v_lshrrev_b32_e32 v0, s0, v0 ``` [SelectionDAG](https://godbolt.org/z/MPrvGTc9K): ``` lshr_i16_vs: ; @lshr_i16_vs v_and_b32_e32 v0, 0xffff, v0 v_lshrrev_b32_e32 v0, s0, v0 v_cvt_f32_f16_e32 v0, v0 ``` One seems to perform a fpext at the end the other performs an anyext/bitcast. This difference is not present for gfx940 for example. How is the `amdgpu_ps` calling convention defined?
arsenm commented 1 month ago

The gfx6/gfx7 ABI is broken for half values. It should work correctly on targets with legal half.

I think the GlobalISel behavior of passing as the low 16-bits makes more sense than the float promotion SelectionDAG does. Ideally we would fix SelectionDAG to stop the promotion, but it insists on using legal types. It's been many years since I last tried to fix that.