arsenm opened 1 year ago
@arsenm I checked why we don't fold the `and`. In theory we could, because we lower ADDRSPACECAST in the DAG to simpler ops, but in this example that doesn't happen because the constants resulting from folding the addrspacecast end up in a separate block from the other operations.
We really need something at the IR level to help here. I'm wondering how we can do this. Maybe a TTI hook in a pass like InstCombine to do the constant folding? Or we could have a dedicated pass that lowers addrspacecast early and duplicates the DAG's logic at the IR level (though that seems like a hack).
Probably the only useful case for constant folding an addrspacecast is null, which would be solved if we had null pointer values in the datalayout.
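For a concrete picture, the null fold being discussed would look something like this at the IR level (a sketch; it assumes the target reports that null in addrspace(4) maps to the flat null, which is what a null-value entry in the datalayout would encode, and which holds for AMDGPU's constant-to-flat casts):

```llvm
; Before: the cast survives as a constant expression that nothing folds.
%int = ptrtoint ptr addrspacecast (ptr addrspace(4) null to ptr) to i64

; After the hypothetical fold: addrspace(4) null is the flat null, so the
; cast disappears, leaving a ptrtoint of null that InstCombine folds to 0.
%int2 = ptrtoint ptr null to i64
```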
The problem in the testcase from #63986 was triggered by poor optimization of the size bounds.
The copy size has a known bound like this:
```llvm
%spec.select = tail call i64 @llvm.umin.i64(i64 sub (i64 ptrtoint (ptr addrspacecast (ptr addrspace(4) inttoptr (i64 32 to ptr addrspace(4)) to ptr) to i64), i64 ptrtoint (ptr addrspacecast (ptr addrspace(4) null to ptr) to i64)), i64 56)
```
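For reference, if those addrspacecasts were folded (they are bit-preserving on this path, since addrspace(4) and the flat address space share the same 64-bit pointer representation on AMDGPU), the whole bound collapses to a constant; a sketch of the steps:

```llvm
; ptrtoint(addrspacecast(inttoptr 32)) -> 32
; ptrtoint(addrspacecast(null))        -> 0
; sub(32, 0)                           -> 32
%spec.select = tail call i64 @llvm.umin.i64(i64 32, i64 56)
; ...and umin(32, 56) folds to i64 32, i.e. the copy size is a known constant.
```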
After PreISelIntrinsicLowering, the size check looks like this:
InstCombine could clean this up nicely, but PreISelIntrinsicLowering runs very late in the pipeline, so that never happens. The backend ends up seeing an `and` instruction that was never constant folded. We should probably try to apply some simplifications based on the known range of the copy size, even when the size itself is unknown.
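As a rough illustration of that last point (hypothetical IR, not the actual PreISelIntrinsicLowering output): even when the size is not a constant, the umin bound gives it a known range, and that range alone is enough to remove redundant masking and comparisons:

```llvm
; %size <= 56, so its range is [0, 56] and bits 6..63 are known zero.
%size    = tail call i64 @llvm.umin.i64(i64 %n, i64 56)
%masked  = and i64 %size, 63        ; simplifies to %size, since %size < 64
%too_big = icmp ugt i64 %size, 64   ; simplifies to false for the same reason
```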