llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

LoopVectorize/VPlan asserts "underlying instruction may write to memory" with a loop with parallel loop MD and a volatile load #107854

Open pjaaskel opened 1 month ago

pjaaskel commented 1 month ago

I cannot reproduce this one from C/C++, only via PoCL-generated work-group functions, which can sometimes be a bit ... involved. Attached is a reproducer .ll that triggers the crash. It originates from an OpenCL C kernel that uses a volatile int as the loop iteration variable, which PoCL currently converts to per-work-item (per-WI) variables (cleaning this up is a work in progress). The loop somehow makes it down to the assertion and then fails there because the load is volatile. Minimized test case below.

Tested with 19.1.0-rc4.


```
opt --passes=loop-vectorize vplan-crash.ll -S -o -
opt: /home/pjaaskel/src/chipStar/llvm-project/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp:80: bool llvm::VPRecipeBase::mayWriteToMemory() const: Assertion `(!I || !I->mayWriteToMemory()) && "underlying instruction may write to memory"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
...
```

pjaaskel commented 1 month ago

Bugpoint minimized this to:

define void @_pocl_kernel_test_kernel() local_unnamed_addr #1 !kernel_arg_addr_space !1 !kernel_arg_access_qual !2 !kernel_arg_typ$
pregion_for_entry.entry.peeled_wi.i.preheader:
  br label %pregion_for_init10.i

pregion_for_init10.i:                             ; preds = %pregion_for_entry.for.body.prebarrier.postbarrier.i.preheader, %pregi$
  br label %pregion_for_entry.for.body.prebarrier.prebarrier.i

pregion_for_entry.for.body.prebarrier.prebarrier.i: ; preds = %pregion_for_entry.for.body.prebarrier.prebarrier.i, %pregion_for_in$
  %_local_id_x.4 = phi i64 [ 0, %pregion_for_init10.i ], [ %1, %pregion_for_entry.for.body.prebarrier.prebarrier.i ]
  %0 = getelementptr [1 x [1 x [4 x i32]]], ptr poison, i64 0, i64 0, i64 0, i64 %_local_id_x.4
  %i.0.i.0.i.0.i.0.i.0.i.0.i.0.i.0.21.i = load volatile i32, ptr %0, align 4, !tbaa !6, !llvm.access.group !10
  %1 = add nuw nsw i64 %_local_id_x.4, 1
  %exitcond3.not = icmp eq i64 %1, 4
  br i1 %exitcond3.not, label %pregion_for_entry.for.body.prebarrier.postbarrier.i.preheader, label %pregion_for_entry.for.body.pr$

pregion_for_entry.for.body.prebarrier.postbarrier.i.preheader: ; preds = %pregion_for_entry.for.body.prebarrier.prebarrier.i
  br label %pregion_for_init10.i, !llvm.loop !14
}

There is not much special here except for the parallel loop metadata. When I remove the parallel loop metadata (!llvm.access.group !10), it neither crashes nor vectorizes (not vectorizing is expected because of the volatile load).
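
For readers without the attachment, here is a hand-written sketch of the general shape involved: a volatile load tagged with an access group, inside a loop whose `!llvm.loop` node lists that group under `llvm.loop.parallel_accesses`. The function name `@sketch`, the pointer argument `%p`, and the metadata numbering are assumptions made for illustration. The actual metadata nodes (`!10`, `!14`) are in the attached file and are not shown in the listing above. This sketch has not been checked against `opt`, so it may or may not hit the same assertion. As far as I can tell, `Instruction::mayWriteToMemory()` conservatively returns true for volatile loads, which would explain why the assertion fires on a load at all.

```llvm
; Hypothetical sketch, NOT the attached reproducer.
define void @sketch(ptr %p) {
entry:
  br label %loop

loop:
  %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
  %addr = getelementptr inbounds i32, ptr %p, i64 %i
  ; Volatile load that is a member of access group !0.
  %v = load volatile i32, ptr %addr, align 4, !llvm.access.group !0
  %i.next = add nuw nsw i64 %i, 1
  %done = icmp eq i64 %i.next, 4
  ; The loop ID !1 is attached to the latch branch.
  br i1 %done, label %exit, label %loop, !llvm.loop !1

exit:
  ret void
}

; Access group referenced by the load.
!0 = distinct !{}
; Loop ID: self-referential first operand, then loop attributes.
!1 = distinct !{!1, !2}
; Declares accesses in group !0 free of loop-carried dependences.
!2 = !{!"llvm.loop.parallel_accesses", !0}
```

Dropping the `!llvm.access.group !0` attachment from the load means the load is no longer covered by the parallel-accesses annotation, which matches the observation above that the loop is then simply not vectorized.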

bugpoint-reduced-simplified.bc.gz