Open pjaaskel opened 1 month ago
Bugpoint minimized this to:
define void @_pocl_kernel_test_kernel() local_unnamed_addr #1 !kernel_arg_addr_space !1 !kernel_arg_access_qual !2 !kernel_arg_typ$
pregion_for_entry.entry.peeled_wi.i.preheader:
br label %pregion_for_init10.i
pregion_for_init10.i: ; preds = %pregion_for_entry.for.body.prebarrier.postbarrier.i.preheader, %pregi$
br label %pregion_for_entry.for.body.prebarrier.prebarrier.i
pregion_for_entry.for.body.prebarrier.prebarrier.i: ; preds = %pregion_for_entry.for.body.prebarrier.prebarrier.i, %pregion_for_in$
%_local_id_x.4 = phi i64 [ 0, %pregion_for_init10.i ], [ %1, %pregion_for_entry.for.body.prebarrier.prebarrier.i ]
%0 = getelementptr [1 x [1 x [4 x i32]]], ptr poison, i64 0, i64 0, i64 0, i64 %_local_id_x.4
%i.0.i.0.i.0.i.0.i.0.i.0.i.0.i.0.21.i = load volatile i32, ptr %0, align 4, !tbaa !6, !llvm.access.group !10
%1 = add nuw nsw i64 %_local_id_x.4, 1
%exitcond3.not = icmp eq i64 %1, 4
br i1 %exitcond3.not, label %pregion_for_entry.for.body.prebarrier.postbarrier.i.preheader, label %pregion_for_entry.for.body.pr$
pregion_for_entry.for.body.prebarrier.postbarrier.i.preheader: ; preds = %pregion_for_entry.for.body.prebarrier.prebarrier.i
br label %pregion_for_init10.i, !llvm.loop !14
}
There is not much special here except for the parallel loop metadata. When I remove the parallel loop metadata (!llvm.access.group !10), it neither crashes nor vectorizes (expected, due to the volatile load).
I cannot reproduce this one from C/C++, only via PoCL-generated work-group functions, which can sometimes be a bit ... involved. Attached is a reproducer .ll that produces the crash. It originates from an OpenCL C kernel that uses a volatile int as the loop iteration variable, which PoCL converts to per-WI (work-item) variables (this is currently a WiP to clean up). The loop somehow makes it down to the assert point and then fails because the load is volatile. A minimized test case is included in this report.
Tested with 19.1.0-rc4.
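For readers who have not seen the original kernel, here is a rough plain-C sketch of the loop shape described in this report. It is an illustration only, not the actual kernel and not PoCL's generated code: the function names, `LOCAL_SIZE_X`, and the array `i_per_wi` are all hypothetical, and the size of 4 is only assumed from the trip count and the `[4 x i32]` type in the reduced IR.

```c
#include <stddef.h>

#define LOCAL_SIZE_X 4 /* assumption: taken from the trip count of 4 in the reduced IR */

/* Hypothetical shape of the source loop: a volatile int is the
   loop iteration variable, so every read and write of it must be kept. */
int source_loop_iterations(int bound)
{
    volatile int i;
    int iterations = 0;

    for (i = 0; i < bound; ++i)
        ++iterations;
    return iterations;
}

/* Hypothetical shape after the variable is made per-work-item: one copy
   per work-item, each read with a volatile load inside the work-item loop. */
int work_item_loop_loads(void)
{
    volatile int i_per_wi[LOCAL_SIZE_X] = {0, 0, 0, 0};
    int loads = 0;

    for (size_t local_id_x = 0; local_id_x < LOCAL_SIZE_X; ++local_id_x) {
        int value = i_per_wi[local_id_x]; /* volatile load, one per work-item */
        (void)value;
        ++loads;
    }
    return loads;
}
```

The second function mirrors the reduced IR only loosely: a loop over `local_id_x` whose body is a single volatile load. A plain C compiler will not attach parallel loop metadata to such a loop, which is consistent with the crash not being reproducible from C/C++.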