jxl1080 opened 4 months ago
Can confirm this is due to the avx512 feature flag.
❯ DYLD_LIBRARY_PATH=../../../distrib/lib ./Convolve -g Convolve -f Convolve input.type=float64 kernel.type=float64 output.type=float64 target=host-avx512-no_runtime-no_bounds_query -o ./
LLVM ERROR: Cannot select: 0x7fd8ca04c2d0: ch = masked_store<(store unknown-size into %ir.lsr.iv21, align 8, !tbaa !51)> 0x7fd8ca041890, 0x7fd8ca046910, 0x7fd8ca045a60, undef:i64, 0x7fd8ca04a810
  0x7fd8ca046910: v4f64,ch = load<(dereferenceable load (s256) from %ir.sum38, align 64, !tbaa !35)> 0x7fd8ca046400, FrameIndex:i64<0>, undef:i64
    0x7fd8ca046240: i64 = FrameIndex<0>
    0x7fd8ca04c8f0: i64 = undef
  0x7fd8ca045a60: i64,ch = CopyFromReg 0x7fd8c9909f60, Register:i64 %89
    0x7fd8ca0461d0: i64 = Register %89
  0x7fd8ca04c8f0: i64 = undef
  0x7fd8ca04a810: v4i1 = setcc 0x7fd8ca04a180, 0x7fd8ca04bfc0, setle:ch
    0x7fd8ca04a180: v4i32 = extract_subvector 0x7fd8ca0a5b70, Constant:i64<0>
      0x7fd8ca0a5b70: v8i32,ch = CopyFromReg 0x7fd8c9909f60, Register:v8i32 %44
        0x7fd8ca045b40: v8i32 = Register %44
      0x7fd8ca04be70: i64 = Constant<0>
    0x7fd8ca04bfc0: v4i32 = extract_subvector 0x7fd8ca046f30, Constant:i64<0>
      0x7fd8ca046f30: v8i32,ch = CopyFromReg 0x7fd8c9909f60, Register:v8i32 %45
        0x7fd8ca0ac720: v8i32 = Register %45
      0x7fd8ca04be70: i64 = Constant<0>
In function: Convolve
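"Cannot select" means LLVM's instruction selector found no pattern matching this node: a `masked_store` of a `v4f64` value under a `v4i1` mask, where the mask comes from a `setcc ... setle` on two `v4i32` halves extracted from `v8i32` registers. The scalar sketch below only models what that node means, to make the dump easier to read; it is not LLVM's lowering, and the function names are hypothetical. What the two compared vectors actually hold in the Convolve pipeline is an assumption (for example, per-lane x coordinates versus the last valid x).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Scalar model of a 4-lane masked store with the operand types seen in the
// failing node (v4f64 value, v4i1 mask). Lanes whose mask bit is false
// leave memory untouched.
void masked_store_v4f64(double *dst, const std::array<double, 4> &value,
                        const std::array<bool, 4> &mask) {
    assert(dst != nullptr);
    for (int i = 0; i < 4; ++i) {
        if (mask[i]) {
            dst[i] = value[i];
        }
    }
}

// Model of `setcc a, b, setle` on two v4i32 vectors: lane-wise a[i] <= b[i].
// The meaning of a and b in the Convolve pipeline is assumed, not confirmed.
std::array<bool, 4> setle_v4i32(const std::array<int32_t, 4> &a,
                                const std::array<int32_t, 4> &b) {
    std::array<bool, 4> mask{};
    for (int i = 0; i < 4; ++i) {
        mask[i] = a[i] <= b[i];
    }
    return mask;
}
```

For example, comparing lanes `{8, 9, 10, 11}` against a last valid index of 9 activates only the first two lanes, so a masked store of zeros writes `dst[0]` and `dst[1]` and leaves `dst[2]` and `dst[3]` unchanged.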
The pipeline compiles fine without avx512.
@jxl1080 I updated your generator to this:
#include "Halide.h"

using namespace Halide;

class Convolve : public Halide::Generator<Convolve> {
public:
    // We declare the Inputs to the Halide pipeline as public
    // member variables. They'll appear in the signature of our generated
    // function in the same order as we declare them.
    Input<Buffer<>> input{"input", 2};
    Input<Buffer<>> kernel{"kernel", 1};
    Input<uint32_t> outputDim{"inputLen"};
    Output<Buffer<>> output{"output", 2};

private:
    Var x{"x"}, c{"c"};
    Expr filterLen;

public:
    // We then define a method that constructs the Halide
    // algorithm pipeline:
    void generate() {
        // The kernel length comes from the buffer itself, not an extra argument.
        filterLen = kernel.dim(0).extent();
        Halide::RDom rk(0, filterLen);
        output(x, c) = Halide::sum(kernel(rk.x) * input(x + rk.x, c));
    }

    // Schedule for the pipeline:
    void schedule() {
        Expr vectorSize = natural_vector_size(output.type());
        output.vectorize(x, vectorSize, TailStrategy::GuardWithIf);
    }
};

HALIDE_REGISTER_GENERATOR(Convolve, Convolve)
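To see where the masked stores can come from, here is a scalar model of what the schedule above asks for: `Halide::sum` introduces an accumulator that starts at zero and is then updated over `rk`, and `vectorize(x, V, TailStrategy::GuardWithIf)` processes x in chunks of V lanes, switching off lanes past the end of x. This is a sketch for reasoning and for checking outputs of any workaround target against a plain reference; it is not Halide's generated code. The function names, the x-fastest buffer layout, and the choice of V are assumptions. The masked zero-initialisation mirrors what the second error dump appears to show (a zero `BUILD_VECTOR` stored into `%ir.sum15` under a `setle` mask).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar reference for
//   output(x, c) = sum_k kernel(k) * input(x + k, c)
// Buffers are stored x-fastest: element (x, c) lives at index c * width + x.
std::vector<double> convolve_reference(const std::vector<double> &input, int in_width,
                                       const std::vector<double> &kernel,
                                       int out_width, int channels) {
    assert(out_width + static_cast<int>(kernel.size()) - 1 <= in_width);
    std::vector<double> output(static_cast<std::size_t>(out_width) * channels);
    for (int c = 0; c < channels; ++c) {
        for (int x = 0; x < out_width; ++x) {
            double sum = 0.0;
            for (std::size_t k = 0; k < kernel.size(); ++k) {
                sum += kernel[k] * input[c * in_width + x + k];
            }
            output[c * out_width + x] = sum;
        }
    }
    return output;
}

// Scalar model of vectorize(x, V, TailStrategy::GuardWithIf) on the same
// pipeline: x advances V lanes at a time, and a lane is active only if
// x0 + lane <= out_width - 1 (the same shape as the `setle` mask in the dumps).
std::vector<double> convolve_guarded(const std::vector<double> &input, int in_width,
                                     const std::vector<double> &kernel,
                                     int out_width, int channels, int V) {
    assert(V > 0);
    assert(out_width + static_cast<int>(kernel.size()) - 1 <= in_width);
    std::vector<double> output(static_cast<std::size_t>(out_width) * channels);
    std::vector<double> sum(static_cast<std::size_t>(V), 0.0);  // inline sum() accumulator
    std::vector<bool> active(static_cast<std::size_t>(V));
    for (int c = 0; c < channels; ++c) {
        for (int x0 = 0; x0 < out_width; x0 += V) {
            for (int lane = 0; lane < V; ++lane) {
                active[lane] = x0 + lane <= out_width - 1;
            }
            // Masked zero-initialisation of the accumulator.
            for (int lane = 0; lane < V; ++lane) {
                if (active[lane]) sum[lane] = 0.0;
            }
            // Masked update over the reduction domain rk.
            for (std::size_t k = 0; k < kernel.size(); ++k) {
                for (int lane = 0; lane < V; ++lane) {
                    if (active[lane]) {
                        sum[lane] += kernel[k] * input[c * in_width + x0 + lane + k];
                    }
                }
            }
            // Masked store of the active lanes into the output.
            for (int lane = 0; lane < V; ++lane) {
                if (active[lane]) output[c * out_width + x0 + lane] = sum[lane];
            }
        }
    }
    return output;
}
```

With an output width of 5 and V = 4, the first chunk is full and the second chunk has only one active lane, which is the tail case that the guard exists for.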
I tried mcourteaux's modified generator class, and it still failed with avx512, so a fix for this bug is still needed.
LLVM ERROR: Cannot select: t37: ch = masked_store<(store unknown-size into %ir.sum15, align 64, !tbaa !45)> t0, t28, FrameIndex:i64<0>, undef:i64, t35
This may well be a bug in LLVM 18 (rather than Halide itself). Can you try with top-of-tree LLVM + top-of-tree Halide and see if it still repros?
Regarding "it still failed with avx512": I was just trying to give some feedback; it was by no means meant as a fix. I was showing you that you can access buffer extents, so you don't have to pass them explicitly as extra arguments.
I got the error below when using Halide-18.0.0-x86-64-windows-41bc134ae9a8fa32d968867ac1aeeac6f63a142e, which I downloaded from https://buildbot.halide-lang.org/:
LLVM ERROR: Cannot select: t37: ch = masked_store<(store unknown-size into %ir.sum15, align 64, !tbaa !45)> t0, t28, FrameIndex:i64<0>, undef:i64, t35
  t28: v4f64 = BUILD_VECTOR ConstantFP:f64<0.000000e+00>, ConstantFP:f64<0.000000e+00>, ConstantFP:f64<0.000000e+00>, ConstantFP:f64<0.000000e+00>
    t13: f64 = ConstantFP<0.000000e+00>
    t13: f64 = ConstantFP<0.000000e+00>
    t13: f64 = ConstantFP<0.000000e+00>
    t13: f64 = ConstantFP<0.000000e+00>
  t12: i64 = FrameIndex<0>
  t15: i64 = undef
  t35: v4i1 = setcc t30, t33, setle:ch
    t30: v4i32 = extract_subvector t2, Constant:i64<0>
      t2: v8i32,ch = CopyFromReg t0, Register:v8i32 %23
        t1: v8i32 = Register %23
      t29: i64 = Constant<0>
    t33: v4i32 = extract_subvector t4, Constant:i64<0>
      t4: v8i32,ch = CopyFromReg t0, Register:v8i32 %24
        t3: v8i32 = Register %24
      t29: i64 = Constant<0>
In function: Convolve
My Halide Generator class is attached: myHalideGenerator.txt
My command to run my Halide Generator class is: myHalideGenerator.exe -g Convolve -f Convolve input.type=float64 kernel.type=float64 output.type=float64 target=x86-64-windows-large_buffers-enable_llvm_loop_opt-avx512-avx2-avx-sse41-no_runtime-no_asserts -o ./
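Since the pipeline was reported to compile fine without avx512, one stopgap while the LLVM issue is open is to rebuild with that single feature token removed from the target string. The helper below is a hypothetical sketch, not part of Halide: it splits the dash-separated target on `-`, drops exact matches of one token, and rejoins the rest, so removing `avx512` leaves `avx2`, `avx`, and `sse41` intact. If you construct the target in C++ instead, Halide's `Target` class has a `without_feature()` method that serves the same purpose. Note that a `host` target may still pick up AVX-512 features on its own on a CPU that supports them, so an explicit target string is the safer thing to edit.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: remove every token equal to `feature` from a
// dash-separated Halide target string. Only whole tokens match, so removing
// "avx512" does not affect "avx2" or "avx".
std::string remove_target_feature(const std::string &target, const std::string &feature) {
    assert(!feature.empty());
    std::stringstream in(target);
    std::string token;
    std::string result;
    while (std::getline(in, token, '-')) {
        if (token == feature) continue;  // skip the unwanted feature
        if (!result.empty()) result += '-';
        result += token;
    }
    return result;
}
```

Applied to the Windows target above, this yields `x86-64-windows-large_buffers-enable_llvm_loop_opt-avx2-avx-sse41-no_runtime-no_asserts`.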
I found that this error also happens with the x86-64-osx package.