halide / Halide

a language for fast, portable data-parallel computation
https://halide-lang.org

LLVM ERROR from Halide 18.0.0 #8341

Open jxl1080 opened 4 months ago

jxl1080 commented 4 months ago

I got the error below when using Halide-18.0.0-x86-64-windows-41bc134ae9a8fa32d968867ac1aeeac6f63a142e, which I downloaded from https://buildbot.halide-lang.org/:

LLVM ERROR: Cannot select: t37: ch = masked_store<(store unknown-size into %ir.sum15, align 64, !tbaa !45)> t0, t28, FrameIndex:i64<0>, undef:i64, t35
  t28: v4f64 = BUILD_VECTOR ConstantFP:f64<0.000000e+00>, ConstantFP:f64<0.000000e+00>, ConstantFP:f64<0.000000e+00>, ConstantFP:f64<0.000000e+00>
    t13: f64 = ConstantFP<0.000000e+00>
    t13: f64 = ConstantFP<0.000000e+00>
    t13: f64 = ConstantFP<0.000000e+00>
    t13: f64 = ConstantFP<0.000000e+00>
  t12: i64 = FrameIndex<0>
  t15: i64 = undef
  t35: v4i1 = setcc t30, t33, setle:ch
    t30: v4i32 = extract_subvector t2, Constant:i64<0>
      t2: v8i32,ch = CopyFromReg t0, Register:v8i32 %23
        t1: v8i32 = Register %23
      t29: i64 = Constant<0>
    t33: v4i32 = extract_subvector t4, Constant:i64<0>
      t4: v8i32,ch = CopyFromReg t0, Register:v8i32 %24
        t3: v8i32 = Register %24
      t29: i64 = Constant<0>
In function: Convolve

My Halide Generator class is attached: myHalideGenerator.txt

My command to run my Halide Generator class is:

myHalideGenerator.exe -g Convolve -f Convolve input.type=float64 kernel.type=float64 output.type=float64 target=x86-64-windows-large_buffers-enable_llvm_loop_opt-avx512-avx2-avx-sse41-no_runtime-no_asserts -o ./

I found that this error also happens with the x86-64-osx package.

mcourteaux commented 4 months ago

Can confirm this is due to the feature flag avx512.

❯ DYLD_LIBRARY_PATH=../../../distrib/lib ./Convolve -g Convolve -f Convolve input.type=float64 kernel.type=float64 output.type=float64  target=host-avx512-no_runtime-no_bounds_query -o ./
LLVM ERROR: Cannot select: 0x7fd8ca04c2d0: ch = masked_store<(store unknown-size into %ir.lsr.iv21, align 8, !tbaa !51)> 0x7fd8ca041890, 0x7fd8ca046910, 0x7fd8ca045a60, undef:i64, 0x7fd8ca04a810
  0x7fd8ca046910: v4f64,ch = load<(dereferenceable load (s256) from %ir.sum38, align 64, !tbaa !35)> 0x7fd8ca046400, FrameIndex:i64<0>, undef:i64
    0x7fd8ca046240: i64 = FrameIndex<0>
    0x7fd8ca04c8f0: i64 = undef
  0x7fd8ca045a60: i64,ch = CopyFromReg 0x7fd8c9909f60, Register:i64 %89
    0x7fd8ca0461d0: i64 = Register %89
  0x7fd8ca04c8f0: i64 = undef
  0x7fd8ca04a810: v4i1 = setcc 0x7fd8ca04a180, 0x7fd8ca04bfc0, setle:ch
    0x7fd8ca04a180: v4i32 = extract_subvector 0x7fd8ca0a5b70, Constant:i64<0>
      0x7fd8ca0a5b70: v8i32,ch = CopyFromReg 0x7fd8c9909f60, Register:v8i32 %44
        0x7fd8ca045b40: v8i32 = Register %44
      0x7fd8ca04be70: i64 = Constant<0>
    0x7fd8ca04bfc0: v4i32 = extract_subvector 0x7fd8ca046f30, Constant:i64<0>
      0x7fd8ca046f30: v8i32,ch = CopyFromReg 0x7fd8c9909f60, Register:v8i32 %45
        0x7fd8ca0ac720: v8i32 = Register %45
      0x7fd8ca04be70: i64 = Constant<0>
In function: Convolve

Pipeline compiles fine without avx512. @jxl1080 I updated your generator to this:

#include "Halide.h"

class Convolve : public Halide::Generator<Convolve> {
public:
    // We declare the Inputs to the Halide pipeline as public
    // member variables. They'll appear in the signature of our generated
    // function in the same order as we declare them.

    Input<Buffer<>> input{"input", 2};
    Input<Buffer<>> kernel{"kernel", 1};
    // Not referenced by generate() or schedule() below.
    Input<uint32_t> outputDim{"inputLen"};

    Output<Buffer<>> output{"output", 2};

private:
    Var x{"x"}, c{"c"};
    Expr filterLen;

public:
    // We then define a method that constructs the Halide
    // algorithm pipeline:
    void generate() {
        // The kernel length comes from the buffer itself, so it need not be
        // passed as a separate argument.
        filterLen = kernel.dim(0).extent();
        Halide::RDom rk(0, filterLen);
        output(x, c) = Halide::sum(kernel(rk.x) * input(x + rk.x, c));
    }
    // Scheduling:
    void schedule() {
        Expr vectorSize = natural_vector_size(output.type());
        output.vectorize(x, vectorSize, TailStrategy::GuardWithIf);
    }
};
HALIDE_REGISTER_GENERATOR(Convolve, Convolve)
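
For anyone who wants to sanity-check the pipeline's output on a target where it does compile, here is a minimal plain C++ sketch of the same sum, output(x, c) = sum over k of kernel(k) * input(x + k, c). It does not use Halide at all. The function name `convolve_reference`, the flat `std::vector<double>` storage, and the layout (x innermost, so element (x, c) sits at index c * width + x) are assumptions made for illustration, not something taken from the attached generator.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference for: output(x, c) = sum_k kernel(k) * input(x + k, c).
// Assumed layout: x is the innermost dimension, so (x, c) is at c * width + x.
std::vector<double> convolve_reference(const std::vector<double> &input,
                                       std::size_t input_width,
                                       std::size_t channels,
                                       const std::vector<double> &kernel) {
    // The flat input must hold exactly input_width * channels elements.
    assert(input.size() == input_width * channels);

    // Only positions where the whole kernel fits inside a row are produced.
    if (kernel.empty() || input_width < kernel.size()) {
        return {};
    }
    const std::size_t output_width = input_width - kernel.size() + 1;

    std::vector<double> output(output_width * channels, 0.0);
    for (std::size_t c = 0; c < channels; ++c) {
        for (std::size_t x = 0; x < output_width; ++x) {
            double sum = 0.0;
            for (std::size_t k = 0; k < kernel.size(); ++k) {
                sum += kernel[k] * input[c * input_width + x + k];
            }
            output[c * output_width + x] = sum;
        }
    }
    return output;
}
```

The output width is derived from the input and kernel sizes rather than passed in, which mirrors reading the extent from the buffer in the generator above.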

jxl1080 commented 4 months ago

I tried mcourteaux's modified generator class, it still failed with avx512. Thus a fix for this bug is still needed.

steven-johnson commented 4 months ago

> LLVM ERROR: Cannot select: t37: ch = masked_store<(store unknown-size into %ir.sum15, align 64, !tbaa !45)> t0, t28, FrameIndex:i64<0>, undef:i64, t35

This may well be a bug in LLVM 18 (rather than Halide itself). Can you try with top-of-tree LLVM + top-of-tree Halide and see if it still repros?

mcourteaux commented 4 months ago

> I tried mcourteaux's modified generator class, it still failed with avx512. Thus a fix for this bug is still needed.

I was just trying to give some feedback; it was by no means meant as a fix. I was showing you that you can access buffer extents directly, so you don't have to pass them explicitly as extra arguments.