jrprice / Oclgrind

An OpenCL device simulator and debugger
Other
346 stars 61 forks source link

False positive "uninitalized memory" with primitive vector types #126

Open jerryct opened 7 years ago

jerryct commented 7 years ago

Hi,

attached you will find a minimal reproducer. My kernels are using float8 and I get these false positives. It does not only happen with float8 but also with float2 (I have not tested with float4). I stripped down the kernels to the bare minimum - thus it does not do anything usefull.

Until now I don't understand the reason after looking at the code. I like the program so I would appreciated help.

jerry

Reproducer:

Save as simd8.cl ### kernel void Simd8(global float8 const restrict data) { const int u = get_global_id(0); const int v = get_global_id(1); const int index = get_global_size(0) v + u;

float8 cell_values = (float8)(index);

data[index].s0 = cell_values.s0;
data[index].s1 = cell_values.s1;
data[index].s2 = cell_values.s2;
data[index].s3 = cell_values.s3;
data[index].s4 = cell_values.s4;
data[index].s5 = cell_values.s5;
data[index].s6 = cell_values.s6;
data[index].s7 = cell_values.s7;

} ###

Save as simd8.sim ### simd8.cl Simd8

1 8 1 1 1 1

***###*** ***Run with***: ./oclgrind-kernel --uninitialized simd8.sim ***Output*** Uninitialized value written to global memory address 0x1000000000000 Kernel: Simd8 Entity: Global(0,0,0) Local(0,0,0) Group(0,0,0) store <8 x float> undef, <8 x float> addrspace(1)* %arrayidx, align 32, !dbg !32 At line 16 of input.cl: data[index].s7 = cell_values.s7; ... Argument 'data': 8 bytes data[0] = 0 data[1] = 0
mpflanzer commented 7 years ago

Seems to be an issue with Oclgrind or maybe even LLVM/Clang. I dumped all instructions and there is indeed a store instruction that store undef values. I am not sure if this is something specific to Oclgrind or if Clang in general produces the instruction.

  %call = tail call spir_func i64 @get_global_id(i32 0) #2, !dbg !25
  %call1 = tail call spir_func i64 @get_global_id(i32 1) #2, !dbg !26
  %call3 = tail call spir_func i64 @get_global_size(i32 0) #2, !dbg !27
  %sext = shl i64 %call1, 32, !dbg !27
  %conv4 = ashr exact i64 %sext, 32, !dbg !27
  %mul = mul i64 %conv4, %call3, !dbg !27
  %add = add i64 %mul, %call, !dbg !27
  %conv6 = trunc i64 %add to i32, !dbg !27
  %conv7 = sitofp i32 %conv6 to float, !dbg !28
  %idxprom = sext i32 %conv6 to i64, !dbg !29
  %arrayidx = getelementptr inbounds <8 x float>, <8 x float> addrspace(1)* %data, i64 %idxprom, !dbg !29
  store <8 x float> undef, <8 x float> addrspace(1)* %arrayidx, align 32, !dbg !30

Uninitialized value written to global memory address 0x1000000000000
    Kernel: Simd8
    Entity: Global(0,0,0) Local(0,0,0) Group(0,0,0)
      store <8 x float> undef, <8 x float> addrspace(1)* %arrayidx, align 32, !dbg !30
    At line 16 of input.cl:
      data[index].s7 = cell_values.s7;

  %1 = getelementptr <8 x float>, <8 x float> addrspace(1)* %data, i64 %idxprom, i64 0, !dbg !30
  store float %conv7, float addrspace(1)* %1, align 4, !dbg !30
  %2 = getelementptr <8 x float>, <8 x float> addrspace(1)* %data, i64 %idxprom, i64 1, !dbg !30
  store float %conv7, float addrspace(1)* %2, align 4, !dbg !30
  %3 = getelementptr <8 x float>, <8 x float> addrspace(1)* %data, i64 %idxprom, i64 2, !dbg !30
  store float %conv7, float addrspace(1)* %3, align 4, !dbg !30
  %4 = getelementptr <8 x float>, <8 x float> addrspace(1)* %data, i64 %idxprom, i64 3, !dbg !30
  store float %conv7, float addrspace(1)* %4, align 4, !dbg !30
  %5 = getelementptr <8 x float>, <8 x float> addrspace(1)* %data, i64 %idxprom, i64 4, !dbg !30
  store float %conv7, float addrspace(1)* %5, align 4, !dbg !30
  %6 = getelementptr <8 x float>, <8 x float> addrspace(1)* %data, i64 %idxprom, i64 5, !dbg !30
  store float %conv7, float addrspace(1)* %6, align 4, !dbg !30
  %7 = getelementptr <8 x float>, <8 x float> addrspace(1)* %data, i64 %idxprom, i64 6, !dbg !30
  store float %conv7, float addrspace(1)* %7, align 4, !dbg !30
  %8 = getelementptr <8 x float>, <8 x float> addrspace(1)* %data, i64 %idxprom, i64 7, !dbg !30
  store float %conv7, float addrspace(1)* %8, align 4, !dbg !30
  ret void, !dbg !31
jrprice commented 7 years ago

This sequence of instructions is caused by a transformation that Oclgrind applies to the program. Ironically, this transformation was originally introduced to avoid some false-positives in the original uninitialized memory plugin!

Although the original false-positives have now been fixed by @mpflanzer's improved plugin approach, the transformation seems to still be needed for some other false-positives with reading from write-only buffers. I'm not sure what the right fix is for this at the moment, but I'll have a think about it (ideas welcome!).

mpflanzer commented 7 years ago

Why would we care about false-positives when reading from write-only buffers? Isn't that already a problem/invalid use case in itself and we could live with the false-positives in that situation?

jrprice commented 7 years ago

Let me clarify. The false positives were that the memcheck plugin would signal a read-from-write-only-buffer error even though the buffer was only written to. This happens with code like this:

kernel void write_vector_write_only_fp(global int4 *output)
{
  int i = get_global_id(0);
  output[i].x = 42;
}

The problem is that when storing to a single element of a vector/struct, the resulting LLVM instructions load the vector/struct first, update the element, then write it back. To avoid this, Oclgrind spots these situations and just does the store directly, and it's this transformation that is causing the above uninitialized memory false positive.