Open jerryct opened 7 years ago
Seems to be an issue with Oclgrind or maybe even LLVM/Clang. I dumped all instructions and there is indeed a store instruction that store undef
values. I am not sure if this is something specific to Oclgrind or if Clang in general produces the instruction.
%call = tail call spir_func i64 @get_global_id(i32 0) #2, !dbg !25
%call1 = tail call spir_func i64 @get_global_id(i32 1) #2, !dbg !26
%call3 = tail call spir_func i64 @get_global_size(i32 0) #2, !dbg !27
%sext = shl i64 %call1, 32, !dbg !27
%conv4 = ashr exact i64 %sext, 32, !dbg !27
%mul = mul i64 %conv4, %call3, !dbg !27
%add = add i64 %mul, %call, !dbg !27
%conv6 = trunc i64 %add to i32, !dbg !27
%conv7 = sitofp i32 %conv6 to float, !dbg !28
%idxprom = sext i32 %conv6 to i64, !dbg !29
%arrayidx = getelementptr inbounds <8 x float>, <8 x float> addrspace(1)* %data, i64 %idxprom, !dbg !29
store <8 x float> undef, <8 x float> addrspace(1)* %arrayidx, align 32, !dbg !30
Uninitialized value written to global memory address 0x1000000000000
Kernel: Simd8
Entity: Global(0,0,0) Local(0,0,0) Group(0,0,0)
store <8 x float> undef, <8 x float> addrspace(1)* %arrayidx, align 32, !dbg !30
At line 16 of input.cl:
data[index].s7 = cell_values.s7;
%1 = getelementptr <8 x float>, <8 x float> addrspace(1)* %data, i64 %idxprom, i64 0, !dbg !30
store float %conv7, float addrspace(1)* %1, align 4, !dbg !30
%2 = getelementptr <8 x float>, <8 x float> addrspace(1)* %data, i64 %idxprom, i64 1, !dbg !30
store float %conv7, float addrspace(1)* %2, align 4, !dbg !30
%3 = getelementptr <8 x float>, <8 x float> addrspace(1)* %data, i64 %idxprom, i64 2, !dbg !30
store float %conv7, float addrspace(1)* %3, align 4, !dbg !30
%4 = getelementptr <8 x float>, <8 x float> addrspace(1)* %data, i64 %idxprom, i64 3, !dbg !30
store float %conv7, float addrspace(1)* %4, align 4, !dbg !30
%5 = getelementptr <8 x float>, <8 x float> addrspace(1)* %data, i64 %idxprom, i64 4, !dbg !30
store float %conv7, float addrspace(1)* %5, align 4, !dbg !30
%6 = getelementptr <8 x float>, <8 x float> addrspace(1)* %data, i64 %idxprom, i64 5, !dbg !30
store float %conv7, float addrspace(1)* %6, align 4, !dbg !30
%7 = getelementptr <8 x float>, <8 x float> addrspace(1)* %data, i64 %idxprom, i64 6, !dbg !30
store float %conv7, float addrspace(1)* %7, align 4, !dbg !30
%8 = getelementptr <8 x float>, <8 x float> addrspace(1)* %data, i64 %idxprom, i64 7, !dbg !30
store float %conv7, float addrspace(1)* %8, align 4, !dbg !30
ret void, !dbg !31
This sequence of instructions is caused by a transformation that Oclgrind applies to the program. Ironically, this transformation was originally introduced to avoid some false-positives in the original uninitialized memory plugin!
Although the original false-positives have now been fixed by @mpflanzer's improved plugin approach, the transformation seems to still be needed for some other false-positives with reading from write-only buffers. I'm not sure what the right fix is for this at the moment, but I'll have a think about it (ideas welcome!).
Why would we care about false-positives when reading from write-only buffers? Isn't that already a problem/invalid use case in itself and we could live with the false-positives in that situation?
Let me clarify. The false positives were that the memcheck plugin would signal a read-from-write-only-buffer error even though the buffer was only written to. This happens with code like this:
kernel void write_vector_write_only_fp(global int4 *output)
{
int i = get_global_id(0);
output[i].x = 42;
}
The problem is that when storing to a single element of a vector/struct, the resulting LLVM instructions load the vector/struct first, update the element, then write it back. To avoid this, Oclgrind spots these situations and just does the store directly, and it's this transformation that is causing the above uninitialized memory false positive.
Hi,
attached you will find a minimal reproducer. My kernels are using float8 and I get these false positives. It does not only happen with float8 but also with float2 (I have not tested with float4). I stripped down the kernels to the bare minimum - thus it does not do anything usefull.
Until now I don't understand the reason after looking at the code. I like the program so I would appreciated help.
jerry
Reproducer:
Save as simd8.cl ### kernel void Simd8(global float8 const restrict data) { const int u = get_global_id(0); const int v = get_global_id(1); const int index = get_global_size(0) v + u;
} ###
Save as simd8.sim ### simd8.cl Simd8
1 8 1 1 1 1