intel / intel-graphics-compiler

Malfunction in generated code: Store operation seems to not occur #160

Closed by sdh4 3 years ago

sdh4 commented 3 years ago

I am running into a problem where a store operation in the generated code does not seem to take effect, leading to incorrect output.

Interestingly, using POCL (which is presumably based on the same clang/llvm) gives correct output.

This is on Fedora 32 with intel-opencl-20.47.18513-1.fc32.x86_64 from the copr repository, clang-10.0.1-3, and llvm-10.0.1-4.

I've surrounded the problematic line with prints that illustrate the problem. In this case b[2] is being scaled. It reads correctly when accessed as b[pivots[row]], where pivots[row] is 2, but does not read correctly when accessed as b[2]. Later accesses to the same memory seem to indicate that the updated value was never stored: they read back the original value. Here is the code:

    if (printflag) {
        printf("Before: b[pivots[row]  + solvecnt*n]=%f; first_el=%f\n",b[pivots[row]  + solvecnt*n],first_el);
        printf("Before: row=%d; pivots[row]=%d; solvecnt*n=%d\n",(int)row,(int)pivots[row],(int)(solvecnt*n));
        printf("Before: b[2]=%f\n",b[2]);
    }
    b[pivots[row]  + solvecnt*n] /= first_el;
    if (printflag) {
        printf("After: b[pivots[row]  + solvecnt*n]=%f; first_el=%f\n",b[pivots[row]  + solvecnt*n],first_el);
        printf("After: b[pivots[row]]=%f; row=%d; pivots[row]=%d; solvecnt*n=%d\n",b[pivots[row]],(int)row,(int)pivots[row],(int)(solvecnt*n));
        printf("After: b[2]=%f\n",b[2]);
    }

And here is the output when running on NEO:

Before: b[pivots[row]  + solvecnt*n]=34.162400; first_el=-0.004093
Before: row=0; pivots[row]=2; solvecnt*n=0
Before: b[2]=34.162400
After: b[pivots[row]  + solvecnt*n]=-8346.176255; first_el=-0.004093
After: b[pivots[row]]=-8346.176255; row=0; pivots[row]=2; solvecnt*n=0
After: b[2]=34.162400

Output running on the CPU via POCL:

Before: b[pivots[row]  + solvecnt*n]=34.162399; first_el=-0.004093
Before: row=0; pivots[row]=2; solvecnt*n=0
Before: b[2]=34.162399
After: b[pivots[row]  + solvecnt*n]=-8346.175781; first_el=-0.004093
After: b[pivots[row]]=-8346.175781; row=0; pivots[row]=2; solvecnt*n=0
After: b[2]=-8346.175781

This looks like it could be quite tricky to troubleshoot. It's always possible that there is some nearby bug in my code causing undefined behavior. Is this worth trying to track down more deeply?
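
For anyone trying to reduce a report like this, a host-side reference can help by stating the expected behaviour explicitly: after the update, a read through the computed index and a read through the literal index must return the same value whenever the two indices are equal. The following is only a sketch in plain C. The variable names mirror the kernel snippet above, but the function `scale_pivot_element` is hypothetical and not part of the original code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side reference for the kernel line
 *     b[pivots[row] + solvecnt*n] /= first_el;
 * It performs the same update on a host copy of b and returns the flat
 * index that was written, so device results can be compared against it. */
size_t scale_pivot_element(float *b, const int *pivots, size_t row,
                           size_t solvecnt, size_t n, float first_el)
{
    assert(first_el != 0.0f);            /* the kernel divides by first_el */
    size_t idx = (size_t)pivots[row] + solvecnt * n;
    b[idx] /= first_el;                  /* the store that appears to be lost */
    return idx;
}
```

With the values from the log (row=0, pivots[row]=2, solvecnt*n=0) the returned index is 2, so b[2] read directly on the host must equal the scaled value. Comparing that against the buffer read back from the device shows whether the device-side store really went missing.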

sdh4 commented 3 years ago

I can confirm that (after working around a few POCL bugs) the code executes on POCL without tripping valgrind. That means it is probably not a memory corruption problem from the surrounding code.

Separately, the errors appear to come in groups of 8 work items, starting on boundaries divisible by 8. I may be able to provide a test case, but I'm not sure how easy it will be to shrink it down to something simple.
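
The "groups of 8" observation can be checked mechanically once the IDs of the failing work items are known. The sketch below buckets failing global IDs into aligned groups and counts how many groups failed completely. It assumes the failing IDs have already been collected on the host and that each ID appears at most once; the function name and parameters are hypothetical. Whether a group size of 8 corresponds to the SIMD width the kernel was compiled for is not confirmed anywhere in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Bucket failing work-item IDs into aligned groups of group_size.
 * per_group must have room for num_groups counters; IDs that fall outside
 * the covered range are ignored. Assumes each ID is listed at most once.
 * Returns the number of groups in which every work item failed. */
size_t count_fully_failing_groups(const size_t *failed_ids, size_t failed_count,
                                  size_t group_size,
                                  size_t *per_group, size_t num_groups)
{
    assert(group_size > 0);

    for (size_t g = 0; g < num_groups; g++)
        per_group[g] = 0;

    for (size_t i = 0; i < failed_count; i++) {
        size_t g = failed_ids[i] / group_size;   /* aligned group index */
        if (g < num_groups)
            per_group[g]++;
    }

    size_t fully_failing = 0;
    for (size_t g = 0; g < num_groups; g++)
        if (per_group[g] == group_size)
            fully_failing++;
    return fully_failing;
}
```

If every non-zero counter equals the group size, the failures really are aligned blocks rather than scattered work items, which would point toward something that affects whole hardware threads rather than individual data values.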

JacekDanecki commented 3 years ago

This issue looks like a problem on the IGC side, so I'm transferring it to the IGC project: https://github.com/intel/intel-graphics-compiler

sdh4 commented 3 years ago

I can confirm that this problem still occurs under Fedora 33 (intel-opencl-20.47.18513-1.fc33.x86_64; intel-igc-opencl-1.0.5585-1.fc33.x86_64; llvm-11.0.0-1.fc33.x86_64; clang-11.0.0-2.fc33.x86_64). Hardware is HD Graphics 620 (rev 02) (via lspci).

Any suggestions on environment variables/compilation flags to troubleshoot? Perhaps disabling certain forms of optimization?

sdh4 commented 3 years ago

I did eventually track this down. It seems to have originated from a library version mismatch: old, manually compiled libraries were being dynamically linked/loaded in place of the correct ones from the RPMs.
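
For readers who hit a similar symptom, one way to spot this kind of mismatch on Linux is to list which shared objects the running process has actually mapped, and check that they come from the package-managed directories. The sketch below reads `/proc/self/maps`, so it has to be called from inside the host program after the OpenCL runtime has been loaded. The function name is hypothetical, and any substring you filter on (for example a fragment of a library file name) is your own assumption, not something taken from this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Print each file-backed mapping of the current process whose path contains
 * needle (consecutive duplicates are printed once) and return the number of
 * matching mapping lines. Returns -1 if /proc/self/maps cannot be opened.
 * Linux-specific: relies on procfs. */
int report_mapped_files(const char *needle, FILE *out)
{
    assert(needle != NULL && out != NULL);

    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return -1;

    char line[4096];
    char last[4096] = "";
    int matches = 0;

    while (fgets(line, sizeof line, maps) != NULL) {
        line[strcspn(line, "\n")] = '\0';        /* strip trailing newline */
        /* File-backed mappings end with an absolute path; the earlier
         * fields (addresses, permissions, offset, device, inode) never
         * contain a '/'. */
        const char *path = strchr(line, '/');
        if (path == NULL || strstr(path, needle) == NULL)
            continue;
        matches++;
        if (strcmp(path, last) != 0) {           /* skip repeated segments */
            fprintf(out, "%s\n", path);
            strcpy(last, path);                  /* fits: path is shorter than line */
        }
    }
    fclose(maps);
    return matches;
}
```

Calling `report_mapped_files("", stderr)` right after the OpenCL context has been created prints every mapped file. A path under a local build or install prefix, where a system library directory was expected, would be a hint that a stale, manually built copy is being picked up.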