wsmoses / Tapir-LLVM

Tapir extension to LLVM for optimizing Parallel Programs

Nested `cilk_for` results in nonsense assertion failures #9

Closed: bcc32 closed this issue 7 years ago.

bcc32 commented 8 years ago

Tarball: [project2.tar.gz](https://github.com/wsmoses/Parallel-IR/files/544924/project2.tar.gz)

My group has the following snippet in our quadtree implementation (quadtree.c:202-210):

// Pairs of lines in this node
cilk_for (size_t i = 0; i < quadtree->num_lines; i++) {
  cilk_for (size_t j = i + 1; j < quadtree->num_lines; j++) {
    assert (i != j);
    Line *l1 = quadtree->lines[i];
    Line *l2 = quadtree->lines[j];
    intersect_any_order(l1, l2, time_step, intersection_event_list);
  }
}

The following output is produced when compiling with make DEBUG=1 and running on cqrun8:

Compilation:

a2z@localhost:~/project2$ make -B DEBUG=1
clang -std=gnu99 -Wall -ftapir -g -O0 -gdwarf-3  -o intersection_event_list.o -c intersection_event_list.c
clang -std=gnu99 -Wall -ftapir -g -O0 -gdwarf-3  -o screensaver.o -c screensaver.c
clang -std=gnu99 -Wall -ftapir -g -O0 -gdwarf-3  -o vec.o -c vec.c
clang -std=gnu99 -Wall -ftapir -g -O0 -gdwarf-3  -o quadtree.o -c quadtree.c
Selecting successive spawn in place of DAC for recursive cilk_for in function quadtree_intersections|pfor.cond16
<no RPN>
  %cmp = icmp ult i64 %4, %6, !dbg !145
<---->

pfor.cond:                                        ; preds = %pfor.inc10, %entry
  %4 = load i64, i64* %i, align 8, !dbg !141
  %5 = load %struct.quadtree*, %struct.quadtree** %quadtree.addr, align 8, !dbg !143
  %num_lines = getelementptr inbounds %struct.quadtree, %struct.quadtree* %5, i32 0, i32 3, !dbg !144
  %6 = load i64, i64* %num_lines, align 8, !dbg !144
  %cmp = icmp ult i64 %4, %6, !dbg !145
  br i1 %cmp, label %pfor.detach, label %pfor.end12, !dbg !146

<---->
</no RPN>
no induction var
clang -std=gnu99 -Wall -ftapir -g -O0 -gdwarf-3  -o intersection_detection.o -c intersection_detection.c
clang -std=gnu99 -Wall -ftapir -g -O0 -gdwarf-3  -o collision_world.o -c collision_world.c
clang -std=gnu99 -Wall -ftapir -g -O0 -gdwarf-3  -o line_demo.o -c line_demo.c
clang -std=gnu99 -Wall -ftapir -g -O0 -gdwarf-3  -o graphic_stuff.o -c graphic_stuff.c
clang -lrt -lm -lcilkrts -lXext -lX11  -o screensaver intersection_event_list.o screensaver.o vec.o quadtree.o intersection_detection.o collision_world.o line_demo.o graphic_stuff.o

Execution:

a2z@localhost:~/project2$ cqrun8 ./screensaver 4000
6.172 Cloud Queue 2016 (8-cores, no hyperthreading)

Submitting Job: ./screensaver 4000
Waiting for job to finish...
==== Standard Error ====
./screensaver: /afs/csail/proj/courses/6.172/cilkplus-4_8-install/lib64/libcilkrts.so.5: no version information available (required by ./screensaver)
./screensaver: /afs/csail/proj/courses/6.172/cilkplus-4_8-install/lib64/libcilkrts.so.5: no version information available (required by ./screensaver)
screensaver: quadtree.c:205: void quadtree_intersections(const quadtree_t *const, IntersectionEventListReducer *): Assertion `i != j' failed.
screensaver: quadtree.c:205: void quadtree_intersections(const quadtree_t *const, IntersectionEventListReducer *): Assertion `i != j' failed.
screensaver: quadtree.c:205: void quadtree_intersections(const quadtree_t *const, IntersectionEventListReducer *): Assertion `i != j' failed.
/bin/bash: line 1: 75166 Aborted                 (core dumped) ./screensaver 4000

==== Standard Output ====
Number of frames = 4000
Input file path is: input/mit.in

tar: ./stdout: time stamp 2016-10-21 17:29:31 is 16.9355163 s in the future
tar: ./job_a2z_20161021T172852_66462_in.tar: time stamp 2016-10-21 17:29:31 is 16.8826354 s in the future
tar: ./stderr: time stamp 2016-10-21 17:29:31 is 16.3962651 s in the future
tar: .: time stamp 2016-10-21 17:29:31 is 16.3629695 s in the future

This problem does not occur if the body of the outer `cilk_for` loop (i.e., the entirety of the inner `cilk_for` loop) is extracted into a new function whose declaration is annotated with `__attribute__((noinline))`.
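The workaround described above can be sketched roughly as follows. This is a minimal illustration, not the group's actual quadtree.c: `quadtree_t` is reduced to a hypothetical struct holding only `num_lines`, the helpers `pairs_for_row` and `all_pairs` are hypothetical names, and writing into a `hits` matrix stands in for `intersect_any_order()`. It also assumes the compiler defines `__cilk` when Cilk Plus is enabled; otherwise `cilk_for` falls back to a plain serial `for` so the sketch builds with any C99 compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: __cilk is defined when Cilk Plus is enabled.
   Otherwise fall back to a serial loop so the sketch still compiles. */
#ifdef __cilk
#include <cilk/cilk.h>
#else
#define cilk_for for
#endif

/* Hypothetical, trimmed-down stand-in for the project's quadtree type. */
typedef struct {
  size_t num_lines;
} quadtree_t;

/* The inner loop, moved into its own function. noinline keeps the
   compiler from inlining it back into the outer cilk_for body. */
static __attribute__((noinline)) void pairs_for_row(const quadtree_t *qt,
                                                    size_t i,
                                                    unsigned char *hits) {
  cilk_for (size_t j = i + 1; j < qt->num_lines; j++) {
    assert(i != j);
    /* Stand-in for intersect_any_order(l1, l2, ...): each (i, j) cell is
       written by exactly one iteration, so these writes do not race. */
    hits[i * qt->num_lines + j] = 1;
  }
}

/* The outer loop: its body is now only a call to the extracted function. */
static void all_pairs(const quadtree_t *qt, unsigned char *hits) {
  cilk_for (size_t i = 0; i < qt->num_lines; i++) {
    pairs_for_row(qt, i, hits);
  }
}
```

With `num_lines == 4` and a zeroed 4×4 `hits` array, `all_pairs` should mark exactly the 6 cells where `i < j` and leave the diagonal and lower triangle untouched.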

neboat commented 8 years ago

My current theory is that, at optimization level 0, Loop2Cilk fails to handle these loops. As a result, these loops are getting compiled such that there is a race on the loop iteration variables. These problems seem to disappear if you use -O1 instead of -O0 for the DEBUG build.
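To see how a race on the loop iteration variable could trip `assert (i != j)` even though `j` starts at `i + 1`, here is a deterministic, single-threaded model of one possible interleaving. It illustrates the theory above and is not Tapir's actual code generation. It assumes that each detached inner body runs only after the parent has advanced `i` by one iteration, and that the faulty code re-reads `i` from the shared stack slot (compare the `load i64, i64* %i` in the IR dump) instead of using a private copy taken at spawn time. The function name `count_assert_failures` is hypothetical, and only the first inner iteration (`j = i + 1`) is modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Deterministic model of one interleaving (an assumption, not Tapir's codegen):
   the inner body spawned during outer iteration i runs only after the parent
   has advanced the shared induction variable to i + 1.
   capture_by_value == 0: the body re-reads the shared i (the suspected bug).
   capture_by_value != 0: the body uses a private copy taken at spawn time.
   Returns how many bodies would fail assert(i != j). */
static int count_assert_failures(size_t n, int capture_by_value) {
  int failures = 0;
  int have_pending = 0;
  size_t pending_i_copy = 0; /* value of i when the body was spawned */
  size_t pending_j = 0;      /* first inner iteration: j = i + 1 */

  for (size_t i = 0; i < n; i++) {
    /* The body spawned in the previous iteration runs now, after i advanced. */
    if (have_pending) {
      size_t seen_i = capture_by_value ? pending_i_copy : i;
      if (seen_i == pending_j) {
        failures++;
      }
      have_pending = 0;
    }
    /* Spawn the first inner iteration for this i if the inner loop is non-empty. */
    if (i + 1 < n) {
      pending_i_copy = i;
      pending_j = i + 1;
      have_pending = 1;
    }
  }
  return failures;
}
```

In this model, `count_assert_failures(4, 0)` returns 3 while `count_assert_failures(4, 1)` returns 0: once every iteration works from its own copy of the loop variable, as each `cilk_for` iteration is expected to, the assertion cannot fire.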

wsmoses commented 8 years ago

I cannot reproduce this issue on master.

neboat commented 8 years ago

I could, using make -B DEBUG=1.

wsmoses commented 7 years ago

Fixed on master