Investigate analysis results

martin-luecke commented 1 year ago

Investigate why we have so many entries in the entry "Fails statically but passes dynamically". Suspicion is that this relates to Exceptions thrown in the Analysis when the rewrite is illegal from a PDL IR perspective.

e.g. bottom left in this table

|                     |   Passes Dynamic Check |   Fails Dynamic Check |   Total |
|---------------------+------------------------+-----------------------+---------|
| Passes Static Check |                98      |               1.01031 |      98 |
| Fails Static Check  |                82.6667 |               1.01224 |     248 |
| Total               |                 4      |             490       |     346 |

math-fehr commented 11 months ago

I added the missing PDL verifiers, and I now only generate verifying PDL. The results are like this now:

|                     |   Passes Dynamic Check |   Fails Dynamic Check |   Total |
|---------------------+------------------------+-----------------------+---------|
| Passes Static Check |                1.35366 |               3.82759 |     111 |
| Fails Static Check  |                9.025   |               1.12461 |     722 |
| Total               |              162       |            1284       |     833 |

Though I'm not sure what these number means?

martin-luecke commented 10 months ago

Oh, must have been tired while writing this. It should be correct now. Here is a table I generated off 10000 rewrites:

Analysis failed, MLIR execution failed:  6740
Analysis failed, MLIR execution succeeded:  45
Analysis succeeded, MLIR execution failed:  1340
Analysis succeeded, MLIR execution succeeded:  11
PDL Analysis raised an exception:  1864

|                     |   Passes Dynamic Check |   Fails Dynamic Check |   Total |
|---------------------+------------------------+-----------------------+---------|
| Passes Static Check |             0.00814212 |              0.991858 |    1351 |
| Fails Static Check  |             0.00663228 |              0.993368 |    6785 |
| Total               |            56          |           8080        |    8136 |

The bottom right 99% number is encouraging, though it's still interesting that we deem 45 rewrites as invalid that still succeed dynamically. I will start investigating why we have so many rewrites that pass our analysis and still fail (top right 99%).

martin-luecke commented 10 months ago

Okay, I have discovered some weird behavior. On my setup the mlir-opt binary does not have the pass --test-pdl-bytecode-pass and therefore should always fail. However, here it is reported to succeed 56 times.

As far as I can see this is because this code here. It assumes MLIR ran successfully when neither of the Exceptions occur. However, in some cases it seems to be possible that the generator created by put_operations_in_region(dag, region, ops) does not return a single entry, thus MLIR never runs and this returns None, which seems to indicate success in this code.

We probably should have a sanity check in the beginning with a very simple input IR, rewrite and expected IR to make sure that MLIR actually runs correctly

@math-fehr I think this part was written by you, do you know why this might return no region?

The specific rewrite this failed with is:

pdl.pattern : benefit(1) {
  %0 = pdl.type : i32
  %1 = pdl.operand : %0
  %2 = pdl.operation "pdltest.matchop" (%1 : !pdl.value)
  %3 = pdl.type : i32
  %4 = pdl.operand : %3
  %5 = pdl.type : i32
  %6 = pdl.operand : %5
  %7 = pdl.type : i32
  %8 = pdl.operation "pdltest.matchop" (%4, %6 : !pdl.value, !pdl.value) -> (%7 : !pdl.type)
  %9 = pdl.result 0 of %8
  %10 = pdl.type : i32
  %11 = pdl.operand : %10
  %12 = pdl.type : i32
  %13 = pdl.operation "pdltest.matchop" (%11, %9 : !pdl.value, !pdl.value) -> (%12 : !pdl.type)
  %14 = pdl.result 0 of %13
  %15 = pdl.type : i32
  %16 = pdl.operand : %15
  %17 = pdl.type : i32
  %18 = pdl.operand : %17
  %19 = pdl.type : i32
  %20 = pdl.type : i32
  %21 = pdl.operation "pdltest.matchop" (%16, %18 : !pdl.value, !pdl.value) -> (%19, %20 : !pdl.type, !pdl.type)
  %22 = pdl.result 0 of %21
  %23 = pdl.result 1 of %21
  pdl.rewrite %21 {
    pdl.replace %2 with %8
    %24 = pdl.operation "pdltest.rewriteop"
  }
}

Which is deemed incorrect by the analysis (the matching dag is not connected) and should also fail MLIR execution

math-fehr commented 10 months ago

I'm on it now!

martin-luecke commented 10 months ago

Here is the newest table with a proper MLIR build (that I spent 3h generating 😅 ):

Analysis failed, MLIR execution failed:  5490
Analysis failed, MLIR execution succeeded:  190
Analysis succeeded, MLIR execution failed:  1636
Analysis succeeded, MLIR execution succeeded:  849
PDL Analysis raised an exception:  1835

|                     |   Passes Dynamic Check |   Fails Dynamic Check |   Total |
|---------------------+------------------------+-----------------------+---------|
| Passes Static Check |              0.34165   |              0.65835  |    2485 |
| Fails Static Check  |              0.0334507 |              0.966549 |    5680 |
| Total               |           1039         |           7126        |    8165 |

The 65% in the top right indicate to me that our static analysis is still too weak to catch many cases that make MLIR fail dynamically. I did not investigate this yet.

the 3% in the bottom left where MLIR does not fail, yet our analysis thinks the rewrite is invalid looks to me as if we do not generate complex enough inputs to the MLIR rewrite. From investigating the results a little this behavior shows when e.g. an operation is replaced by itself, which is unsafe as the replacement degenerates to an erasure. But the inputs the fuzzer generated for this rewrite did not use the matched op in any of the cases. This looks like we have to make the fuzzer slightly more involved. We could also make this driven by our analysis result, i.e. spend more time generating more complex inputs if we already know the analysis result is False

What do you think @math-fehr

math-fehr commented 10 months ago

Yeah that makes a lot of sense! I've been passing some time trying to reproduce your results yesterday, and indeed the 1000/7000 is what I get locally as well. It's taking a long time to generate that table, if you want what I can do directly is paralellize this quickly to make this N_CPU times faster!

I'll look into the fuzzer!

math-fehr commented 10 months ago

I just added multithreading, so if you have a good machine this should speed up things greatly. Also, I'll try to make this more reproducible, so we can more quickly see and share programs that are failing. I also added an -n flag to run less examples, because 10k is a bit too long for my laptop!

math-fehr commented 10 months ago

I also improved the printer in the table so we actually print the seeds that are failing/succeeding, and I added the --seed option so that we can more easily reproduce these things in the other scripts.

martin-luecke commented 10 months ago

That's great! It generates much faster now. 👍 I'll look into it this afternoon

martin-luecke commented 10 months ago

I looked into the category "Analysis failed, MLIR execution succeeded" and these are the results:

1 - replacement with itself: [53710184, 986607411, 453082198, 857706059, 343558720, 245997989, 726033229, 491648746, 408902191, 202563114, 951616026, 569519230, 333821845, 451135926, 719811309, 260615611, 835064486, 597551532, 245649464, 813873333, 658467859, 416864332, 446221126, 85600382, 1010392821, 200165247, 578770519, 524409999, 924892835, 756603494, 377184186, 512831000, 265404788] 2 - replacement with (one of) its own results: [770902343, 454833168, 532373262, 634914704, 966517452, 410290553, 413571616, 248939737, 193559947] 3 - scoping analysis too limited: [801914424, 658467859, 976459372, 935389095, (741148029 this is actually similar to 214129241), 578770519, 924892835, 512831000, 466583095] - (at least 976459372, 935389095 are not actually save, depending on where the use of the replaced op is!) 4 - erased op still in use: [354430857] 5 - number of results different in replacement, should actually fail MLIR? [976459372, 466583095 and a few others as well]

regarding 1: MLIR does not complain about the replacement with itself and that only breaks when there are actually uses outside of the matched IR. regarding 2: Same thing @math-fehr How far off are we on the option of generating uses of operations that are replaced / erased?

regarding 3:

The analysis has issues when we have e.g. this pattern:

%3 = pdl.operation "pdltest.matchop" (%1 : !pdl.value) -> (%2 : !pdl.type)
%4 = pdl.result 0 of %3
pdl.rewrite %3 {
    ...
    %7 = pdl.operation "pdltest.rewriteop" (%4, %4 : !pdl.value, !pdl.value) -> (%5, %6 : !pdl.type, !pdl.type)

because the scope of %4 is not picked up correctly. I will have a look at this probably tomorrow.

976459372, 935389095 are not actually save. I think if we generate more options with various uses of operations that are replaced we will find counter examples to rewrites like these. The issue with them is that it matches A / \ B C i.e. both C and C are after A, but we don't have a guarantee on the order of B and C. Then they replace B with C.

regarding 4: I am not quite sure why this is the error here, have to look deeper into this one

regarding 5: I am not sure why these pass MLIR, the do a replacement with the wrong number of values (number of results of the replaced op have to match!). I know MLIR has an assertion for this, but it does not trigger here.

@math-fehr We should think about a sanity check whether our generated IR actually provides a match for the corresponding rewrite. I am worried we generate a bunch of inputs, run them through MLIR, find no matches, i.e. successful return code and assume correctness. Or do we have something like this already? How we could do this: (We could also enable this as a debug option instead of on default) Add the option "--debug-only=pdl-bytecode" to the call to mlir and check "RecordMatch" in res.stderr This option enables the detailed debugging of the PDL interpreter, which prints "Executing RecordMatch" after satisfying all requirements for a match. What do you think?

math-fehr commented 10 months ago

Nice! For 1 and 2, I can realistically do that today!

For 3, I think I have a good idea on how to fix that, which should also speedup our checks. Currently, I generate a bunch of tests with many different configurations. Realistically, because of how the patterns behave, I think that if I generate "the worst case" (one operation per block, and no dominance between unrelated blocks), then it should always catch the error. That would also mean that we don't need to fuzz these matches, we just need to generate the worst one. I can try to implement that today/tomorrow, but most likely tomorrow.

math-fehr commented 10 months ago

I'll also try to implement the sanity check tomorrow with your debug-only solution!

math-fehr commented 10 months ago

1-2 is implemented with the last commit now

math-fehr commented 10 months ago

I realized I was really wrong with problem 3, there is no "worst case" scenario. I'm still working on something to make it better. Instead of generating randomly a block DAG, and then adding operations to it, I'll just get an operation ordering, and then generating all the possible interleavings, and then generating a corresponding worst DAG for the given interleaving (one operation per block). I'm on it now

math-fehr commented 10 months ago

I wrote a new fuzzing algorithm, it seems to work! I unveiled a new kind of bug, that is on our end however (489407815):

  builtin.module attributes  {"sym_name" = "patterns"} {
    pdl.pattern : benefit(1) {
      %0 = pdl.type : i32
      %1 = pdl.operand : %0
      %2 = pdl.type : i32
      %3 = pdl.operand : %2
      %4 = pdl.type : i32
      %5 = pdl.operation "pdltest.matchop" (%1, %3 : !pdl.value, !pdl.value) -> (%4 : !pdl.type)
      %6 = pdl.result 0 of %5
      %7 = pdl.type : i32
      %8 = pdl.operand : %7
      %9 = pdl.type : i32
      %10 = pdl.type : i32
      %11 = pdl.operation "pdltest.matchop" (%8, %6 : !pdl.value, !pdl.value) -> (%9, %10 : !pdl.type, !pdl.type)
      %12 = pdl.result 0 of %11
      %13 = pdl.result 1 of %11
      pdl.rewrite %11 {
        %14 = pdl.operation "pdltest.rewriteop" (%13 : !pdl.value)
        pdl.erase %14
      }
    }
  }

This will actually succeed, because the operation that is being added will be removed directly, so even if it accessed non-dominated operands, it will still pass the verifier at the end.

math-fehr commented 10 months ago

Another one that should be accepted (696391871):

  pdl.pattern : benefit(1) {
    %0 = pdl.operation "pdltest.matchop"
    pdl.rewrite %0 {
      pdl.replace %0 with %0
      ^^^^^^^^^^^--------------
      | replacement_with_itself
      -------------------------
    }
  }
}

Despite being replaced with itself, it has no results, so this cannot go wrong

math-fehr commented 10 months ago

Also, I added a check, and indeed all generated programs are matched by pdl

math-fehr commented 10 months ago

And finally, this kind of rewrite is accepted by our analysis, but not by MLIR, because we are replacing one operation with no results with multiple results. @martin-luecke, what should we do here? Should we just not generate these rewrites, or should we add that in our analysis?

builtin.module {
  builtin.module attributes  {"sym_name" = "patterns"} {
    pdl.pattern : benefit(1) {
      %0 = pdl.type : i32
      %1 = pdl.operand : %0
      %2 = pdl.type : i32
      %3 = pdl.operation "pdltest.matchop" (%1 : !pdl.value) -> (%2 : !pdl.type)
      %4 = pdl.result 0 of %3
      %5 = pdl.type : i32
      %6 = pdl.operand : %5
      %7 = pdl.operation "pdltest.matchop" (%4, %6 : !pdl.value, !pdl.value)
      pdl.rewrite %7 {
        pdl.replace %7 with (%4 : !pdl.value)
      }
    }
  }
  builtin.module attributes  {"sym_name" = "ir"} {
    func.func @test(%8 : i32) {
      %9 = "pdltest.matchop"(%8) : (i32) -> i32
      "test.use_op"(%9) : (i32) -> ()
      "pdltest.matchop"(%9, %8) : (i32, i32) -> ()
      "test.use_op"() : () -> ()
    }
  }
}

martin-luecke commented 10 months ago

Hey Mathieu, Thanks for your effort! The new fuzzing approach looks very promising! I changed a few things:

slightly extended the analysis and got rid of a few cases of self replacements.
restructured how the fuzzing results are reported to properly differ between the ways the analysis fails.
For MLIR we also print the seeds that we did not manage to generate matches for.

Can it be that the seeds changed with the new approach to fuzzing? At least I could not regenerate your examples from your seeds, but managed to find similar ones. As far as I tested now I fixed the issues you mentioned in your seeds 489407815 and 696391871.

what should we do here? Should we just not generate these rewrites, or should we add that in our analysis?

Yes, I will add that soon as well, I think we do not yet handle replacement with values quite well. In general we should look to only make the analysis stronger, never remove certain inputs.

I will soon start looking into cases where the analysis returns success and the rewrite fails in MLIR, as the analysis should ideally be strictly stronger, i.e. conservative than the dynamic execution.

math-fehr commented 10 months ago

Really nice! Oh yes, the seeds might have changed when I changed how to generate the matching sadly!

martin-luecke commented 10 months ago

Cheers guys, I iterated some more on the analysis and added some missing checks. The current table looks like this:

Total: s fail d fail, s succ d succ, s fail d succ, s succ d fail, failed analyses
categories: 9395, 253, 317,35,0

|                     |   Passes Dynamic Check |   Fails Dynamic Check |   Total |
|---------------------+------------------------+-----------------------+---------|
| Passes Static Check |               0.878472 |              0.121528 |     288 |
| Fails Static Check  |               0.03264  |              0.96736  |    9712 |
| Total               |             570        |           9430        |   10000 |

The biggest issues we have are:

unsafe erasures and replacements of the root op, would be fixed if we would generate a use for the root op: 597409992, 333889688, 491876988, 675765611, 245997989,726033229, 491648746, 1042419699, 408902191, 202563114,951616026, 569519230, 338758748, 333821845, 7614311,923177764, 260615611, 323906379, 835064486,805965556, 598993846, 813873333, 730872099, 416864332, 912978988, 85600382, 1010392821, 667330844, 200165247, 891082067, 756603494, 377184186 e.g.
```
builtin.module {
pdl.pattern : benefit(1) {
%0 = pdl.type : i32
%1 = pdl.operand : %0
%2 = pdl.type : i32
%3 = pdl.operand : %2
%4 = pdl.operation "pdltest.matchop" (%1, %3 : !pdl.value, !pdl.value)
pdl.rewrite %4 {
  pdl.erase %4
  ^^^^^^^^^-------
  | unsafe erasure
  ----------------
}
}
}
```

A B C ordering, what I mentioned above already A / \ B C rewrite replaces B with C. For that we don't generate the counter example.976459372, 935389095, 62163759, 512831000 e.g.

builtin.module {
pdl.pattern : benefit(1) {
%0 = pdl.type : i32
%1 = pdl.type : i32
%A = pdl.operation "pdltest.matchop" -> (%0, %1 : !pdl.type, !pdl.type)
%3 = pdl.result 0 of %A
%4 = pdl.result 1 of %A
%5 = pdl.type : i32
%6 = pdl.operand : %5
%7 = pdl.type : i32
%B = pdl.operation "pdltest.matchop" (%6, %3 : !pdl.value, !pdl.value) -> (%7 : !pdl.type)
%9 = pdl.result 0 of %8
%C = pdl.operation "pdltest.matchop" (%4 : !pdl.value)
pdl.rewrite %C {
  pdl.replace %C with %B
  ^^^^^^^^^^^---------------------------------------------------------
  | pdl.Operation used for replacement is not part of the matched DAG!
  --------------------------------------------------------------------
}
}
}

A weird one: This one should obviously fail MLIR with wrong number of replacement values (or so) but does not. It looks to me like we only generate exactly one MLIR input that does not fail the rewrite. 390260112:

builtin.module {
pdl.pattern : benefit(1) {
%0 = pdl.type : i32
%1 = pdl.operation "pdltest.matchop" -> (%0 : !pdl.type)
%2 = pdl.result 0 of %1
%3 = pdl.operation "pdltest.matchop" (%2 : !pdl.value)
pdl.rewrite %3 {
  pdl.replace %3 with %1
  ^^^^^^^^^^^-------------------------
  | wrong number of replacement values
  ------------------------------------
}
}
}

@math-fehr What are your thoughts? Do you think we can generate counter examples for these?

math-fehr commented 10 months ago

I'll take a look today and see what I can generate! I thought I generated a use for the root op, we'll see if I made a mistake or if it's something else!

math-fehr commented 10 months ago

So just after looking at it, for problem 1, I think it's a bug in the analysis part. All these examples have roots that have no results, so it's actually valid?

math-fehr commented 10 months ago

For problem 3, my feeling is that this is actually MLIR that has a bug somewhere. It generates one example, that gets rewritten correctly. My feeling is that if the number of results that are replacing the operation is greater, then MLIR just "forgets" about the last ones. In the example you sent for instance, we replace an operation with no results with an operation with a single result. So it just deletes the operation.

math-fehr commented 10 months ago

And for problem 2, I think I see the problem. The issue is that I always thought that PDL patterns were rooted DAGS, but it seems that this is not the case? At least on your example, there is a root that is matched, and then one other unrelated operation is used for the replacement. I'll change how I generate examples tomorrow, because the problem is here that we need to use basic blocks again to generate the correct counter example!

martin-luecke commented 10 months ago

So just after looking at it, for problem 1, I think it's a bug in the analysis part. All these examples have roots that have no results, so it's actually valid?

Indeed, I missed that 👍 . I fixed this with my latest commit This means errors of categories 1 and 2 above should be fixed

math-fehr commented 10 months ago

Wait, categories 2 as well? My feeling is that this is an MLIR bug right? Or is it a misunderstanding in my part that these PDL rewrites should be allowed?

martin-luecke commented 10 months ago

Sorry, too many numbers flying around. I meant category 3.

math-fehr commented 10 months ago

To be fair, I feel that we should keep the analysis you had in category 3. While MLIR accepts it, I believe this should be an error.

martin-luecke commented 10 months ago

Yes, I agree. I should put in a variable to disable it. My intention is to collect all bugs we find and submit patches for them eventually.

math-fehr commented 10 months ago

It took longer than expected, but now I have an algorithm for category 2 that should generate all possible matches. It's actually quite a funny algorithm, but it took longer than expected finding it.

However, in the 4 examples you sent, actually none of them are fixed by it. This is because the first 2 examples are actually replacement of 0 results, and the other 2 examples seems to be unrelated problems?

Here is an example of program that we can now generate a counter-example correctly:

builtin.module {
  pdl.pattern : benefit(1) {
    %0 = pdl.type : i32
    %1 = pdl.operand : %0
    %2 = pdl.type : i32
    %3 = pdl.operand : %2
    %4 = pdl.type : i32
    %5 = pdl.type : i32
    %6 = pdl.operation "pdltest.matchop" (%1, %3 : !pdl.value, !pdl.value) -> (%4, %5 : !pdl.type, !pdl.type)
    %7 = pdl.result 0 of %6
    %8 = pdl.result 1 of %6
    %9 = pdl.type : i32
    %10 = pdl.operand : %9
    %11 = pdl.type : i32
    %12 = pdl.type : i32
    %13 = pdl.operation "pdltest.matchop" (%7, %10 : !pdl.value, !pdl.value) -> (%11, %12 : !pdl.type, !pdl.type)
    %14 = pdl.result 0 of %13
    %15 = pdl.result 1 of %13
    %16 = pdl.operation "pdltest.matchop" (%8 : !pdl.value) -> (%11 : !pdl.type)
    pdl.rewrite %16 {
      pdl.replace %16 with %13
    }
  }
}

math-fehr commented 10 months ago

62163759 -> seems that your error is that we erase an operation not part of the matching DAG. I believe that an MLIR verifier is missing here? 512831000 -> Same here

xdslproject / xdsl-pdl

Investigate analysis results #2