Pattern rewriting: replaceOp usage + documentation

This is a copy paste of this thread from the mailing list, which remained unclarified: https://groups.google.com/a/tensorflow.org/forum/#!searchin/mlir/replaceOp%7Csort:date/mlir/pW8Q8u4qs6Y/M9XSQ_agAgAJ

While the DAG rewriting doc provides a good high-level overview, I think it's missing key information on how replaceOp is to be used. I found this information also missing in the codebase and the doc comments (https://github.com/tensorflow/mlir/blob/master/g3doc/GenericDAGRewriter.md )

Below are some questions, all of which are related:

1) It's not clear on which ops one is allowed to call a replaceOp when doing a rewrite? The examples that I've seen call replaceOp only on the op that's matched. The comment on replaceOp https://github.com/tensorflow/mlir/blob/b5bea2c3cf2602f63e70b59f95753e021cae4050/include/mlir/IR/PatternMatch.h#L317 suggests that it's really meant to replace the matched op. But there is silence on whether it could be called on another op! --- in the doc as well as in code comments. It actually works, but it's not clear whether it's an intended/desired feature. (see item (3) below).

2) I think it should be mentioned in the doc that, during a rewrite, one can't call an erase on another op (an op other than the one matched) because such an op might still be part of the pattern matching worklist -- this is a common source of bugs. In contrast, the matched op has already been popped off the worklist and so it's safe to call erase on it. On the other hand, replaceOp erases an op and nulls the op if it's in the worklist, and so a replaceOp(op, llvm::None) can be safely called on any other op as a substitute for erase()? The code base uses erase() at some places while replaceOp(op, llvm::None) at others.

3) Here's a scenario: let's say one inserts new ops during the rewrite and would want to immediately replace those ops right away without having to do that through another pattern rewrite (this may look weird, but assume that this is needed because there is some domain-specific information available right then that you don't want to either throw way or have it propagated through attributes to the next rewrite). The newly inserted ops would be added to the worklist, and so one can't just call erase() on them after inserting their replacements. Instead, replaceOp actually does work on them currently. So this is related to both (1) and (2) above. The documentation doesn't mention whether this is a valid / intended use case of replaceOp.

4) If calling replaceOp on other ops during a pattern rewrite is allowed by design, there appear to be instances where one could just at once delete several ops as part of a simplification. However, those patterns are just deleting one at a time (the matched op) requiring more and duplicated matching work and slower progress for the greedy pattern rewriter. As an example, consider SimplifyDeadDealloc, where a dealloc op is erased when all other uses of the memref operand are dealloc's as well and the memref is the result of an alloc. Why not delete all those dealloc uses as well using replaceOp(..., llvm::None) in one rewrite call instead of just the matched op?

tensorflow / mlir

Pattern rewriting: replaceOp usage + documentation #200