zhuhaozhe / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
https://pytorch.org

Partial support for in-place ops in TE #2

Closed zhuhaozhe closed 2 years ago

zhuhaozhe commented 2 years ago

Summary

This PR aims to partially support in-place ops in TE. This enables TE fusion for patterns like "at::conv, at::relu" and "at::add, at::relu, at::add", which gives better performance.
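As a rough illustration (not code from this PR), a pattern in the spirit of "at::add, at::relu, at::add" with an in-place relu might look like the scripted function below; without this change, the in-place op blocks TE fusion of the chain.

import torch

@torch.jit.script
def add_relu_add(a, b, c):
    x = a + b        # out-of-place add creates a fresh intermediate
    x.relu_()        # in-place relu on that intermediate
    return x + c     # second add; the whole chain is a TE fusion candidate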

Options

Option 1: Replace the in-place op with its out-of-place equivalent

Option 2: Lower the body in terms of the in-place op directly

We choose Option 1 in this PR; we can consider Option 2 if we observe that Option 1 fails in many real-world scenarios.
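Sketching what Option 1 means at the source level (illustrative only, under the assumption that the mutated tensor is a fresh intermediate rather than a graph input):

# Before: the in-place relu keeps a mutation in the graph.
def before(a, b):
    x = a + b
    x.relu_()        # aten::relu_
    return x

# After the Option 1 rewrite: the mutation is replaced by an out-of-place op
# whose result takes over all later uses of x.
def after(a, b):
    x = a + b
    x = x.relu()     # aten::relu, no mutation left
    return x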

Implementation Details

TE in-place: we extend the behavior of the operator-supported check and TryMerge.

In the operator-supported check, we create an out-of-place node to pass the check and destroy it once the check is done. In TryMerge, after all checks pass, we replace an in-place op with its out-of-place version.
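One hedged way to observe the end result from Python (using existing debug hooks such as torch._C._jit_override_can_fuse_on_cpu and graph_for, which are not part of this PR and whose behavior may differ across versions) is to check whether the optimized graph contains a prim::TensorExprGroup covering the chain:

import torch

torch._C._jit_set_texpr_fuser_enabled(True)   # make sure the TE fuser is on
torch._C._jit_override_can_fuse_on_cpu(True)  # allow TE fusion on CPU

@torch.jit.script
def fn(a, b):
    x = a + b
    x.relu_()        # in-place op the new logic may rewrite to aten::relu
    return x + b

a, b = torch.randn(8, 8), torch.randn(8, 8)
for _ in range(3):   # warm-up runs so the profiling executor optimizes the graph
    fn(a, b)

# With this PR, the optimized graph is expected to show a single
# prim::TensorExprGroup for the add/relu/add chain; without it, the
# in-place relu_ splits the fusion group.
print(fn.graph_for(a, b))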

Whether an in-place op can be replaced safely depends on the behavior of RemoveTensorMutation. The two cases below will not be replaced.

# Case 1: relu_ mutates the graph input a, so the mutation is visible to the caller.
def fn(a, b):
    return a.relu_() + b.relu()

# Case 2: c is read by sigmoid() between its creation and the relu_ mutation,
# so the pass conservatively keeps the in-place op.
def fn(a, b):
    c = a + b
    return c.sigmoid().add(c.relu_())
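To check this directly, one can run the mutation-removal pass on scripted graphs. A minimal sketch, assuming the torch._C._jit_pass_remove_mutation binding (which wraps the C++ RemoveTensorMutation pass) is available in the build:

import torch

@torch.jit.script
def safe(a, b):
    c = a + b
    c.relu_()                     # mutates a fresh intermediate: removable
    return c

@torch.jit.script
def unsafe(a, b):
    return a.relu_() + b.relu()   # mutates graph input a: not removable

for f in (safe, unsafe):
    g = f.graph
    torch._C._jit_pass_remove_mutation(g)
    # After the pass, `safe` is expected to contain aten::relu, while `unsafe`
    # still contains aten::relu_.
    print(g)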

For the unit test