tenstorrent / tt-mlir

Tenstorrent MLIR compiler
https://tenstorrent.github.io/tt-mlir/
Apache License 2.0

Add TTIR Permute OP #652

Open uazizTT opened 3 weeks ago

uazizTT commented 3 weeks ago

- Implement the ttir::permute op that can lower to the ttnn.permute op.
- Currently there is a ttir::transpose op that lowers to the ttnn.permute op; check whether there is a need to implement ttnn::transpose, or else continue to lower ttir::transpose to ttnn::permute.
- Change the stablehlo::transpose op to use the ttir::permute op (see the sketch below).
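
A minimal sketch of what the third task could look like as a conversion pattern. This is illustrative only: ttir::PermuteOp does not exist yet, and the builder and accessor spellings below (getPermutation, the PermuteOp builder) are assumptions, not the real tt-mlir or StableHLO API surface.

```cpp
#include "mlir/Transforms/DialectConversion.h"
#include "stablehlo/dialect/StablehloOps.h"
// NOTE: ttir::PermuteOp and its builder are hypothetical; the op is what
// this issue proposes to add.

struct StableHLOTransposeToTTIRPermute
    : mlir::OpConversionPattern<mlir::stablehlo::TransposeOp> {
  using OpConversionPattern::OpConversionPattern;

  mlir::LogicalResult
  matchAndRewrite(mlir::stablehlo::TransposeOp op, OpAdaptor adaptor,
                  mlir::ConversionPatternRewriter &rewriter) const override {
    // stablehlo.transpose already carries an arbitrary axis permutation,
    // so it maps one-to-one onto the proposed ttir.permute.
    rewriter.replaceOpWithNewOp<ttir::PermuteOp>(
        op, op.getType(), adaptor.getOperand(), op.getPermutation());
    return mlir::success();
  }
};
```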

mrakitaTT commented 3 weeks ago

> Currently there is a ttir::transpose op that lowers to the ttnn.permute op; check whether there is a need to implement ttnn::transpose, or else continue to lower ttir::transpose to ttnn::permute.
>
> Change the stablehlo::transpose op to use the ttir::permute op.

We should check with someone from the Metal team about this, because they already have the ttnn::transpose OP implemented (see third_party/tt-metal/src/tt-metal/ttnn/cpp/ttnn/operations/data_movement/transpose/transpose.hpp). If they plan to support it long term, then it is probably better to lower to ttir::transpose/ttnn::transpose instead of ttir::permute/ttnn::permute, because it might be more optimized for transpose than the generic permute OP.

nsmithtt commented 3 weeks ago

We could lower to ttir.permute and then have a pass that special-cases this to ttir.transpose (sketched below). I agree that we should have a path for specializing this case, because transpose will definitely be more optimal than a generic permute.
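
A hedged sketch of what that specialization could look like as a rewrite pattern. ttir::PermuteOp, ttir::TransposeOp and every accessor below (getPermutation, getInput, the dim0/dim1 builder) are assumed names, since neither op's final API exists yet.

```cpp
#include "mlir/IR/PatternMatch.h"
#include "llvm/ADT/SmallVector.h"
// NOTE: ttir::PermuteOp / ttir::TransposeOp are hypothetical TTIR ops.

struct PermuteToTranspose : mlir::OpRewritePattern<ttir::PermuteOp> {
  using OpRewritePattern::OpRewritePattern;

  mlir::LogicalResult
  matchAndRewrite(ttir::PermuteOp op,
                  mlir::PatternRewriter &rewriter) const override {
    llvm::ArrayRef<int64_t> perm = op.getPermutation();
    // Collect the axes the permutation actually moves.
    llvm::SmallVector<int64_t> moved;
    for (int64_t i = 0, e = perm.size(); i < e; ++i)
      if (perm[i] != i)
        moved.push_back(i);
    // A transpose is exactly a two-axis swap; anything else stays permute.
    if (moved.size() != 2 || perm[moved[0]] != moved[1] ||
        perm[moved[1]] != moved[0])
      return mlir::failure();
    rewriter.replaceOpWithNewOp<ttir::TransposeOp>(
        op, op.getType(), op.getInput(),
        /*dim0=*/moved[0], /*dim1=*/moved[1]);
    return mlir::success();
  }
};
```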

mrakitaTT commented 3 weeks ago

I've checked now, and StableHLO, TOSA and ONNX all use a single OP to capture permute/transpose, and they all call it transpose, maybe because of NumPy transpose. Maybe it makes sense for us to also have only one OP for that in TTIR, keeping the more generic OP in TTIR and then lowering to more specialized OPs in lower dialects (either ttnn.permute or ttnn.transpose). This is similar to that discussion about having empty/fill/constant OPs, where we decided to only have empty/constant in TTIR because constant also does filling in other dialects.

I am more inclined towards the name permute in TTIR because it captures more correctly what the OP does, although I don't know if we want to use different nomenclature than others; maybe that makes life harder for third parties adopting tt-mlir. What is your preference on that? @nsmithtt

nsmithtt commented 3 weeks ago

Hmm, I'm a bit indifferent as to what we call it; permute seems good to me. The question is: do we have both permute and transpose, or just permute? I suppose having both kind of forces our hand with regard to naming.

In general having fewer ops is better, but at the same time a 2-dim transpose is more common and maps a bit more clearly to an underlying TTNN op. I think we should definitely have both ttnn.transpose and ttnn.permute in the ttnn dialect.

mrakitaTT commented 3 weeks ago

So let me try to capture all the arguments that we both mentioned.

Pros for having both ops:

- A 2-dim transpose is more common, and it maps a bit more clearly to the underlying ttnn.transpose OP, which might be more optimized than the generic permute OP.

Cons for having both ops:

- In general, having fewer ops is better; we can keep the more generic OP in TTIR and lower to the more specialized OPs in lower dialects.
- StableHLO, TOSA and ONNX all capture permute/transpose with a single OP, so a single OP keeps our nomenclature closer to theirs.
- It matches the earlier empty/fill/constant decision, where we kept only empty/constant in TTIR.

Taking all this into consideration, I am a bit more inclined towards having a single op, but I don't really have a strong opinion about this. None of the arguments above is very strong, nor would having a few more ops break anything that much :)

> I think we should definitely have both ttnn.transpose and ttnn.permute in the ttnn dialect.

That for sure.

AleksKnezevic commented 3 weeks ago

I also prefer a single, multi-axis OP, and agree with naming it permute.

rpavlovicTT commented 3 weeks ago

> I also prefer a single, multi-axis OP, and agree with naming it permute.

+1

sdjordjevicTT commented 3 weeks ago

> Currently there is a ttir::transpose op that lowers to the ttnn.permute op; check whether there is a need to implement ttnn::transpose, or else continue to lower ttir::transpose to ttnn::permute.
>
> Change the stablehlo::transpose op to use the ttir::permute op.

> We should check with someone from the Metal team about this, because they already have the ttnn::transpose OP implemented (see third_party/tt-metal/src/tt-metal/ttnn/cpp/ttnn/operations/data_movement/transpose/transpose.hpp). If they plan to support it long term, then it is probably better to lower to ttir::transpose/ttnn::transpose instead of ttir::permute/ttnn::permute, because it might be more optimized for transpose than the generic permute OP.

When we implemented the transpose op for the MNIST model, it wasn't present in TTNN; it was added recently. Also, the transpose op in TTNN is just a wrapper around the permute op: https://github.com/tenstorrent/tt-metal/blob/main/ttnn/cpp/ttnn/operations/data_movement/transpose/transpose.cpp (roughly the shape sketched below).
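
To make the wrapper point concrete, here is a paraphrased, self-contained sketch of that general shape. It is not the actual tt-metal code, just an illustration of how a two-dim transpose reduces to a generic permutation.

```cpp
// Sketch only: shows how a transpose(dim0, dim1) wrapper can reduce to a
// permutation vector that a generic permute implementation consumes.
#include <cstdint>
#include <numeric>
#include <utility>
#include <vector>

std::vector<int64_t> transpose_as_permutation(int64_t rank, int64_t dim0,
                                              int64_t dim1) {
  // Start from the identity permutation [0, 1, ..., rank-1] ...
  std::vector<int64_t> permutation(rank);
  std::iota(permutation.begin(), permutation.end(), 0);
  // ... and swap the two transposed dimensions; the result can be handed
  // straight to the generic permute op.
  std::swap(permutation[dim0], permutation[dim1]);
  return permutation;
}
```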

I don't have a strong inclination either way, but I would like our dialects to make it clear that the transpose op is just a 2-dim specialization of permute. ONNX MLIR and StableHLO MLIR have only the transpose op defined (like in NumPy, it includes permute), while Torch MLIR has both ops (transpose and permute) defined.

nsmithtt commented 3 weeks ago

@sdjordjevicTT just to double check, are you suggesting that we should have both?

I think the only reason Torch MLIR distinguishes them is because they are modeling the torch API, which has both. Also, I don't think we'll ever be lowering their torch dialect directly; that's an internal dialect to their project, and they only support emitting tosa, stablehlo, and linalg.

sdjordjevicTT commented 3 weeks ago

> @sdjordjevicTT just to double check, are you suggesting that we should have both?
>
> I think the only reason Torch MLIR distinguishes them is because they are modeling the torch API, which has both. Also, I don't think we'll ever be lowering their torch dialect directly; that's an internal dialect to their project, and they only support emitting tosa, stablehlo, and linalg.

Maybe my statement wasn't clear enough. My point was that it would be great for our dialects to ensure that we don't overload the definition of transpose (I am against having a transpose op that does permute). Having a single permute op at the TTIR level sounds perfectly fine to me.

nsmithtt commented 3 weeks ago

> Maybe my statement wasn't clear enough. My point was that it would be great for our dialects to ensure that we don't overload the definition of transpose (I am against having a transpose op that does permute). Having a single permute op at the TTIR level sounds perfectly fine to me.

Cool, sounds great. Let's move forward with ttir.permute. @sdjordjevicTT, FYI this will break forge upon uplift, so let us know if special planning should be considered once we have a PR ready to land.

sdjordjevicTT commented 3 weeks ago

Could we keep the transpose op in TTIR until we uplift the code in forge and change the lowering in forge from transpose to permute?

nsmithtt commented 3 weeks ago

Sounds good, we can have a deprecation grace period.