llvm / torch-mlir


torch.vtensor #948

Closed MissyDu closed 2 years ago

MissyDu commented 2 years ago

When transforming a model from Torch-MLIR to another MLIR dialect, I have run into some issues:

  1. If I don't care about the backend, do I still have to run the createTorchScriptModuleToTorchBackendPipeline pass?
  2. I am still confused about torch.vtensor: what is the main difference from torch.tensor? Besides, I found that torch.tensor can't be converted by ToBuiltinTensorOp, so if I don't run that pass, I can't get an MLIR builtin tensor.

Looking forward to your answers, and thank you so much!

ramiro050 commented 2 years ago

Hi @MissyDu,

The pass createTorchScriptModuleToTorchBackendPipeline does several things to the MLIR. First, it simplifies the MLIR code by getting rid of dead code and turning what originally looks like a Python class definition into a single function. Second, it performs a few useful transformations, such as decomposing functions, making everything have value semantics, and propagating shape and dtype information. You technically don't need to run that pass to turn the MLIR generated by the importer into other MLIR, but it will make things significantly easier for you, since you'll be dealing with simpler MLIR that has more useful information in it.
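
For reference, here is a minimal sketch of invoking the pipeline from C++ (the header path, options type, and pipeline registration name are my assumptions about the torch-mlir API at the time of this issue; verify them against your checkout):

    #include "mlir/IR/BuiltinOps.h"
    #include "mlir/Pass/PassManager.h"
    #include "torch-mlir/Dialect/Torch/Transforms/Passes.h"

    // Roughly equivalent to running torch-mlir-opt with
    // --torchscript-module-to-torch-backend-pipeline.
    mlir::LogicalResult simplify(mlir::ModuleOp module) {
      mlir::PassManager pm(module.getContext());
      mlir::torch::Torch::TorchLoweringPipelineOptions options;
      mlir::torch::Torch::createTorchScriptModuleToTorchBackendPipeline(pm, options);
      return pm.run(module);
    }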

In regards to torch.vtensor vs torch.tensor, the v stands for value semantics. When a tensor has value semantics, the tensor is immutable, so there is no way to overwrite the contents of a torch.vtensor in place; to modify the contents of a torch.vtensor, you have to make a new copy of it. On the other hand, torch.tensor does allow in-place overwrites. Because it is easier to reason about graph transformations when tensors have value semantics (and some backends, such as linalg, require all tensors to satisfy this), Torch-MLIR will first convert the graph to one with only torch.vtensors in it, getting rid of any overwrites and torch.tensors along the way. In general, if you see any torch.tensors after running createTorchScriptModuleToTorchBackendPipeline, it is likely a sign that something in your IR is unsupported or that there is a bug in one of the torch-mlir passes.
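
On the C++ side, the two IR types map to two type classes, so you can tell them apart with an ordinary type check. A hedged sketch (the helper name is my own):

    #include "mlir/IR/Value.h"
    #include "torch-mlir/Dialect/Torch/IR/TorchTypes.h"

    // True for !torch.vtensor<...> (ValueTensorType, immutable values),
    // false for !torch.tensor<...> (NonValueTensorType, mutable in place).
    bool hasValueSemantics(mlir::Value v) {
      return v.getType().isa<mlir::torch::Torch::ValueTensorType>();
    }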

Hope this helps! Let me know if you have any other questions.

Tengxu-Sun commented 2 years ago


Hi @ramiro050, I'm wondering how to modify the contents of a torch.vtensor. For example, how do I modify its shape? Is there an example I can refer to?

Tengxu-Sun commented 2 years ago


Hi @ramiro050, please take a look. Thanks a lot!

ramiro050 commented 2 years ago

Hi @ramiro050, I'm wondering how to modify the contents of a torch.vtensor. For example, how do I modify its shape? Is there an example I can refer to?

Hi @Tengxu-Sun, it depends on what you mean by modifying the shape. For example, if you're working on a decomposition and you want your tensor to have a different shape, you can create ops like AtenReshapeOp and give the tensor a new shape. If you mean rewriting the type of the tensor, you can do that with tensor.setType(newType) (https://mlir.llvm.org/doxygen/classmlir_1_1Value.html#a438414805708e11d4a4b72c0deb8cba6), where the new type is created using ValueTensorType::get(...). Let me know if this answers your question.
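
Combining the two steps, a minimal sketch (the helper name is my own, and it assumes the tensor already has a known dtype):

    #include "mlir/IR/Value.h"
    #include "torch-mlir/Dialect/Torch/IR/TorchTypes.h"

    using namespace mlir;
    using namespace mlir::torch::Torch;

    void rewriteShape(Value tensor, ArrayRef<int64_t> newShape) {
      auto oldTy = tensor.getType().cast<ValueTensorType>();
      auto newTy =
          ValueTensorType::get(tensor.getContext(), newShape, oldTy.getDtype());
      // Only the static type is rewritten; it's on you to keep the
      // surrounding IR (users, function signatures) consistent.
      tensor.setType(newTy);
    }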

Tengxu-Sun commented 2 years ago

Hi @ramiro050, thanks very much for your help. I finally solved my problem with BaseTensorType::getWithSizesAndDtype() as shown below.

void MyOp::build(::mlir::OpBuilder &odsBuilder, ::mlir::OperationState &odsState,
                 Value input0, Value input1, Value weight0, Value weight1) {
    odsState.addOperands(input0);
    odsState.addOperands(input1);
    odsState.addOperands(weight0);
    odsState.addOperands(weight1);

    // Cast from mlir::Type down to the torch dialect's BaseTensorType so
    // that getSizes()/getDtype() are available.
    auto inputTy_cast = input0.getType().cast<BaseTensorType>();
    auto weightTy_cast = weight0.getType().cast<BaseTensorType>();
    auto input_shape = inputTy_cast.getSizes();
    auto weight_shape = weightTy_cast.getSizes();

    // Use owning storage for the new shapes; an ArrayRef initialized from an
    // initializer list dangles once the statement ends.
    SmallVector<int64_t> out0_shape = {input_shape[0], input_shape[1], weight_shape[1]};
    auto output0_type = inputTy_cast.getWithSizesAndDtype(llvm::makeArrayRef(out0_shape),
                                                          inputTy_cast.getDtype());
    SmallVector<int64_t> out1_shape = {input_shape[0]};
    auto output1_type = inputTy_cast.getWithSizesAndDtype(llvm::makeArrayRef(out1_shape),
                                                          inputTy_cast.getDtype());

    odsState.addTypes(output0_type);
    odsState.addTypes(output1_type);
}

But I still have some doubts.

1) I tried to dump the type of input0 with

    auto inputTy = input0.getType();
    inputTy.dump();

and got !torch.vtensor<[1,512],si32> in the log, so I think inputTy is a !torch.vtensor. But when I try to get its shape directly with auto input_shape = inputTy.getSizes();, I get an error:

***/lib/Dialect/Torch/IR/TorchOpsODSGenerated.cpp:72:30: error: no member named 'getSizes' in 'mlir::Type'
    auto input_shape = inputTy.getSizes();
                               ~~~~~~~ ^
1 error generated.
ninja: build stopped: subcommand failed.

So how does this happen? The former inputTy.dump(); indicates it's a !torch.vtensor, while the latter error indicates it's an mlir::Type. In my opinion, the !torch.vtensor type is a subclass of mlir::Type, and the getType() method returns the base class type?

If inputTy is a !torch.vtensor, why does auto input_shape = inputTy.getSizes(); return an error, so that I must cast it to BaseTensorType manually before it works? It seems !torch.vtensor is a subclass of BaseTensorType. The error looks weird.

2) At first, I attempted to cast the type of input0 to RankedTensorType to get its shape, but got a <<NULL TYPE>> value, which led to some errors. So the type of input0 can't be cast to RankedTensorType?

    auto input_cast_type = input0.getType().dyn_cast<RankedTensorType>();
    input_cast_type.dump();  // this prints `<<NULL TYPE>>`
    ArrayRef<int64_t> input_shape = input_cast_type.getShape();

3) According to your comment above, I can create a new type with ValueTensorType::get(...). I found a similar situation at https://github.com/llvm/torch-mlir/blob/227dea7b2e71725af5aafca4c556c95c969a48a8/lib/CAPI/TorchTypes.cpp#L216. It seems the third parameter should be an element type. However, I can't use input_cast_type.getElementType(), since I failed to cast input0 to a RankedTensorType as described above. Is there any way I can create a new type with ValueTensorType::get(...) here?

ramiro050 commented 2 years ago
  1. You're right, input0.getType() always returns an mlir::Type, independent of the dynamic type of the Value (https://mlir.llvm.org/doxygen/classmlir_1_1Value.html#a5348fc13d5201e2adf7ded6b4b2fb1ad). And yes, BaseTensorType and ValueTensorType are subclasses of mlir::Type (ValueTensorType is also a subclass of BaseTensorType). In order to access the getSizes() method, you just need to cast, as you did in your solution (see the sketch at the end of this comment).

  2. In the torch dialect, we have two tensor types, NonValueTensorType and ValueTensorType, both of which are subclasses of BaseTensorType. Once you get to the linalg-on-tensors level, the types used for tensors are the builtin tensor types in MLIR (https://mlir.llvm.org/doxygen/classmlir_1_1TensorType.html). RankedTensorType is a variant of this builtin tensor type. Therefore, casting to RankedTensorType will not work on a BaseTensorType.

  3. To get the element type of a BaseTensorType, you have to use getDtype(). You can find more information about the properties of the torch tensor types in their declaration here:

https://github.com/llvm/torch-mlir/blob/df0b1e77a475cffcd758c99efea7f9ba64ae060a/include/torch-mlir/Dialect/Torch/IR/TorchTypes.td#L144-L149

I hope this answers your questions. Let me know if I missed anything!
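
Putting (1) and (3) together, a small sketch (the helper name is my own): getType() statically yields mlir::Type, and you cast down to BaseTensorType to reach the torch-side accessors.

    #include "mlir/IR/Value.h"
    #include "torch-mlir/Dialect/Torch/IR/TorchTypes.h"

    using namespace mlir;
    using namespace mlir::torch::Torch;

    void inspect(Value v) {
      // dyn_cast yields a null type when v's type is not a torch tensor.
      auto tensorTy = v.getType().dyn_cast<BaseTensorType>();
      if (!tensorTy)
        return;
      if (tensorTy.hasSizes()) {
        ArrayRef<int64_t> sizes = tensorTy.getSizes(); // the tensor's shape
        (void)sizes;
      }
      if (tensorTy.hasDtype()) {
        Type dtype = tensorTy.getDtype(); // usable as the ValueTensorType::get dtype
        (void)dtype;
      }
    }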

Tengxu-Sun commented 2 years ago

Thanks very much for your reply. Regarding question 2, how should I understand the linalg-on-tensors level? Is it something like BaseTensorType and its subclasses?

In the code input0.getType().dyn_cast<RankedTensorType>(), input0.getType() returns an mlir::Type. Personally, I took your answer "RankedTensorType is a variant of this built-in tensor type" to mean that RankedTensorType is a variant of this mlir::Type. So why doesn't the cast work?

ramiro050 commented 2 years ago

Thanks very much for your reply. Regarding question 2, how should I understand the linalg-on-tensors level? Is it something like BaseTensorType and its subclasses?

In the code input0.getType().dyn_cast<RankedTensorType>(), input0.getType() returns an mlir::Type. Personally, I took your answer "RankedTensorType is a variant of this built-in tensor type" to mean that RankedTensorType is a variant of this mlir::Type. So why doesn't the cast work?

  1. We call it linalg-on-tensors because once we get to that point, everything in the IR comes from the linalg and tensor dialects of MLIR. You can find information about RankedTensorType here (https://mlir.llvm.org/docs/Dialects/Builtin/#rankedtensortype), and info on the dialects here (https://mlir.llvm.org/docs/Dialects/TensorOps/, https://mlir.llvm.org/docs/Dialects/Linalg/).

  2. RankedTensorType is a subclass of TensorType, which is a subclass of mlir::Type. Therefore, something being a RankedTensorType implies that it can also be cast to mlir::Type, but the converse is in general not true: if something is an mlir::Type, there is no guarantee that it is a RankedTensorType. In your case, the input has a type that is a subtype of BaseTensorType, so the cast to a type from a different dialect (RankedTensorType) fails.
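
To make the failure mode concrete, an illustrative sketch (the function name is my own): the same mlir::Type handle succeeds or fails in dyn_cast depending on the dynamic type behind it.

    #include "mlir/IR/BuiltinTypes.h"
    #include "mlir/IR/Value.h"
    #include "torch-mlir/Dialect/Torch/IR/TorchTypes.h"

    using namespace mlir;
    using namespace mlir::torch::Torch;

    void classify(Value v) {
      Type t = v.getType(); // the static type is always mlir::Type
      if (auto ranked = t.dyn_cast<RankedTensorType>()) {
        // Only reached for builtin tensors, e.g. tensor<2x3xf32>
        // (what you see at the linalg-on-tensors level).
        (void)ranked.getShape();
      } else if (auto vtensor = t.dyn_cast<ValueTensorType>()) {
        // Reached for !torch.vtensor<...>; use the torch accessors instead.
        (void)vtensor.getSizes();
      }
    }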

Tengxu-Sun commented 2 years ago


Got it, thanks a lot! 👍

silvasean commented 2 years ago

It looks like we found a solution here! Closing!