Closed — seunghwan100 closed this issue 5 months ago
Hey guys, this is currently supported by our Python API, so at the very least we know we can do it.
There are some constraints such as:
We are currently in the process of migrating this logic into C++
@chekangliang what is the priority of this task?
@arakhmati Yes. We need this feature at the C++ level. I assume it has pretty high priority because it is required for GroupNorm, which is used in GPT-2 training.
@razorback3 can you please give me a list of operations where you need this supported right now?
Obviously, group norm but I am assuming the operations feeding it and consuming from it also need to be updated. So, I would like to have a list to unblock you right away
@arakhmati, the priority is P1, but we need to scope this properly in terms of ops.
@razorback3, can you please answer the questions from @arakhmati?
So, I would like to have a list of [ops] to unblock you right away
@arakhmati @jliangTT Currently, to the best of my knowledge, groupnorm is the only OP that requires this feature.
However, it is hard for me to understand why you are asking for the list of ops that require this feature. Can you explain more about how you are going to support it? Then I can give you the additional information you need. I initially thought the change could be made easily in the Tensor class by storing the user-given shape information.
@razorback3 the Shape class can already support shapes of rank 1 to 8. We just never updated the ops to be N-D in C++.
But as of now you can use this constructor: Shape(const std::vector<uint32_t>& dimensions)
So the feature is there. We just need to update ops. We will eventually update all of them.
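The rank constraint described above can be sketched in Python. This is an illustrative stand-in for the C++ `Shape` class, not the actual tt-metal API; the class name and methods here are hypothetical:

```python
class Shape:
    """Illustrative stand-in for tt-eager's C++ Shape class.

    Mirrors the constraint that ranks 1 through 8 are accepted, like the
    C++ constructor Shape(const std::vector<uint32_t>& dimensions).
    """

    MIN_RANK, MAX_RANK = 1, 8

    def __init__(self, dimensions):
        dims = tuple(int(d) for d in dimensions)
        if not (self.MIN_RANK <= len(dims) <= self.MAX_RANK):
            raise ValueError(f"Rank {len(dims)} is not supported!")
        self.dims = dims

    def rank(self):
        return len(self.dims)

    def __getitem__(self, index):
        # Negative indexing works like in Python: shape[-1] is the last dim.
        return self.dims[index]


shape = Shape([3, 32, 32])
assert shape.rank() == 3
assert shape[-1] == 32
```

The point is that the container itself is rank-agnostic; the 4-D restriction lives only in the op implementations built on top of it.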
But what I am asking is what ops do you need updated right away?
OK. Thanks @arakhmati. Let me summarize.
The Shape class in tt-eager supports up to 8-D, and its constructor accepts shapes with fewer than 8 dimensions. This is good. However, some host functions assume the tensor is always 4-D.
For example, the code below causes a runtime error because the input shape is 3-D. To make it work, the user has to supply the shape as {1, 3, 32, 32}.
Shape shape = {3, 32, 32};
Tensor a = tt::numpy::random::random(shape);
a = a.to(Layout::TILE); // <----- ERROR
a = a.to(device);
So, I think making the to(...) utility functions work for non-4-D tensors will solve all the problems. Hope this information is enough for you.
Thanks.
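The manual workaround described above amounts to left-padding the shape with 1s until it is 4-D. A minimal sketch in Python (`to_4d` is a hypothetical helper for illustration, not a tt-metal API):

```python
def to_4d(shape):
    """Left-pad a shape with 1s until it is 4-D.

    Mirrors the workaround users must apply today: supply {1, 3, 32, 32}
    instead of {3, 32, 32} so that 4-D-only host functions accept it.
    """
    if len(shape) > 4:
        raise ValueError("cannot reduce a rank > 4 shape to 4-D without merging dims")
    return (1,) * (4 - len(shape)) + tuple(shape)


assert to_4d((3, 32, 32)) == (1, 3, 32, 32)
```

Note that this trick only helps for rank < 4; for rank 5 and up there is no 1-padding that fits, which is why the ops themselves need to become N-D.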
convert_layout_row_major_to_tile, convert_layout_tile_to_row_major, Tensor::pad_to_tile, and Tensor::unpad_from_tile assume that the tensor shape is 4-D.
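A rank-agnostic version of that tile-padding logic can be sketched as follows. This is a simplified illustration of the idea, not the actual tt-metal implementation: only the last two dimensions participate in 32x32 tiling, so they are rounded up to tile multiples regardless of rank.

```python
TILE = 32


def round_up(value, multiple):
    # Ceiling division, then scale back up.
    return -(-value // multiple) * multiple


def padded_tile_shape(shape):
    """Round the last two dims up to multiples of the tile size, keep the rest.

    Works for any rank from 1 to 8 instead of assuming exactly 4-D.
    A rank-1 shape is treated as a single row, so only the last dim is padded.
    """
    if not (1 <= len(shape) <= 8):
        raise ValueError(f"Rank {len(shape)} is not supported!")
    dims = list(shape)
    dims[-1] = round_up(dims[-1], TILE)
    if len(dims) >= 2:
        dims[-2] = round_up(dims[-2], TILE)
    return tuple(dims)


assert padded_tile_shape((3, 30, 33)) == (3, 32, 64)
assert padded_tile_shape((3, 2, 2, 32, 32)) == (3, 2, 2, 32, 32)
```

Negative indexing into the shape (as the C++ Shape class supports) is what makes the function independent of rank: shape[-1] and shape[-2] are the tile-relevant dims no matter how many leading dims exist.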
Yes, that's exactly right. We simply need to update the logic in the functions. The types already support N-D.
FYI, we support negative indexing into the shape like in Python, i.e. shape[-1] gives you the last dimension.
We simply need to update the logic in the functions.
Do you have a plan for this?
It's on the roadmap but we don't have concrete dates for this yet
I created an issue: https://github.com/tenstorrent-metal/tt-metal/issues/6480. So, I am closing this issue.
This was marked as blocking GroupNorm. Are you unblocked to implement GroupNorm?
The problem is summarized as follows: https://github.com/tenstorrent-metal/tt-metal/issues/5335#issuecomment-1959126844
and the answer from Akhmed: https://github.com/tenstorrent-metal/tt-metal/issues/5335#issuecomment-1965307078
Answer to the question from Jason:
This was marked as blocking GroupNorm. Are you unblocked to implement GroupNorm?
No. We are waiting for the implementation that Akhmed mentioned above.
@razorback3 yes that is my understanding here. It confused me when this got closed.
@razorback3 , can you add a unit test that @arakhmati can implement against?
https://github.com/tenstorrent-metal/tt-metal/tree/jaehoon/test-tensor-ranks Here is the branch that contains test code (@jliangTT, @arakhmati).
assigning to @arakhmati as discussed in our previous conversation now that we have the repro. what is the t-shirt size? can we turn this around fairly fast, Akhmed?
promoting this to p0 following a discussion with @davorchap / @jvasilje. @arakhmati, can you prioritize accordingly?
Yes, will do
I merged to main. pad_to_tile now works for any rank between 1 and 8 (inclusive on both ends). I am going to close this issue.
Thanks @arakhmati. I can now make 5-D~8-D tensors at the C++ level, but I encountered an error in Python code.
import torch
import tt_lib as ttl

torch.manual_seed(10)
shape = (3, 2, 2, 32, 32)
x = torch.randint(low=0, high=4, size=shape).to(torch.bfloat16)
# `device` is assumed to have been opened earlier in the script
dev_x = ttl.tensor.Tensor(x, ttl.tensor.DataType.BFLOAT16).to(ttl.tensor.Layout.TILE).to(device)
E RuntimeError: TT_THROW @ tt_eager/tensor/tensor_impl.hpp:177: tt::exception
E info:
E Rank {} is not supported!
E 5
Ok. i synced with @arakhmati offline on this and i think the initial scope was a little fuzzy, and misalignment happened.
It feels like we have a gap between what was written in the scope and what was expected. To help us going forward, we probably need an explicit enumeration of the functions that require support.
In this case, can Moreh open up another issue to enumerate the methods so we can knock them out with unit tests? I can continue to prioritize them as p0 if necessary.
There is an issue.
@seunghwan100, sorry, i read through the issue and it is not sufficiently descriptive to help us capture the detailed requirements. It is missing the information we requested in my last message, i.e. can Moreh open up another issue to enumerate the methods so we can knock them out with unit tests?
I am sure it is as frustrating for Moreh to receive deliverables that don't meet your needs as it is for TT to deliver on this without getting clear requirements. Getting better at writing explicit requirements is an area that will help the partnership get better at serving each other.
@razorback3.
Absolutely no problem. I understand the situation. Will open a new issue soon. Thanks.
@jliangTT @arakhmati
Opened a new issue with more concrete requirements and unit tests: https://github.com/tenstorrent/tt-metal/issues/8149
In tt-metal, all tensors must currently be aligned to a 4D shape. For example, if the original shape of the tensor is (4, 8, 32), it should be transformed into (1, 4, 8, 32) to be compatible with tt-tensor.
In GroupNorm, both input and output can be of shape (N, C, *), where * can be any number of trailing dimensions (>= 0).
So, consider the case where the tt-tensor shape is (1, 4, 4, 4) and num_groups is 4: if the original tensor is 4-D, then N=1 and C=4, but if it is 3-D, then N=4 and C=4. Similarly, if the tt-tensor shape is (1, 1, 4, 4) and num_groups is 4: if the original tensor is 3-D, then N=1 and C=4, but if it is 2-D, then N=4 and C=4. In cases of 5-D or higher, N and C cannot be determined. (This problem needs to be resolved in another issue.)
In other words, the rank of the original tensor cannot be inferred from the shape of the tt-tensor and the num_groups argument. Therefore, the Tensor class should carry information about the rank of the original tensor.
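The ambiguity can be demonstrated with a small sketch, assuming (as described above) that tt-metal left-pads the original shape with 1s to reach 4-D. The helper name `infer_n_c` is hypothetical, used only to illustrate why the original rank must be stored:

```python
def infer_n_c(tt_shape, num_groups, original_rank):
    """Recover (N, C) for GroupNorm from a 4-D tt-tensor shape.

    The original shape is assumed to be the last `original_rank` dims of
    tt_shape (left-padding with 1s). Without original_rank, two different
    original tensors can map to the same tt_shape, so N and C cannot be
    recovered from tt_shape and num_groups alone.
    """
    pad = 4 - original_rank
    original_shape = tt_shape[pad:]
    n, c = original_shape[0], original_shape[1]  # (N, C, *) layout
    assert c % num_groups == 0, "channels must be divisible by num_groups"
    return n, c


# Same tt-tensor shape (1, 4, 4, 4) and num_groups=4, two different answers:
assert infer_n_c((1, 4, 4, 4), 4, original_rank=4) == (1, 4)  # original was 4-D
assert infer_n_c((1, 4, 4, 4), 4, original_rank=3) == (4, 4)  # original was 3-D
```

Because both calls see identical tt-tensor metadata, only the extra `original_rank` input disambiguates them, which is exactly the information the Tensor class is being asked to carry.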