pschang-phy opened this issue 2 weeks ago
I suspect this is a case where ttnn.moreh_cumsum expects the input tensor to be 4D. So during conversion we will need to unsqueeze the input tensor to 4D first (not sure why this isn't done in the ttnn library) and then squeeze the result back. For example, an input of shape (1, 1, 1, 32) with dim=3 is accepted by the cumsum.
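The unsqueeze/squeeze handling described above could be sketched as follows. This is only an illustration of the shape bookkeeping: `torch.cumsum` stands in for `ttnn.moreh_cumsum`, and `cumsum_via_4d` is a hypothetical helper name, not actual code from this repository.

```python
import torch

def cumsum_via_4d(x: torch.Tensor, dim: int) -> torch.Tensor:
    """Sketch of the conversion-time workaround: pad the tensor to 4D
    with leading singleton dims, run cumsum there, then squeeze back.
    torch.cumsum is used here purely as a stand-in for ttnn.moreh_cumsum."""
    orig_rank = x.dim()
    while x.dim() < 4:
        x = x.unsqueeze(0)
        dim += 1  # the target axis shifts as leading dims are added
    out = torch.cumsum(x, dim=dim)
    for _ in range(4 - orig_rank):
        out = out.squeeze(0)
    return out

# (1, 32) with dim=1 becomes (1, 1, 1, 32) with dim=3 internally
y = cumsum_via_4d(torch.ones(1, 32), dim=1)
```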
But I found that the output is still incorrect. For example, with a (1, 1, 1, 32) tensor filled with 1.0, it outputs:
```
TorchTensor([[[[1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
                1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]]]],
            dtype=torch.bfloat16)
```
while the expected output should be:
```
tensor([[[[ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12., 13., 14.,
           15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25., 26., 27., 28.,
           29., 30., 31., 32.]]]], dtype=torch.bfloat16)
```
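The expected output above can be reproduced with plain PyTorch, which serves as the ground truth that ttnn.moreh_cumsum should match (the bfloat16 cast is omitted here for simplicity):

```python
import torch

# Running sum along the last dimension of an all-ones 4D tensor:
# element i along dim=3 should be i + 1.
x = torch.ones(1, 1, 1, 32)
expected = torch.cumsum(x, dim=3)
```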
I see the problem. The current moreh_cumsum doesn't support reduction along dims other than the first two (dim = 0, 1): msum_program_factory.cpp
Here we need dim = 3.
@jerrysky3, noted. Will discuss with the maintainer and follow up. In the meantime, do you plan to work around this with a permute, or will you blocklist one input?
Also, I can't find where we need dim = 3; it isn't reflected in https://github.com/tenstorrent/pytorch2.0_ttnn/blob/main/docs/operations/aten.cumsum.default.md
Thanks. Bloom needs cumsum on (1, 32) with dim = 1, and moreh_cumsum only works with 4D tensors. So ideally we can unsqueeze the tensor into (1, 1, 1, 32), cumsum on dim = 3, and squeeze it back to (1, 32). Sorry I didn't explain this clearly enough in the comments.
I also filed the feature request https://github.com/tenstorrent/tt-metal/issues/14549 for cumsum on dim = 2, 3 (H, W). Meanwhile I think it's possible to unsqueeze the tensor into (1, 32, 1, 1), cumsum on dim = 1, then squeeze back. It's not ideal because the tensor becomes 32×32 times larger (with tile layout), but it might be a workaround for now, as these tensors are small.
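The interim workaround described above could look roughly like this. Again, `torch.cumsum` is a stand-in for `ttnn.moreh_cumsum`, and `cumsum_on_dim1` is a hypothetical helper name used only for illustration:

```python
import torch

def cumsum_on_dim1(x: torch.Tensor) -> torch.Tensor:
    """Sketch of the interim workaround for a (1, N) input: reshape to
    (1, N, 1, 1) so the accumulation axis lands on dim=1, which
    moreh_cumsum supports, then reshape back. torch.cumsum stands in
    for ttnn.moreh_cumsum here."""
    n = x.shape[1]
    x4 = x.reshape(1, n, 1, 1)
    out = torch.cumsum(x4, dim=1)  # would be ttnn.moreh_cumsum(x4, dim=1)
    return out.reshape(1, n)

out = cumsum_on_dim1(torch.ones(1, 32))
```

The size blow-up mentioned above comes from the tile layout: each (1, 1) HW plane of the reshaped tensor is padded to a full 32×32 tile on device, so the stored tensor grows by roughly that factor.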
The maintainer won't be able to update the implementation so I see 3 options:
- Proper implementation in op/kernel;
- Workaround as described above, but in the TT-NN code;
- Workaround as described above, in the compiler (this code base);
@jerrysky3 wdyt?
The workaround has been implemented in the compiler (this code base) and merged in https://github.com/tenstorrent/pytorch2.0_ttnn/pull/370. Personally I think this is fine for now, and we should aim for a proper fix in the op/kernel when there are cycles available.
Description
When lowering torch.ops.aten.cumsum.default to ttnn.moreh_cumsum, we hit a runtime error.
Error message
Reproduce
In pytorch2.0_ttnn, check out the branch origin/fix-e2e-model/bloom
cc @ayerofieiev-tt @jerrysky3 @jdh8