Closed punithsekar closed 3 months ago
I am unable to reproduce this issue with the test attached.
pytest test_9877.py
2024-07-17 19:38:26.088 | DEBUG | ttnn:
test_9877.py::test[device_params0] ⠁ Initializing Chip
Detecting chips (found 8)
2024-07-17 19:38:26.694 | INFO | SiliconDriver - Detected 4 PCI devices : [0, 1, 2, 3]
2024-07-17 19:38:26.773 | INFO | SiliconDriver - Software version 6.0.0, Ethernet FW version 6.9.0 (Device 0)
2024-07-17 19:38:26.774 | INFO | SiliconDriver - Software version 6.0.0, Ethernet FW version 6.9.0 (Device 4)
2024-07-17 19:38:26.783 | INFO | SiliconDriver - Detected 4 PCI devices : [0, 1, 2, 3]
2024-07-17 19:38:26.810 | INFO | SiliconDriver - Software version 6.0.0, Ethernet FW version 6.9.0 (Device 1)
2024-07-17 19:38:26.811 | INFO | SiliconDriver - Software version 6.0.0, Ethernet FW version 6.9.0 (Device 5)
2024-07-17 19:38:26.819 | INFO | SiliconDriver - Detected 4 PCI devices : [0, 1, 2, 3]
2024-07-17 19:38:26.847 | INFO | SiliconDriver - Software version 6.0.0, Ethernet FW version 6.9.0 (Device 2)
2024-07-17 19:38:26.848 | INFO | SiliconDriver - Software version 6.0.0, Ethernet FW version 6.9.0 (Device 6)
2024-07-17 19:38:26.856 | INFO | SiliconDriver - Detected 4 PCI devices : [0, 1, 2, 3]
2024-07-17 19:38:26.884 | INFO | SiliconDriver - Software version 6.0.0, Ethernet FW version 6.9.0 (Device 3)
2024-07-17 19:38:26.885 | INFO | SiliconDriver - Software version 6.0.0, Ethernet FW version 6.9.0 (Device 7)
Always | DEBUG | Initializing firmware
Always | DEBUG | Waiting for firmware init complete
Always | DEBUG | Firmware init complete
Op | DEBUG | Started C++ ttnn operation: ttnn::to_layout
Op | DEBUG | Finished C++ ttnn operation: ttnn::to_layout
Op | DEBUG | Started C++ ttnn operation: ttnn::mish
Op | DEBUG | Started C++ ttnn operation: ttnn::softplus
Op | DEBUG | Launching Operation: "Unary &" (device
==================================================================== PASSES ====================================================================
============================================================== slowest durations ===============================================================
2.31s setup    test_9877.py::test[device_params0]
2.27s call     test_9877.py::test[device_params0]
0.01s teardown test_9877.py::test[device_params0]
=========================================================== short test summary info ============================================================
PASSED test_9877.py::test[device_params0]
============================================================== 1 passed in 4.60s ===============================================================
@punithsekar can you please try from the main branch, both the unit test and the whole model graph? Thanks! @dvartaniansTT FYI
Thanks @eyonland, let's check again and confirm here
@punithsekar thanks for creating the issue. Let's please follow these guidelines for all issues moving forward:
Create a unit test for the failure and share the command to run the test.
Indicate which branch and which card you are using: for instance, E150 for GS, N150 for single-chip WH, N300 for dual-chip WH, ...
The git branch or tag.
Your build steps: for instance, built from source...
So for me it typically looks like this:
git checkout [v0.50.0](https://github.com/tenstorrent/tt-metal/tree/v0.50.0)
system info: Ubuntu 20, N150, sw commit/branch v0.50.0
Then attach a screenshot of the error you are seeing.
// mish[x] = x*tanh[softplus[x]]
// use transformation y = x*tanh[softplus[x]] by broadcast
// Ref: https://krutikabapat.github.io/Swish-Vs-Mish-Latest-Activation-Functions/
Tensor _mish(const Tensor& x, const std::optional<MemoryConfig>& output_mem_config) {
    std::vector<Tensor> output_tensors = {Tensor(operation::get_workers_for_op_output({x}))};
    operation::launch_op(
        [output_mem_config](
            const std::vector<Tensor>& input_tensors,
            const std::vector<std::optional<const Tensor>>& optional_input_tensors,
            const std::vector<std::optional<Tensor>>& optional_output_tensors) mutable -> std::vector<Tensor> {
            const auto& x = input_tensors.at(0);
            Tensor sp_x = ttnn::softplus(x, 1.0f, 20.0f, output_mem_config);
            Tensor tanh_x = ttnn::tanh(sp_x, output_mem_config);
            sp_x.deallocate();
            Tensor mish_x = ttnn::multiply(x, tanh_x, std::nullopt, output_mem_config);
            return {mish_x};
        },
        {x},
        output_tensors);
    return output_tensors.at(0);
}
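The decomposition this composite op relies on, mish(x) = x * tanh(softplus(x)), can be sanity-checked on the host against PyTorch's built-in mish (a quick sketch, assuming PyTorch >= 1.9 for `F.mish`; the beta/threshold values mirror the `1.0f` / `20.0f` arguments passed to `ttnn::softplus` above):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1024)
# Same decomposition as the composite op above:
# mish(x) = x * tanh(softplus(x)), with softplus(beta=1, threshold=20)
# mirroring the 1.0f / 20.0f arguments passed to ttnn::softplus.
decomposed = x * torch.tanh(F.softplus(x, beta=1.0, threshold=20.0))
reference = F.mish(x)
assert torch.allclose(decomposed, reference, atol=1e-5)
```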
import ttnn
import torch
import pytest


@pytest.mark.parametrize("device_params", [{"l1_small_size": 32768}], indirect=True)
def test(device):
    a = torch.randn((1, 1, 102400, 32), dtype=torch.float16)
    ttnn_input_tensor = ttnn.from_torch(
        a,
        dtype=ttnn.bfloat16,
        memory_config=ttnn.L1_MEMORY_CONFIG,
        device=device,
        layout=ttnn.TILE_LAYOUT,
    )
    output = ttnn.mish(ttnn_input_tensor)
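To turn the repro into a correctness check rather than just crash-free execution, a host-side golden output can be computed with torch; the device-side comparison is only sketched in comments since it needs hardware, and the PCC helper named there is hypothetical:

```python
import torch
import torch.nn.functional as F

a = torch.randn((1, 1, 102400, 32), dtype=torch.float16)
# Cast through bfloat16 first so the golden sees the same precision
# loss as the device input created with dtype=ttnn.bfloat16.
golden = F.mish(a.to(torch.bfloat16).to(torch.float32))

# On device (sketch, not runnable without hardware):
#   output = ttnn.mish(ttnn_input_tensor)
#   result = ttnn.to_torch(output).to(torch.float32)
#   assert_with_pcc(golden, result, 0.999)  # hypothetical PCC helper
```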
gdb --args python -m pytest test_9877.py
b unary_composite_op.cpp:280
r
Thread 1 "python" hit Breakpoint 1, ttnn::operations::unary::_mish(tt::tt_metal::Tensor const&, std::__1::optional
#0 ttnn::operations::unary::_mish(tt::tt_metal::Tensor const&, std::__1::optional<tt::tt_metal::MemoryConfig> const&)::$_0::operator()(std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor> > const&, std::__1::vector<std::__1::optional<tt::tt_metal::Tensor const>, std::__1::allocator<std::__1::optional<tt::tt_metal::Tensor const> > > const&, std::__1::vector<std::__1::optional<tt::tt_metal::Tensor>, std::__1::allocator<std::__1::optional<tt::tt_metal::Tensor> > > const&) (this=0x5fe2168, input_tensors=..., optional_input_tensors=..., optional_output_tensors=...)
at ../ttnn/cpp/ttnn/operations/eltwise/unary/device/unary_composite_op.cpp:283
#1 0x00007fff8703d6f8 in std::__1::__invoke[abi:ue170006]<ttnn::operations::unary::_mish(tt::tt_metal::Tensor const&, std::__1::optional<tt::tt_metal::MemoryConfig> const&)::$_0&, std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor> > const&, std::__1::vector<std::__1::optional<tt::tt_metal::Tensor const>, std::__1::allocator<std::__1::optional<tt::tt_metal::Tensor const> > > const&, std::__1::vector<std::__1::optional<tt::tt_metal::Tensor>, std::__1::allocator<std::__1::optional<tt::tt_metal::Tensor> > > const&> (__f=..., __args=..., __args=..., __args=...)
at /usr/lib/llvm-17/bin/../include/c++/v1/__type_traits/invoke.h:340
thanks a lot @eyonland !
I'd wait on @punithsekar to provide the details I asked for. He might be running from a specific branch, and I'm not sure which card he is using.
@mbahnasTT We couldn't reproduce this error on our Wormhole_B0 machine. cc @umadevimcw @eyonland
Hi @dvartaniansTT ,
Even on the latest main I face the same issue (commit 05ff4d77f7702610c58e9f18ed918f347c0dbfeb on main).
I use the following build commands:
git submodule foreach 'git lfs fetch --all && git lfs pull'
git submodule update --init --recursive
export ARCH_NAME=grayskull
export TT_METAL_HOME=$(pwd)
export PYTHONPATH=$(pwd)
export TT_METAL_ENV=dev
./build_metal.sh
./create_venv.sh
source python_env/bin/activate
pip install -r ./tests/end_to_end_tests/requirements.txt
I am using an instance of GS E150. I didn't use this op in the pipeline since it was causing issues; I created a new file, tested the op, and shared the snippet.
Steps to reproduce:
pytest models/experimental/yolov4/reference/unittest.py
Additionally, I also checked on a colleague's VM instance of GS, and they face the same issue. On a VM instance of WH, the test works fine without any issues.
Thanks.
Hi @mbahnasTT @eyonland @dvartaniansTT, it appears that Mish is functioning properly on WH. The previously mentioned error is only encountered on GS.
Hi @punithsekar, Mish uses Softplus in its implementation, which is not available on Grayskull, so Mish will not be available on Grayskull either.
cc: @eyonland
@KalaivaniMCW, please update our documentation to reflect that mish is not supported on GS.
Hello all, the reason why Softplus was originally not implemented for Grayskull is HW limitations (GS has too few general-purpose registers in the SFPU to support it). However, there has been an update to the implementation that reduced the register pressure, so it's possible we could now use it on Grayskull. There is currently no active work on adding softplus to GS.
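For context, the function the kernel has to approximate is softplus(x) = log(1 + exp(x)); evaluating this stably is where the register pressure comes from. A numerically stable host-side formulation (a sketch of the math only, not the device kernel) looks like:

```python
import math

def softplus_stable(x, beta=1.0, threshold=20.0):
    # For beta*x > threshold the function is linear to within float
    # precision, so we skip exp() entirely and avoid overflow.
    scaled = beta * x
    if scaled > threshold:
        return x
    # log1p(exp(s)) / beta is accurate for small and negative s.
    return math.log1p(math.exp(scaled)) / beta

# Spot-checks: softplus(0) = ln(2), ~identity for large x, ~0 for very negative x.
assert abs(softplus_stable(0.0) - math.log(2.0)) < 1e-12
assert abs(softplus_stable(100.0) - 100.0) < 1e-9
assert 0.0 <= softplus_stable(-100.0) < 1e-9
```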
@jvasilje, given this works on WH, should this be implemented on GS? Also, any objection to downgrading to P1?
no need for GS
Describe the bug ttnn.mish fails with a "trisc1 build failed" error.
To Reproduce Steps to reproduce the behavior: Run the following code snippet.
Expected behavior Execution of the operation without any issue.
Screenshots
Please complete the following environment information: