tenstorrent / tt-metal


N150 WH demos test #9127

Closed: dvartaniansTT closed this issue 2 months ago

dvartaniansTT commented 4 months ago

Describe the bug: I am testing demos for WH on N150 and encountering errors.

To Reproduce: Steps to reproduce the behavior:

  1. git checkout the main branch at commit baef03c8a0fff6e10e463c40f9e44e2fdc3d7e0c
  2. See the comments below for each test command and the corresponding errors.
  3. Please note, some of the comments describe discrepancies between the READMEs and what is supported in main, along with suggestions for README clarifications.

Expected behavior: demos, and any suggested pytest in a model's README, should pass from main. The READMEs should also be regularly updated to reflect changes in main, be it the build flow, required environment variables, or anything else that changes.


Additional context: I will be adding more to this issue as I keep testing.

dvartaniansTT commented 4 months ago

Note: the pre-written-prompts test for falcon7b passes; the error happens when providing my own prompt.

PASSING: pytest --disable-warnings -q -s --input-method=json --input-path='models/demos/falcon7b/demo/input_data.json' models/demos/wormhole/falcon7b/demo_wormhole.py::test_demo[user_input0-default_mode_stochastic]

FAILING: pytest --disable-warnings -q -s --input-method=cli --cli-input="Tell me about computer architecture" models/demos/wormhole/falcon7b/demo_wormhole.py::test_demo[user_input0-default_mode_stochastic]

error:

```
models/demos/wormhole/falcon7b/demo_wormhole.py:37:

models/demos/falcon7b/demo/demo.py:432: in run_falcon_demo_kv
    tt_logits, kv_cache = tt_FalconCausalLM(
python_env/lib/python3.8/site-packages/torch/nn/modules/module.py:1511: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
python_env/lib/python3.8/site-packages/torch/nn/modules/module.py:1520: in _call_impl
    return forward_call(*args, **kwargs)
models/demos/falcon7b/tt/falcon_causallm.py:98: in forward
    hidden_states, presents = super().forward(
models/demos/falcon7b/tt/falcon_model.py:283: in forward
    layer_output = layer(
python_env/lib/python3.8/site-packages/torch/nn/modules/module.py:1511: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
python_env/lib/python3.8/site-packages/torch/nn/modules/module.py:1520: in _call_impl
    return forward_call(*args, **kwargs)
models/demos/falcon7b/tt/falcon_decoder.py:178: in forward
    attn_outputs = self.self_attn_decode(
python_env/lib/python3.8/site-packages/torch/nn/modules/module.py:1511: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
python_env/lib/python3.8/site-packages/torch/nn/modules/module.py:1520: in _call_impl
    return forward_call(*args, **kwargs)
models/demos/falcon7b/tt/falcon_attention.py:675: in forward
    self.query_key_value_weights[i],

input_tensor = ttnn.Tensor([[[[ 0.02100, -0.02197, ..., -0.03271, 0.01025], [ 0.00488, 0.00879, ..., 0.00977, 0....0.00977, ..., -0.00903, 0.00537]]]], shape=Shape([1, 1, 4544, 4672]), dtype=DataType::BFLOAT8_B, layout=Layout::TILE)
slices = (slice(None, 0, None), slice(None, None, None), slice(None, None, None), slice(None, None, None))

@ttnn.register_operation(
    name="ttnn.Tensor.__getitem__",
    validate_input_tensors=_getitem_validate_input_tensors,
    is_method=True,
    golden_function=_golden_function,
)
def __getitem__(input_tensor: ttnn.Tensor, slices) -> ttnn.Tensor:
    input_rank = len(input_tensor.shape)
    input_layout = input_tensor.layout

    if isinstance(slices, int):
        slices = (slice(None, slices, None),)
    elif isinstance(slices, slice):
        slices = (slices,)
    elif isinstance(slices, type(...)):
        raise RuntimeError("Ellipsis is not supported!")

    normalized_slices = []
    for s in slices:
        if isinstance(s, int):
            normalized_slices.append(slice(None, s, None))
        elif isinstance(s, slice):
            normalized_slices.append(s)
        else:
            raise RuntimeError("Invalid slice type!")
    slices = tuple(normalized_slices)

    while len(slices) != input_rank:
        slices = slices + (slice(None, None, None),)

    if isinstance(slices, tuple):
        if len(slices) > input_rank:
            raise RuntimeError(f"Too many slices for tensor of rank {input_rank}")

    if input_rank <= 4:
        input_tensor = ttnn.unsqueeze_to_4D(input_tensor)

        while len(slices) != 4:
            slices = (slice(None, None, None),) + slices
        slice_start = [_slice.start if _slice.start is not None else 0 for _slice in slices]
        slice_end = [
            (_slice.stop if _slice.stop is not None else input_tensor.shape[index])
            for index, _slice in enumerate(slices)
        ]

        padded_slice_end = list(slice_end)
        if input_layout == ttnn.TILE_LAYOUT:
            padded_slice_end[-1] = int(math.ceil((slice_end[-1]) / ttnn.TILE_SIZE)) * ttnn.TILE_SIZE
            padded_slice_end[-2] = int(math.ceil((slice_end[-2]) / ttnn.TILE_SIZE)) * ttnn.TILE_SIZE

        if list(padded_slice_end) == list(input_tensor.shape.with_tile_padding()):
            output = input_tensor
        else:
            padded_slice_end_minus_1 = [x - 1 for x in padded_slice_end]
            if any([x < 0 for x in padded_slice_end_minus_1]):
                raise RuntimeError("ttnn.Tensor.__getitem__: cannot return a scalar!")

E               RuntimeError: ttnn.Tensor.__getitem__: cannot return a scalar!

ttnn/ttnn/operations/core.py:92: RuntimeError
================================================================= short test summary info ==================================================================
FAILED models/demos/wormhole/falcon7b/demo_wormhole.py::test_demo[user_input0-default_mode_stochastic] - RuntimeError: ttnn.Tensor.__getitem__: cannot return a scalar!
========================================================= 1 failed, 1 warning in 368.28s (0:06:08) =========================================================
Device | INFO | Closing user mode device drivers
```
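
For context on the failure path: the traceback shows `self.query_key_value_weights[i]` indexing a `ttnn.Tensor`, and in the `__getitem__` code above an integer index `i` is normalized to `slice(None, i, None)`, so `i = 0` yields a zero-length slice and `padded_slice_end_minus_1` goes negative. A minimal sketch that should hit the same error path (shapes mirror the log; this is not the demo code):

```python
import torch
import ttnn

device = ttnn.open_device(device_id=0)

# Mirror the failing tensor from the log: rank 4, TILE layout.
torch_tensor = torch.rand((1, 1, 4544, 4672), dtype=torch.bfloat16)
tt_tensor = ttnn.from_torch(torch_tensor, layout=ttnn.TILE_LAYOUT, device=device)

# The int index 0 is normalized to slice(None, 0, None), i.e. an empty slice,
# which should raise: RuntimeError: ttnn.Tensor.__getitem__: cannot return a scalar!
tt_tensor[0]

ttnn.close_device(device)
```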

dvartaniansTT commented 4 months ago

The mamba demos pass; however, the full-model unit test shared in the README fails:

test command: pytest -svv models/demos/mamba/tests/test_full_model.py::test_inference[state-spaces/mamba-2.8b-32-None-0.985-64-1]

error:

```
================================================================== no tests ran in 0.15s ===================================================================
ERROR: not found: /home/dvartanians/tt-metal/demo-tests/tt-metal/models/demos/mamba/tests/test_full_model.py::test_inference[state-spaces/mamba-2.8b-32-None-0.985-64-1]
(no name '/home/dvartanians/tt-metal/demo-tests/tt-metal/models/demos/mamba/tests/test_full_model.py::test_inference[state-spaces/mamba-2.8b-32-None-0.985-64-1]' in any of [<Module models/demos/mamba/tests/test_full_model.py>])
```
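
A quick way to see which parametrizations currently exist is `pytest --collect-only -q models/demos/mamba/tests/test_full_model.py`, which lists the collected test ids; the id in the README appears to be stale relative to main.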

dvartaniansTT commented 4 months ago

Another issue with mamba: this unit test shared in the README is currently being skipped. Please update the README files to exclude skipped tests, or note that a test is currently skipped (work in progress).

command/unit test: pytest -svv models/demos/mamba/tests/test_mamba_perf.py -m models_performance_bare_metal

skipped message:

```
=================================================================== test session starts ====================================================================
platform linux -- Python 3.8.10, pytest-7.2.2, pluggy-1.5.0 -- /home/dvartanians/tt-metal/demo-tests/tt-metal/python_env/bin/python3
cachedir: .pytest_cache
rootdir: /home/dvartanians/tt-metal/demo-tests/tt-metal, configfile: pytest.ini
plugins: split-0.8.2, dash-2.15.0, timeout-2.2.0, anyio-4.4.0
timeout: 2400.0s
timeout method: signal
timeout func_only: False
collecting ... 2024-06-04 22:11:06.113 | DEBUG | ttnn::139 - Initial ttnn.CONFIG: {'cache_path': PosixPath('/home/dvartanians/.cache/ttnn'), 'comparison_mode_pcc': 0.9999, 'enable_comparison_mode': False, 'enable_detailed_buffer_report': False, 'enable_detailed_tensor_report': False, 'enable_fast_runtime_mode': True, 'enable_graph_report': False, 'enable_logging': False, 'enable_model_cache': False, 'model_cache_path': PosixPath('/home/dvartanians/.cache/ttnn/models'), 'report_name': None, 'root_report_path': PosixPath('generated/ttnn/reports'), 'throw_exception_on_fallback': False, 'tmp_dir': PosixPath('/tmp/ttnn')}
collected 2 items / 1 deselected / 1 selected

models/demos/mamba/tests/test_mamba_perf.py::test_mamba_e2e_perf[32-10-12.5-0.4] SKIPPED (Non-deterministic hang on CI (#8606))

================================================================= short test summary info ==================================================================
SKIPPED [1] models/demos/mamba/tests/test_mamba_perf.py:22: Non-deterministic hang on CI (#8606)
============================================================= 1 skipped, 1 deselected in 0.49s =============================================================
```
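
For reference, the skip above is a plain pytest skip; presumably the marker at test_mamba_perf.py:22 looks roughly like this (wording inferred from the log, not copied from the file):

```python
import pytest

@pytest.mark.skip(reason="Non-deterministic hang on CI (#8606)")
def test_mamba_e2e_perf():
    ...
```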

dvartaniansTT commented 4 months ago

For mamba, the README says to build with the profiler enabled. However, the build instructions were updated in main and do not support the profiler and Tracy yet.

Please update the instructions in the README here to be up to date with the code in main.

I tried to build again following the instructions there, and I get:

```
$ make clean
make: *** No rule to make target 'clean'. Stop.
```

dvartaniansTT commented 4 months ago

For BERT large:

In the README, the instructions should provide a clearer example of the demo test running with batch size 7, like:

pytest --disable-warnings models/demos/metal_BERT_large_11/demo/demo.py::test_demo[models/demos/metal_BERT_large_11/demo/input_data.json-1-batch_7]

It is currently shared as:

pytest --disable-warnings models/demos/metal_BERT_large_11/demo/demo.py::test_demo[models/demos/metal_BERT_large_11/demo/input_data.json-1-BATCH_SIZE]

I had to look inside the code to learn that it expects the lowercase `batch_#` form, in the WH case `batch_7`.
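
For anyone else hitting this: the `batch_7` suffix is a pytest parametrization id, so the README's upper-case `BATCH_SIZE` placeholder never matches a collected test. A minimal sketch of how such ids are generated (the real demo's parametrization may differ):

```python
import pytest

# pytest builds test ids like "batch_7" from the parametrize call below,
# which is why the selector must use the lowercase batch_# form.
@pytest.mark.parametrize("batch", [7, 8, 12], ids=lambda b: f"batch_{b}")
def test_demo(batch):
    assert batch in (7, 8, 12)
```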

dvartaniansTT commented 4 months ago

The BERT large demo test fails:

command: pytest --disable-warnings models/demos/metal_BERT_large_11/demo/demo.py::test_demo[models/demos/metal_BERT_large_11/demo/input_data.json-1-batch_7]

error:

```
models/demos/metal_BERT_large_11/demo/demo.py:379:

models/demos/metal_BERT_large_11/demo/demo.py:272: in run_bert_question_and_answering_inference
    tt_out = tt_bert_model(tt_embedding, tt_attention_mask).cpu()
models/demos/metal_BERT_large_11/tt/bert_model.py:147: in __call__
    hidden_states = encoder(hidden_states, attention_mask)
models/demos/metal_BERT_large_11/tt/bert_encoder.py:203: in __call__
    mha_res = self.mha(activation, attention_mask)
models/demos/metal_BERT_large_11/tt/mha.py:259: in __call__
    result = self.mha(activation, attention_mask)
models/demos/metal_BERT_large_11/tt/mha.py:149: in mha
    qkv = op1_qkv_fused(activation, qkv_weight, qkv_bias)

activation = ttnn.Tensor([[[[-0.09766, -0.25781, ..., 0.30469, 0.92969], [-0.15625, 0.15234, ..., 0.73438, -0....-0.34375, ..., -0.76562, -1.29688]]]], shape=Shape([7, 1, 384, 1024]), dtype=DataType::BFLOAT8_B, layout=Layout::TILE)
qkv_weight = ttnn.Tensor([[[[ 0.01562, 0.03418, ..., 0.00244, -0.00342], [ 0.03027, -0.04883, ..., 0.01660, -0....0.00977, ..., 0.02832, -0.00195]]]], shape=Shape([1, 1, 1024, 3072]), dtype=DataType::BFLOAT8_B, layout=Layout::TILE)
qkv_bias = ttnn.Tensor([[[[ 0.00781, 0.18359, ..., 0.00000, 0.00000], [-0.00391, -0.03516, ..., 0.00000, 0.... -0.00488, ..., 0.00000, 0.00000]]]], shape=Shape([1, 1, 32, 3072]), dtype=DataType::BFLOAT8_B, layout=Layout::TILE)

def op1_qkv_fused(activation, qkv_weight, qkv_bias):
    qkv = tt_lib.operations.primary.matmul(
        activation,
        qkv_weight,
        bias=qkv_bias,
        program_config=model_config["OP1_FUSED_QKV_MM_CONFIG"],
        output_mem_config=model_config["OP1_FUSED_QKV_MM_OUTPUT_MEMCFG"],
        output_dtype=model_config["OP1_FUSED_QKV_MM_OUTPUT_DTYPE"],
    )
E       RuntimeError: TT_FATAL @ ../tt_eager/tt_dnn/op_library/bmm/multi_core_reuse_mcast_2d_optimized/bmm_op_multi_core_reuse_mcast_2d_optimized.cpp:1162: false
E       info:
E       Grid is invalid for mcast matmul!
E       backtrace:

.....
E       --- /home/dvartanians/tt-metal/demo-tests/tt-metal/python_env/bin/python3(_PyEval_EvalFrameDefault+0x1876) [0x5483b6]
E       --- /home/dvartanians/tt-metal/demo-tests/tt-metal/python_env/bin/python3(_PyFunction_Vectorcall+0x1b6) [0x5d5846]
E       --- /home/dvartanians/tt-metal/demo-tests/tt-metal/python_env/bin/python3(PyObject_Call+0x62) [0x5d4c12]

models/demos/metal_BERT_large_11/tt/mha.py:27: RuntimeError
================================================================= short test summary info ==================================================================
FAILED models/demos/metal_BERT_large_11/demo/demo.py::test_demo[models/demos/metal_BERT_large_11/demo/input_data.json-1-batch_7] - RuntimeError: TT_FATAL @ ../tt_eager/tt_dnn/op_library/bmm/multi_core_reuse_mcast_2d_optimized/bmm_op_multi_core_reuse_mcast_2d_optimized.cpp:1162: false
========================================================= 1 failed, 1 warning in 91.18s (0:01:31) ==========================================================
```
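
The TT_FATAL suggests the matmul program config's compute grid does not fit the device. A quick way to inspect the grid a board actually exposes (assuming the device API available in this version of ttnn):

```python
import ttnn

device = ttnn.open_device(device_id=0)

# The sharded batch-7 matmul config must fit within this grid; N150 exposes
# a smaller compute grid than N300, consistent with the failure above.
grid = device.compute_with_storage_grid_size()
print(f"compute-with-storage grid: {grid.x} x {grid.y}")

ttnn.close_device(device)
```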

dvartaniansTT commented 4 months ago

The second BERT large demo test also fails:

command: pytest --disable-warnings models/demos/metal_BERT_large_11/demo/demo.py::test_demo_squadv2 -k batch_7

dvartaniansTT commented 4 months ago

For BERT large, the README instructions suggest using batch sizes 2-12. However, none of them seems to work: the tests are either skipped for certain batch sizes or generate errors.

For example, for batch 12 I get the following error:

```
E       AssertionError: Device grid size does not support batch 12 BFLOAT8_B-SHARDED configuration

models/demos/metal_BERT_large_11/tt/model_config.py:305: AssertionError
```
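
A sketch of the kind of guard behind this assertion (names and grid numbers are hypothetical, not the actual model_config.py code): the sharded config needs enough core rows for the batch, so oversized batches are rejected up front.

```python
# Hypothetical stand-in for the check in
# models/demos/metal_BERT_large_11/tt/model_config.py.
def validate_sharded_batch(batch: int, grid_rows: int) -> None:
    assert batch <= grid_rows, (
        f"Device grid size does not support batch {batch} BFLOAT8_B-SHARDED configuration"
    )

validate_sharded_batch(batch=8, grid_rows=8)     # fits, no error
# validate_sharded_batch(batch=12, grid_rows=8)  # raises the AssertionError above
```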

dvartaniansTT commented 4 months ago

The following command in BERT large's README is suggested for generating the perf sheet. However, again, with the current build flow in main the profiler tools are not accessible/enabled.

Please update the READMEs to reflect the latest status of main, or at least acknowledge that this is currently not supported but being worked on, for instance: "to generate the perf sheet, you may use the following command in the near future once we enable the tools in our new build flow; stay tuned".

./tt_metal/tools/profiler/profile_this.py -c "pytest --disable-warnings models/demos/metal_BERT_large_11/tests/test_bert.py::test_bert[BERT_LARGE-BATCH_SIZE-BFLOAT8_B-SHARDED]"

dvartaniansTT commented 4 months ago

This one was also mentioned in #9092.

Mistral required too much manual work and editing of hard-coded values to run. The weights download, renaming, etc. should be automated; a user should not need to modify the demo code in order to run a demo test.

dvartaniansTT commented 4 months ago

The stable diffusion demo passes. However, there is a typo in the test name in the demo test file (`test_interactve_demo`), which causes the pytest suggested in the README to fail. The README suggests: pytest models/demos/wormhole/stable_diffusion/demo/demo.py::test_interactive_demo

The demo code should be updated to rename `test_interactve_demo` to `test_interactive_demo` in the file "models/demos/wormhole/stable_diffusion/demo/demo.py", line 622.

tt-rkim commented 3 months ago

Thanks to @mtairum, we are officially calling the Mistral work for both N150 and N300 complete.

tt-rkim commented 3 months ago

@kpaigwar @esmalTT to work on mamba feedback

skhorasganiTT commented 3 months ago

@dvartaniansTT @mbahnasTT The Falcon7b error should be fixed now in main with df84554. I have tested the prompt to verify. Also, it should only happen the first time the model is run, so it will probably work if you re-run the same command.

tt-rkim commented 3 months ago

@dvartaniansTT Note that bert doesn't work for N150, as noted in the front-page README.

I will add a note in the bert README as well to make it even clearer.

tt-rkim commented 2 months ago

@dvartaniansTT @mbahnasTT After we put in the warning about SD on N300, can we close this?