Closed dvartaniansTT closed 2 months ago
Note: the pre-written-prompts test for falcon7b passes; the error happens when providing my own prompt.
PASSING: pytest --disable-warnings -q -s --input-method=json --input-path='models/demos/falcon7b/demo/input_data.json' models/demos/wormhole/falcon7b/demo_wormhole.py::test_demo[user_input0-default_mode_stochastic]
FAILING: pytest --disable-warnings -q -s --input-method=cli --cli-input="Tell me about computer architecture" models/demos/wormhole/falcon7b/demo_wormhole.py::test_demo[user_input0-default_mode_stochastic]
error:

models/demos/wormhole/falcon7b/demo_wormhole.py:37:
models/demos/falcon7b/demo/demo.py:432: in run_falcon_demo_kv
    tt_logits, kv_cache = tt_FalconCausalLM(
python_env/lib/python3.8/site-packages/torch/nn/modules/module.py:1511: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
python_env/lib/python3.8/site-packages/torch/nn/modules/module.py:1520: in _call_impl
    return forward_call(*args, **kwargs)
models/demos/falcon7b/tt/falcon_causallm.py:98: in forward
    hidden_states, presents = super().forward(
models/demos/falcon7b/tt/falcon_model.py:283: in forward
    layer_output = layer(
python_env/lib/python3.8/site-packages/torch/nn/modules/module.py:1511: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
python_env/lib/python3.8/site-packages/torch/nn/modules/module.py:1520: in _call_impl
    return forward_call(*args, **kwargs)
models/demos/falcon7b/tt/falcon_decoder.py:178: in forward
    attn_outputs = self.self_attn_decode(
python_env/lib/python3.8/site-packages/torch/nn/modules/module.py:1511: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
python_env/lib/python3.8/site-packages/torch/nn/modules/module.py:1520: in _call_impl
    return forward_call(*args, **kwargs)
models/demos/falcon7b/tt/falcon_attention.py:675: in forward
    self.query_key_value_weights[i],

input_tensor = ttnn.Tensor([[[[ 0.02100, -0.02197, ..., -0.03271, 0.01025], [ 0.00488, 0.00879, ..., 0.00977, 0....0.00977, ..., -0.00903, 0.00537]]]], shape=Shape([1, 1, 4544, 4672]), dtype=DataType::BFLOAT8_B, layout=Layout::TILE)
slices = (slice(None, 0, None), slice(None, None, None), slice(None, None, None), slice(None, None, None))
@ttnn.register_operation(
    name="ttnn.Tensor.__getitem__",
    validate_input_tensors=_getitem_validate_input_tensors,
    is_method=True,
    golden_function=_golden_function,
)
def __getitem__(input_tensor: ttnn.Tensor, slices) -> ttnn.Tensor:
    input_rank = len(input_tensor.shape)
    input_layout = input_tensor.layout

    if isinstance(slices, int):
        slices = (slice(None, slices, None),)
    elif isinstance(slices, slice):
        slices = (slices,)
    elif isinstance(slices, type(...)):
        raise RuntimeError("Ellipsis is not supported!")

    normalized_slices = []
    for s in slices:
        if isinstance(s, int):
            normalized_slices.append(slice(None, s, None))
        elif isinstance(s, slice):
            normalized_slices.append(s)
        else:
            raise RuntimeError("Invalid slice type!")
    slices = tuple(normalized_slices)

    while len(slices) != input_rank:
        slices = slices + (slice(None, None, None),)

    if isinstance(slices, tuple):
        if len(slices) > input_rank:
            raise RuntimeError(f"Too many slices for tensor of rank {input_rank}")

    if input_rank <= 4:
        input_tensor = ttnn.unsqueeze_to_4D(input_tensor)
        while len(slices) != 4:
            slices = (slice(None, None, None),) + slices

        slice_start = [_slice.start if _slice.start is not None else 0 for _slice in slices]
        slice_end = [
            (_slice.stop if _slice.stop is not None else input_tensor.shape[index])
            for index, _slice in enumerate(slices)
        ]
        padded_slice_end = list(slice_end)
        if input_layout == ttnn.TILE_LAYOUT:
            padded_slice_end[-1] = int(math.ceil((slice_end[-1]) / ttnn.TILE_SIZE)) * ttnn.TILE_SIZE
            padded_slice_end[-2] = int(math.ceil((slice_end[-2]) / ttnn.TILE_SIZE)) * ttnn.TILE_SIZE
        if list(padded_slice_end) == list(input_tensor.shape.with_tile_padding()):
            output = input_tensor
        else:
            padded_slice_end_minus_1 = [x - 1 for x in padded_slice_end]
            if any([x < 0 for x in padded_slice_end_minus_1]):
>               raise RuntimeError("ttnn.Tensor.__getitem__: cannot return a scalar!")
E               RuntimeError: ttnn.Tensor.__getitem__: cannot return a scalar!
ttnn/ttnn/operations/core.py:92: RuntimeError

================ short test summary info ================
FAILED models/demos/wormhole/falcon7b/demo_wormhole.py::test_demo[user_input0-default_mode_stochastic] - RuntimeError: ttnn.Tensor.__getitem__: cannot return a scalar!
1 failed, 1 warning in 368.28s (0:06:08)
Device | INFO | Closing user mode device drivers
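For anyone hitting the same "cannot return a scalar" error: note the `slices` value in the locals above starts with `slice(None, 0, None)`, i.e. an integer index of 0 that the normalization turns into an empty slice, driving `padded_slice_end - 1` negative. A standalone sketch of that normalization logic (simplified: tile padding is always applied, and `TILE_SIZE = 32` is assumed; this is not the actual ttnn implementation):

```python
import math

TILE_SIZE = 32  # assumed tile dimension; ttnn tiles are 32x32

def getitem_slice_bounds(shape, slices):
    """Sketch of the slice normalization in ttnn.Tensor.__getitem__."""
    rank = len(shape)
    # An int index i is normalized to slice(None, i) -- so index 0 means "empty".
    if isinstance(slices, int):
        slices = (slice(None, slices, None),)
    elif isinstance(slices, slice):
        slices = (slices,)
    slices = tuple(
        slice(None, s, None) if isinstance(s, int) else s for s in slices
    )
    # Pad with full slices up to the tensor rank.
    while len(slices) != rank:
        slices = slices + (slice(None, None, None),)
    slice_end = [
        s.stop if s.stop is not None else shape[i] for i, s in enumerate(slices)
    ]
    # Round the last two dims up to tile boundaries.
    padded_end = list(slice_end)
    padded_end[-1] = math.ceil(slice_end[-1] / TILE_SIZE) * TILE_SIZE
    padded_end[-2] = math.ceil(slice_end[-2] / TILE_SIZE) * TILE_SIZE
    # A zero-length dim makes padded_end - 1 negative -> the scalar guard fires.
    if any(x - 1 < 0 for x in padded_end):
        raise RuntimeError("ttnn.Tensor.__getitem__: cannot return a scalar!")
    return padded_end
```

With the shape and slices from the failing run, `getitem_slice_bounds((1, 1, 4544, 4672), (slice(None, 0, None),))` raises exactly this error, while any slice with a nonzero stop succeeds.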
Mamba demos pass; however, the full-model unit test shared in the README fails:
test command: pytest -svv models/demos/mamba/tests/test_full_model.py::test_inference[state-spaces/mamba-2.8b-32-None-0.985-64-1]
error:

================ no tests ran in 0.15s ================
ERROR: not found: /home/dvartanians/tt-metal/demo-tests/tt-metal/models/demos/mamba/tests/test_full_model.py::test_inference[state-spaces/mamba-2.8b-32-None-0.985-64-1]
(no name '/home/dvartanians/tt-metal/demo-tests/tt-metal/models/demos/mamba/tests/test_full_model.py::test_inference[state-spaces/mamba-2.8b-32-None-0.985-64-1]' in any of [<Module models/demos/mamba/tests/test_full_model.py>])
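A "no name ... in any of [<Module ...>]" error means pytest found the module but no collected test matched that parametrize ID, which typically happens when the parametrization in main has changed since the README was written. `pytest --collect-only -q` lists the IDs that currently exist; illustrated here on a throwaway parametrized test file (the filename and ids below are made up for the demonstration):

```shell
# Build a small parametrized test file and list the node IDs pytest
# generates for it; the same --collect-only invocation works on the
# real module, e.g.:
#   pytest --collect-only -q models/demos/mamba/tests/test_full_model.py
cat > /tmp/sample_param_test.py <<'EOF'
import pytest

@pytest.mark.parametrize("batch", [7, 12], ids=["batch_7", "batch_12"])
def test_demo(batch):
    assert batch > 0
EOF
python3 -m pytest --collect-only -q /tmp/sample_param_test.py
```

The output lists IDs like `test_demo[batch_7]`; copying a stale ID from a README instead of one of these produces exactly the "not found" error above.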
Another Mamba issue: the unit test below, shared in the README, is currently being skipped. Please update the README files to exclude skipped tests, or note that they are currently skipped (work in progress).
command/unit test: pytest -svv models/demos/mamba/tests/test_mamba_perf.py -m models_performance_bare_metal
skipped message:
=================================================================== test session starts ====================================================================
platform linux -- Python 3.8.10, pytest-7.2.2, pluggy-1.5.0 -- /home/dvartanians/tt-metal/demo-tests/tt-metal/python_env/bin/python3
cachedir: .pytest_cache
rootdir: /home/dvartanians/tt-metal/demo-tests/tt-metal, configfile: pytest.ini
plugins: split-0.8.2, dash-2.15.0, timeout-2.2.0, anyio-4.4.0
timeout: 2400.0s
timeout method: signal
timeout func_only: False
collecting ... 2024-06-04 22:11:06.113 | DEBUG | ttnn:
models/demos/mamba/tests/test_mamba_perf.py::test_mamba_e2e_perf[32-10-12.5-0.4] SKIPPED (Non-deterministic hang on CI (#8606))
================ short test summary info ================
SKIPPED [1] models/demos/mamba/tests/test_mamba_perf.py:22: Non-deterministic hang on CI (#8606)
1 skipped, 1 deselected in 0.49s
For Mamba, the README says to build with the profiler enabled. However, the build instructions have been updated in main and do not support the profiler and Tracy yet.
Please update the instructions in that README to be up to date with the code in main.
I tried to build again following those instructions and I get:
$ make clean
make: *** No rule to make target 'clean'. Stop.
For BERT Large:
The README instructions should provide a clearer example of the demo test running with batch size 7, such as:
pytest --disable-warnings models/demos/metal_BERT_large_11/demo/demo.py::test_demo[models/demos/metal_BERT_large_11/demo/input_data.json-1-batch_7]
It is currently shared as:
pytest --disable-warnings models/demos/metal_BERT_large_11/demo/demo.py::test_demo[models/demos/metal_BERT_large_11/demo/input_data.json-1-BATCH_SIZE]
I had to look inside the code to learn that it expects a lowercase batch_#, in the WH case batch_7.
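For context on why the literal lowercase `batch_7` is required: the bracketed part of a pytest node ID is built from the strings passed as `ids` to `pytest.mark.parametrize`, not from the batch number itself, so the README's `BATCH_SIZE` placeholder must be replaced with one of those exact strings. A sketch of how such IDs are formed (names and values below are illustrative, not the actual demo code):

```python
# Hypothetical parametrization mirroring the demo's naming convention:
batch_sizes = [7, 8, 12]
ids = [f"batch_{n}" for n in batch_sizes]  # what appears inside [...]

json_path = "models/demos/metal_BERT_large_11/demo/input_data.json"
# The full node ID is module::test[<param ids joined by '-'>]:
node_id = (
    f"models/demos/metal_BERT_large_11/demo/demo.py::"
    f"test_demo[{json_path}-1-{ids[0]}]"
)
print(node_id)
```

Running `pytest --collect-only -q` on the demo module prints these IDs directly, which avoids having to read the source.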
The BERT Large demo test fails:
command:
pytest --disable-warnings models/demos/metal_BERT_large_11/demo/demo.py::test_demo[models/demos/metal_BERT_large_11/demo/input_data.json-1-batch_7]
error:
models/demos/metal_BERT_large_11/demo/demo.py:379:
models/demos/metal_BERT_large_11/demo/demo.py:272: in run_bert_question_and_answering_inference
    tt_out = tt_bert_model(tt_embedding, tt_attention_mask).cpu()
models/demos/metal_BERT_large_11/tt/bert_model.py:147: in __call__
    hidden_states = encoder(hidden_states, attention_mask)
models/demos/metal_BERT_large_11/tt/bert_encoder.py:203: in __call__
    mha_res = self.mha(activation, attention_mask)
models/demos/metal_BERT_large_11/tt/mha.py:259: in __call__
    result = self.mha(activation, attention_mask)
models/demos/metal_BERT_large_11/tt/mha.py:149: in mha
    qkv = op1_qkv_fused(activation, qkv_weight, qkv_bias)

activation = ttnn.Tensor([[[[-0.09766, -0.25781, ..., 0.30469, 0.92969], [-0.15625, 0.15234, ..., 0.73438, -0....-0.34375, ..., -0.76562, -1.29688]]]], shape=Shape([7, 1, 384, 1024]), dtype=DataType::BFLOAT8_B, layout=Layout::TILE)
qkv_weight = ttnn.Tensor([[[[ 0.01562, 0.03418, ..., 0.00244, -0.00342], [ 0.03027, -0.04883, ..., 0.01660, -0....0.00977, ..., 0.02832, -0.00195]]]], shape=Shape([1, 1, 1024, 3072]), dtype=DataType::BFLOAT8_B, layout=Layout::TILE)
qkv_bias = ttnn.Tensor([[[[ 0.00781, 0.18359, ..., 0.00000, 0.00000], [-0.00391, -0.03516, ..., 0.00000, 0.... -0.00488, ..., 0.00000, 0.00000]]]], shape=Shape([1, 1, 32, 3072]), dtype=DataType::BFLOAT8_B, layout=Layout::TILE)
def op1_qkv_fused(activation, qkv_weight, qkv_bias):
    qkv = tt_lib.operations.primary.matmul(
        activation,
        qkv_weight,
        bias=qkv_bias,
        program_config=model_config["OP1_FUSED_QKV_MM_CONFIG"],
        output_mem_config=model_config["OP1_FUSED_QKV_MM_OUTPUT_MEMCFG"],
        output_dtype=model_config["OP1_FUSED_QKV_MM_OUTPUT_DTYPE"],
    )
E   RuntimeError: TT_FATAL @ ../tt_eager/tt_dnn/op_library/bmm/multi_core_reuse_mcast_2d_optimized/bmm_op_multi_core_reuse_mcast_2d_optimized.cpp:1162: false
E   info:
E   Grid is invalid for mcast matmul!
E   backtrace:
.....
E --- /home/dvartanians/tt-metal/demo-tests/tt-metal/python_env/bin/python3(_PyEval_EvalFrameDefault+0x1876) [0x5483b6]
E --- /home/dvartanians/tt-metal/demo-tests/tt-metal/python_env/bin/python3(_PyFunction_Vectorcall+0x1b6) [0x5d5846]
E --- /home/dvartanians/tt-metal/demo-tests/tt-metal/python_env/bin/python3(PyObject_Call+0x62) [0x5d4c12]

models/demos/metal_BERT_large_11/tt/mha.py:27: RuntimeError

================ short test summary info ================
FAILED models/demos/metal_BERT_large_11/demo/demo.py::test_demo[models/demos/metal_BERT_large_11/demo/input_data.json-1-batch_7] - RuntimeError: TT_FATAL @ ../tt_eager/tt_dnn/op_library/bmm/multi_core_reuse_mcast_2d_optimized/bmm_op_multi_core_reuse_mcast_2d_optimized.cpp:1162: false
1 failed, 1 warning in 91.18s (0:01:31)
The second BERT Large demo test also fails:
command:
pytest --disable-warnings models/demos/metal_BERT_large_11/demo/demo.py::test_demo_squadv2 -k batch_7
For BERT Large, the README instructions suggest using batch sizes 2-12; however, none seems to work. The tests are either skipped for certain batch sizes or generate errors.
For example, for batch 12 I get the following error:

E   AssertionError: Device grid size does not support batch 12 BFLOAT8_B-SHARDED configuration

models/demos/metal_BERT_large_11/tt/model_config.py:305: AssertionError
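For readers unfamiliar with this assertion: it is a capacity check in model_config.py comparing the requested batch against what the device's worker grid can hold in the sharded configuration, and the N150 grid is evidently too small for batch 12. A simplified sketch of that kind of check (the capacity rule, function name, and grid numbers below are illustrative, not the real model_config logic):

```python
def assert_batch_supported(batch, grid_rows, config="BFLOAT8_B-SHARDED"):
    """Hypothetical rule: one batch shard per grid row, so the batch
    cannot exceed the number of available rows."""
    assert batch <= grid_rows, (
        f"Device grid size does not support batch {batch} {config} configuration"
    )

# e.g. a hypothetical 8-row worker grid would accept batch 7 but reject batch 12.
```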
The following command in BERT Large's README is suggested for generating the perf sheet. However, again, with the current build flow in main the profiler tools are not accessible/enabled.
Please update the READMEs to reflect the latest status of main, or at least acknowledge that this is currently not supported but being worked on, for instance: "to generate the perf sheet, you may use the following command in the near future, once we enable the tools in our new build flow; stay tuned".
./tt_metal/tools/profiler/profile_this.py -c "pytest --disable-warnings models/demos/metal_BERT_large_11/tests/test_bert.py::test_bert[BERT_LARGE-BATCH_SIZE-BFLOAT8_B-SHARDED]"
This one was also mentioned in #9092.
Mistral required too much manual work and editing of hard-coded values to run. The weight download, renaming, etc. should be automated; a user should not need to modify the demo code in order to run a demo test.
The Stable Diffusion demo passes; however, there is a typo in the demo test file: the function at line 622 of models/demos/wormhole/stable_diffusion/demo/demo.py is named test_interactve_demo,
which causes the pytest suggested in the README to fail. The README suggests: pytest models/demos/wormhole/stable_diffusion/demo/demo.py::test_interactive_demo
The demo code should be updated to rename test_interactve_demo to test_interactive_demo in models/demos/wormhole/stable_diffusion/demo/demo.py, line 622.
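The rename is a one-liner, assuming the misspelled name appears verbatim in the file. Sketched below on a stand-in file rather than the real one (GNU sed in-place syntax):

```shell
# On the real file the fix would be:
#   sed -i 's/test_interactve_demo/test_interactive_demo/' \
#     models/demos/wormhole/stable_diffusion/demo/demo.py
# Demonstrated here on a throwaway file with the same typo:
printf 'def test_interactve_demo():\n    pass\n' > /tmp/sd_demo.py
sed -i 's/test_interactve_demo/test_interactive_demo/' /tmp/sd_demo.py
grep -n 'def test_interactive_demo' /tmp/sd_demo.py
```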
Thanks to @mtairum, we are officially calling the Mistral work for both N150 and N300 complete.
@kpaigwar @esmalTT to work on mamba feedback
@dvartaniansTT @mbahnasTT The Falcon7b error should be fixed now in main with df84554. I have tested the prompt to verify. Also, it should only happen the first time the model is run, so it will probably work if you re-run the same command.
@dvartaniansTT Note that bert doesn't work for N150, as noted on front-page README.
I will add a note in the BERT README as well, to make it even clearer.
@dvartaniansTT @mbahnasTT After we put in the warning about SD on N300, can we close this?
Describe the bug I am testing demos for WH on N150. And encountering errors.
To Reproduce Steps to reproduce the behavior:
Expected behavior: demos and any suggested pytest in the models' READMEs should pass from main. And the READMEs should be regularly updated to reflect changes in main, be it the build flow, required env variables, or anything else that changes in main.
Please complete the following environment information:
Additional context I will be adding more to this issue as I keep testing.