tenstorrent / tt-buda

Tenstorrent TT-BUDA Repository

problems starting tt-buda-demos #23

Closed nikitaromanoov closed 2 months ago

nikitaromanoov commented 2 months ago

Hello! We get the following error when running this code:

python tt-buda-demos/model_demos/nlp_demos/gpt2/gpt2_text_generation.py

Devices: n300

Also, is there any way to look at how much memory the model uses during startup?

2024-05-07 11:26:33.437 | INFO     | Runtime         - Running tt_runtime on host: 'tt-test'
2024-05-07 11:26:33.437 | INFO     | PerfInfra       - Backend profiler is disabled
2024-05-07 11:26:33.437 | INFO     | PerfInfra       - Memory profiler is enabled
2024-05-07 11:26:33.707 | WARNING  | Runtime         - Config.soc_descriptor_path='/tmp/ecoblox/1c303f8cd67f/device_descs/wormhole_b0_2560_0x0.yaml' doesn't exist, defaulting to '/home/ecoblox/test_nik/tt-buda/third_party/budabackend/device/wormhole_b0_8x10.yaml'
2024-05-07 11:26:33.719 | INFO     | SiliconDriver   - Detected 4 PCI devices : {0, 1, 2, 3}
2024-05-07 11:26:33.721 | WARNING  | SiliconDriver   - NumHostMemChannels: 1 used for device_id: 0x401e less than target: 4. Workload will fail if it exceeds NumHostMemChannels. Increase Number of Hugepages.
2024-05-07 11:26:34.362 | INFO     | Runtime         - Compiling Firmware for TT device
2024-05-07 11:26:34.378 | FATAL    | Always          - Error "/home/ecoblox/test_nik/tt-buda/third_party/budabackend/build/src/firmware/riscv/targets/erisc_app/out//erisc_app.hex" doesn't exist
2024-05-07 11:26:35.969 | FATAL    | Always          - Running pipegen command failed: Core (chip=0,x=25,y=18) (logical location: (chip=0,x=7,y=0)) is out of data buffers memory (allocated 906944 bytes out of available 839680 bytes).
Allocated stream buffers:
Core:
        chip: 0
        r: 0
        c: 7
        op_name: matmul_6
                buffer_Unpacker_0:
                        buffer_size: 224640

                buffer_Unpacker_2:
                        prefetch_type: POST_TM
                        buffer_size: 16640

                buffer_Unpacker_1:
                        prefetch_type: POST_TM
                        buffer_size: 399360

                buffer_Packer:
                        buffer_size: 266240

2024-05-07 11:26:36.002 | INFO     | Runtime         - Compile result: FAILURE, failure_type: undefined, device_id: 0, temporal_epoch_id: 0, graph: , logical_core_x: 0, logical_core_y: 0, failure_target:
2024-05-07 11:26:36.002 | ERROR    | Always          - Error "/home/ecoblox/test_nik/tt-buda/third_party/budabackend/build/src/firmware/riscv/targets/erisc_app/out//erisc_app.hex" doesn't exist
backtrace:
 --- /home/ecoblox/test_nik/tt-buda/pybuda/pybuda/../../third_party/budabackend/build/lib/libtt.so(+0x244d11) [0x7fc0cfa5fd11]
 --- tt::generate_all_fw(tt::netlist_workload_data const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, perf::PerfDesc&, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
 --- /home/ecoblox/test_nik/tt-buda/pybuda/pybuda/../../third_party/budabackend/build/lib/libtt.so(+0x6a71da) [0x7fc0cfec21da]
 --- /lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6df4) [0x7fc1ade1adf4]
 --- /lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7fc1c6794609]
 --- /lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7fc1c68ce353]

2024-05-07 11:26:36.002 | INFO     | pybuda.backend:__init__:85 - Backend compile False, target: , error type: BackendCompileFailure.Invalid, error: Error "/home/ecoblox/test_nik/tt-buda/third_party/budabackend/build/src/firmware/riscv/targets/erisc_app/out//erisc_app.hex" doesn't exist
backtrace:
 --- /home/ecoblox/test_nik/tt-buda/pybuda/pybuda/../../third_party/budabackend/build/lib/libtt.so(+0x244d11) [0x7fc0cfa5fd11]
 --- tt::generate_all_fw(tt::netlist_workload_data const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, perf::PerfDesc&, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
 --- /home/ecoblox/test_nik/tt-buda/pybuda/pybuda/../../third_party/budabackend/build/lib/libtt.so(+0x6a71da) [0x7fc0cfec21da]
 --- /lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6df4) [0x7fc1ade1adf4]
 --- /lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7fc1c6794609]
 --- /lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7fc1c68ce353]

target chip id: 0, target core(x,y): 0 0, temporal epoch id: 0
requires extra size bytes: 0

2024-05-07 11:26:36.003 | FATAL    | Always          - Attempting to read device aiclk when cluster device state is set to idle
2024-05-07 11:26:36.008 | ERROR    | Always          - Attempting to read device aiclk when cluster device state is set to idle
backtrace:
 --- tt_cluster::get_all_device_aiclks()
 --- tt_runtime::query_all_device_aiclks(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
 --- tt_runtime::finish()
 --- /home/ecoblox/test_nik/tt-buda/pybuda/pybuda/_C.so(+0x8b38af) [0x7fc0d0b3c8af]
 --- /home/ecoblox/test_nik/tt-buda/pybuda/pybuda/_C.so(+0x2c1e36) [0x7fc0d054ae36]
 --- python(PyCFunction_Call+0x59) [0x5d5499]
 --- python(_PyObject_MakeTpCall+0x296) [0x5d6066]
 --- python() [0x4e22b3]
 --- python(_PyEval_EvalFrameDefault+0x5d69) [0x54c8a9]
 --- python(_PyFunction_Vectorcall+0x1b6) [0x5d5846]
 --- python(_PyEval_EvalFrameDefault+0x907) [0x547447]
 --- python(_PyEval_EvalCodeWithName+0x26a) [0x54552a]
 --- python(_PyFunction_Vectorcall+0x393) [0x5d5a23]
 --- python() [0x579c7d]
 --- python(_PyObject_MakeTpCall+0x1ff) [0x5d5fcf]
 --- python(_PyEval_EvalFrameDefault+0x5f18) [0x54ca58]
 --- python(_PyEval_EvalCodeWithName+0x26a) [0x54552a]
 --- python(_PyFunction_Vectorcall+0x393) [0x5d5a23]
 --- python(_PyEval_EvalFrameDefault+0x907) [0x547447]
 --- python(_PyFunction_Vectorcall+0x1b6) [0x5d5846]
 --- python(_PyEval_EvalFrameDefault+0x907) [0x547447]
 --- python(_PyEval_EvalCodeWithName+0x26a) [0x54552a]
 --- python(_PyFunction_Vectorcall+0x393) [0x5d5a23]
 --- python(_PyEval_EvalFrameDefault+0x1876) [0x5483b6]
 --- python(_PyEval_EvalCodeWithName+0x26a) [0x54552a]
 --- python(_PyFunction_Vectorcall+0x393) [0x5d5a23]
 --- python(_PyEval_EvalFrameDefault+0x1876) [0x5483b6]
 --- python(_PyEval_EvalCodeWithName+0x26a) [0x54552a]
 --- python(_PyFunction_Vectorcall+0x393) [0x5d5a23]
 --- python(_PyEval_EvalFrameDefault+0x1876) [0x5483b6]
 --- python(_PyEval_EvalCodeWithName+0x26a) [0x54552a]
 --- python(_PyFunction_Vectorcall+0x393) [0x5d5a23]
 --- python(_PyEval_EvalFrameDefault+0x1876) [0x5483b6]
 --- python(_PyEval_EvalCodeWithName+0x26a) [0x54552a]
 --- python(_PyFunction_Vectorcall+0x393) [0x5d5a23]
 --- python(_PyEval_EvalFrameDefault+0x725) [0x547265]
 --- python(_PyEval_EvalCodeWithName+0x26a) [0x54552a]
 --- python(_PyFunction_Vectorcall+0x393) [0x5d5a23]
 --- python(_PyEval_EvalFrameDefault+0x1876) [0x5483b6]
 --- python(_PyEval_EvalCodeWithName+0x26a) [0x54552a]
 --- python(_PyFunction_Vectorcall+0x393) [0x5d5a23]
 --- python() [0x4e1b5c]
 --- python(PyObject_Call+0x62) [0x5d4c12]
 --- python(_PyEval_EvalFrameDefault+0x1f26) [0x548a66]
 --- python(_PyEval_EvalCodeWithName+0x26a) [0x54552a]
 --- python(_PyFunction_Vectorcall+0x393) [0x5d5a23]
 --- python() [0x4e2079]
 --- python(PyObject_Call+0x62) [0x5d4c12]
 --- python(_PyEval_EvalFrameDefault+0x1f26) [0x548a66]
 --- python(_PyEval_EvalCodeWithName+0x26a) [0x54552a]
 --- python(_PyFunction_Vectorcall+0x393) [0x5d5a23]
 --- python() [0x4e2079]
 --- python(PyObject_Call+0x62) [0x5d4c12]
 --- python(_PyEval_EvalFrameDefault+0x1f26) [0x548a66]
 --- python(_PyEval_EvalCodeWithName+0x26a) [0x54552a]
 --- python(_PyFunction_Vectorcall+0x393) [0x5d5a23]
 --- python() [0x57a4af]
 --- python(PyObject_Call+0x25e) [0x5d4e0e]
 --- python(_PyEval_EvalFrameDefault+0x1f26) [0x548a66]
 --- python(_PyEval_EvalCodeWithName+0x26a) [0x54552a]
 --- python(_PyFunction_Vectorcall+0x393) [0x5d5a23]
 --- python() [0x4e2079]
 --- python(PyObject_Call+0x62) [0x5d4c12]
 --- python(_PyEval_EvalFrameDefault+0x1f26) [0x548a66]
 --- python(_PyEval_EvalCodeWithName+0x26a) [0x54552a]
 --- python(_PyFunction_Vectorcall+0x393) [0x5d5a23]
 --- python(PyObject_Call+0x62) [0x5d4c12]
 --- python(_PyEval_EvalFrameDefault+0x1f26) [0x548a66]
 --- python(_PyEval_EvalCodeWithName+0x26a) [0x54552a]
 --- python(_PyFunction_Vectorcall+0x393) [0x5d5a23]
 --- python() [0x4e2079]
 --- python(PyObject_Call+0x62) [0x5d4c12]
 --- python(_PyEval_EvalFrameDefault+0x1f26) [0x548a66]
 --- python(_PyEval_EvalCodeWithName+0x26a) [0x54552a]
 --- python(_PyFunction_Vectorcall+0x393) [0x5d5a23]
 --- python() [0x4e2079]
 --- python(PyObject_Call+0x62) [0x5d4c12]
 --- python(_PyEval_EvalFrameDefault+0x1f26) [0x548a66]
 --- python(_PyEval_EvalCodeWithName+0x26a) [0x54552a]
 --- python(_PyFunction_Vectorcall+0x393) [0x5d5a23]
 --- python() [0x4e2079]
 --- python(PyObject_Call+0x62) [0x5d4c12]
 --- python(_PyEval_EvalFrameDefault+0x1f26) [0x548a66]
 --- python(_PyFunction_Vectorcall+0x1b6) [0x5d5846]
 --- python(_PyEval_EvalFrameDefault+0x907) [0x547447]
 --- python(_PyEval_EvalCodeWithName+0x26a) [0x54552a]
 --- python(_PyFunction_Vectorcall+0x393) [0x5d5a23]
 --- python() [0x4e2079]
 --- python(PyObject_Call+0x62) [0x5d4c12]
 --- python(_PyEval_EvalFrameDefault+0x1f26) [0x548a66]
 --- python(_PyEval_EvalCodeWithName+0x26a) [0x54552a]
 --- python(_PyFunction_Vectorcall+0x393) [0x5d5a23]
 --- python() [0x57a4af]
 --- python(_PyObject_MakeTpCall+0x296) [0x5d6066]
 --- python(_PyEval_EvalFrameDefault+0x690a) [0x54d44a]
 --- python(_PyFunction_Vectorcall+0x1b6) [0x5d5846]
 --- python(_PyEval_EvalFrameDefault+0x725) [0x547265]
 --- python(_PyEval_EvalCodeWithName+0x26a) [0x54552a]
 --- python(PyEval_EvalCode+0x27) [0x684327]

2024-05-07 11:26:36.017 | INFO     | Debuda          - Debug server ended on
2024-05-07 11:26:36.028 | WARNING  | pybuda.compile:handle_backend_error:1167 - Compile failed, retrying compilation with different parameters. Retry count: 1
2024-05-07 11:26:36.044 | ERROR    | pybuda.device:run_next_command:469 - Compile error: Backend compile failed: BackendCompileFailure.Invalid
Traceback (most recent call last):
  File "/home/ecoblox/test_nik/tt-buda/pybuda/pybuda/ttdevice.py", line 832, in compile_for
    self.backend_api = BackendAPI(
  File "/home/ecoblox/test_nik/tt-buda/pybuda/pybuda/backend.py", line 89, in __init__
    raise BackendCompileException(self.compile_result)
pybuda.backend.BackendCompileException: <pybuda._C.backend_api.BackendCompileResult object at 0x7fbf63345fb0>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ecoblox/test_nik/tt-buda/pybuda/pybuda/device.py", line 458, in run_next_command
    ret = self.compile_for(
  File "/home/ecoblox/test_nik/tt-buda/pybuda/pybuda/ttdevice.py", line 858, in compile_for
    raise RuntimeError(f"Backend compile failed: {ex.compile_result.failure_type}")
RuntimeError: Backend compile failed: BackendCompileFailure.Invalid

Traceback (most recent call last):
  File "gpt2_text_generation.py", line 55, in <module>
    run_gpt2_text_gen()
  File "gpt2_text_generation.py", line 38, in run_gpt2_text_gen
    answer = text_generator(
  File "/home/ecoblox/test_nik/tt-buda/build/python_env/lib/python3.8/site-packages/transformers/pipelines/text_generation.py", line 208, in __call__
    return super().__call__(text_inputs, **kwargs)
  File "/home/ecoblox/test_nik/tt-buda/build/python_env/lib/python3.8/site-packages/transformers/pipelines/base.py", line 1140, in __call__
    return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
  File "/home/ecoblox/test_nik/tt-buda/build/python_env/lib/python3.8/site-packages/transformers/pipelines/base.py", line 1147, in run_single
    model_outputs = self.forward(model_inputs, **forward_params)
  File "/home/ecoblox/test_nik/tt-buda/build/python_env/lib/python3.8/site-packages/transformers/pipelines/base.py", line 1046, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
  File "/home/ecoblox/test_nik/tt-buda/build/python_env/lib/python3.8/site-packages/transformers/pipelines/text_generation.py", line 271, in _forward
    generated_sequence = self.model.generate(input_ids=input_ids, attention_mask=attention_mask, **generate_kwargs)
  File "/home/ecoblox/test_nik/tt-buda/build/python_env/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ecoblox/test_nik/tt-buda/build/python_env/lib/python3.8/site-packages/transformers/generation/utils.py", line 1789, in generate
    return self.beam_sample(
  File "/home/ecoblox/test_nik/tt-buda/build/python_env/lib/python3.8/site-packages/transformers/generation/utils.py", line 3417, in beam_sample
    outputs = self(
  File "/home/ecoblox/test_nik/tt-buda/build/python_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ecoblox/test_nik/tt-buda/build/python_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ecoblox/test_nik/tt-buda/pybuda/pybuda/transformers/pipeline.py", line 108, in first_forward
    out = self.tt_forward(*inputs, **self.ordered_kwargs)
  File "/home/ecoblox/test_nik/tt-buda/pybuda/pybuda/transformers/pipeline.py", line 63, in tt_forward
    output_q = pybuda.run_inference(_sequential=True)
  File "/home/ecoblox/test_nik/tt-buda/pybuda/pybuda/run/api.py", line 90, in run_inference
    return _run_inference(module, inputs, input_count, output_queue, _sequential, _perf_trace, _verify_cfg)
  File "/home/ecoblox/test_nik/tt-buda/pybuda/pybuda/run/impl.py", line 277, in _run_inference
    return _run_devices_inference(
  File "/home/ecoblox/test_nik/tt-buda/pybuda/pybuda/run/impl.py", line 467, in _run_devices_inference
    output_queue = _initialize_pipeline(False, output_queue, sequential=sequential, verify_cfg=verify_cfg)
  File "/home/ecoblox/test_nik/tt-buda/pybuda/pybuda/run/impl.py", line 414, in _initialize_pipeline
    _compile_devices(sequential, training=training, sample_inputs=sample_inputs, sample_targets=sample_targets, microbatch_count=microbatch_count, verify_cfg=verify_cfg)
  File "/home/ecoblox/test_nik/tt-buda/pybuda/pybuda/run/impl.py", line 1248, in _compile_devices
    raise ret
RuntimeError: Backend compile failed: BackendCompileFailure.Invalid
2024-05-07 11:26:36.081 | DEBUG    | pybuda.run.impl:_shutdown:1265 - PyBuda shutdown
2024-05-07 11:26:36.081 | DEBUG    | pybuda.device:run_next_command:419 - Received SHUTDOWN command on CPUDevice 'cpu0_fallback'
2024-05-07 11:26:36.081 | DEBUG    | pybuda.device:run_next_command:424 - Shutting down on CPUDevice 'cpu0_fallback'
2024-05-07 11:26:36.081 | DEBUG    | pybuda.device:run_next_command:419 - Received SHUTDOWN command on TTDevice 'tt0'
2024-05-07 11:26:36.081 | DEBUG    | pybuda.device:run_next_command:424 - Shutting down on TTDevice 'tt0'
2024-05-07 11:26:36.092 | DEBUG    | pybuda.device:run_next_command:419 - Received SHUTDOWN command on CPUDevice 'cpu2_fallback'
2024-05-07 11:26:36.095 | DEBUG    | pybuda.device:run_next_command:424 - Shutting down on CPUDevice 'cpu2_fallback'
2024-05-07 11:26:36.095 | DEBUG    | pybuda.run.impl:_shutdown:1281 - Waiting until processes done
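
As an aside on the pipegen failure in the log above: the sizes it reports can be tallied with a few lines of Python to see how far over budget the core is and which buffer dominates. This is only a sketch that parses the pasted text. `LOG_EXCERPT` and `summarize_overflow` are hypothetical names, not part of PyBuda, and nothing here queries the device or the memory profiler.

```python
import re

# Excerpt copied from the pipegen failure in the log above (blank lines dropped).
LOG_EXCERPT = """\
Running pipegen command failed: Core (chip=0,x=25,y=18) (logical location: (chip=0,x=7,y=0)) is out of data buffers memory (allocated 906944 bytes out of available 839680 bytes).
        op_name: matmul_6
                buffer_Unpacker_0:
                        buffer_size: 224640
                buffer_Unpacker_2:
                        prefetch_type: POST_TM
                        buffer_size: 16640
                buffer_Unpacker_1:
                        prefetch_type: POST_TM
                        buffer_size: 399360
                buffer_Packer:
                        buffer_size: 266240
"""


def summarize_overflow(text):
    """Hypothetical helper: extract the allocation totals and per-buffer sizes."""
    match = re.search(r"allocated (\d+) bytes out of available (\d+) bytes", text)
    if match is None:
        raise ValueError("no pipegen out-of-memory message found")
    allocated, available = int(match.group(1)), int(match.group(2))

    buffers = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        name = re.fullmatch(r"(buffer_\w+):", stripped)
        if name:
            # A line such as "buffer_Packer:" starts a new buffer entry.
            current = name.group(1)
            continue
        size = re.fullmatch(r"buffer_size: (\d+)", stripped)
        if size and current is not None:
            buffers[current] = int(size.group(1))

    return {
        "allocated": allocated,
        "available": available,
        "overflow": allocated - available,
        "buffers": buffers,
        "listed_total": sum(buffers.values()),
    }


summary = summarize_overflow(LOG_EXCERPT)
largest = max(summary["buffers"], key=summary["buffers"].get)
print("over budget by", summary["overflow"], "bytes")
print("sum of listed buffers:", summary["listed_total"], "bytes")
print("largest buffer:", largest, summary["buffers"][largest], "bytes")
```

On this excerpt the core for `matmul_6` is 67264 bytes (about 8%) over the 839680 bytes available, and `buffer_Unpacker_1` is the largest single buffer at 399360 bytes. The four listed buffers add up to 906880 bytes, 64 bytes less than the reported allocation of 906944. The log does not say what accounts for that difference.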

tt-mpantic commented 2 months ago

Hi nikitaromanoov,

Thanks for trying out n300 and reaching out.

tt-buda-demos are primarily intended for, and extensively tested on, e75, e150 and n150 configurations. Architectural differences between those cards and the n300 lead to different compile behaviour, which exposes this specific issue.

That said, although n300 is not an architecture that we extensively test at the moment, from the BUDA compiler stack's perspective there are no architectural limitations that would prevent any model that runs on n150 from also running on n300 cards.

Given the above, if you are just trying out the model demos on n300, please keep in mind that the demos currently target the n150 architecture and that n300 will be fully supported in a future release. If you are preparing this specific model for production use, we can definitely assist by providing a workaround and by giving a long-term compiler upgrade higher priority.

tt-mpantic commented 2 months ago

Since there have been no follow-ups since my update, I'll go ahead and close this issue. Please feel free to reach out if you need further assistance.