iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0

Llama-3.1-70B q4_1 compiles, but fails at runtime stream_tracing #18436

Open aviator19941 opened 2 months ago

aviator19941 commented 2 months ago

What happened?

ROCR_VISIBLE_DEVICES=0 ../iree-build-trace/tools/iree-run-module --device=hip://0 --hip_use_streams=true --hip_allow_inline_execution=true --device_allocator=caching --module=./Llama-3.1-70B-q4_1.vmfb --parameters=model=../Llama-3.1-70B-q4_1.irpa --function=prefill_bs4 --input=4x16xsi64 --input=4xsi64 --input=4x1xsi64 --input=128x2621440xf16

iree-run-module: iree/runtime/src/iree/hal/utils/stream_tracing.c:380: uint16_t iree_hal_stream_tracing_context_insert_query(iree_hal_stream_tracing_context_t *, iree_hal_stream_tracing_context_event_list_t *): Assertion `event->next_in_command_buffer != ((void*)0)' failed.
Aborted (core dumped)

Steps to reproduce your issue

  1. Download and unzip MLIR from https://github.com/aviator19941/IR/tree/main/llama3_70b
  2. Compile unzipped IR with this command: ../iree-build-trace/tools/iree-compile --iree-hal-target-backends=rocm --iree-hip-target=gfx942 --iree-hal-executable-debug-level=3 ../sharktank/Llama-3.1-70B-q4_1.mlir -o Llama-3.1-70B-q4_1.vmfb
  3. Run the vmfb with this command (or use zero/splat weights): ROCR_VISIBLE_DEVICES=0 ../iree-build-trace/tools/iree-run-module --device=hip://0 --hip_use_streams=true --hip_allow_inline_execution=true --device_allocator=caching --module=./Llama-3.1-70B-q4_1.vmfb --parameters=model=../Llama-3.1-70B-q4_1.irpa --function=prefill_bs4 --input=4x16xsi64 --input=4xsi64 --input=4x1xsi64 --input=128x2621440xf16
  4. See error:
    iree-run-module: iree/runtime/src/iree/hal/utils/stream_tracing.c:380: uint16_t iree_hal_stream_tracing_context_insert_query(iree_hal_stream_tracing_context_t *, iree_hal_stream_tracing_context_event_list_t *): Assertion `event->next_in_command_buffer != ((void*)0)' failed.
    Aborted (core dumped)

What component(s) does this issue relate to?

Runtime

Version information

0242f6dfdab168e6a661162b739c08488958ec77

Additional context

No response

aviator19941 commented 2 months ago

Increasing the constant defined at https://github.com/iree-org/iree/blob/main/runtime/src/iree/hal/utils/stream_tracing.c#L14 to (32 * 1024 - 256) allows the trace to be generated without hitting the assertion.
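
One plausible reading of the assertion together with the workaround above is that the tracing context hands out events from a fixed-capacity pool and runs dry when too many queries are outstanding. The sketch below is a hypothetical toy model of that failure mode, not IREE's actual code: every name in it (`toy_pool_t`, `toy_pool_acquire`, `TOY_QUERY_CAPACITY`, and so on) is invented for illustration, and the capacity of 4 is arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical toy model of a fixed-capacity event pool.
// This is NOT the IREE implementation; all names and sizes are assumptions.
#define TOY_QUERY_CAPACITY 4

typedef struct toy_event_t {
  struct toy_event_t* next;  // Next free event, or NULL at the end of the list.
  uint16_t query_id;         // Index of this event within the pool.
} toy_event_t;

typedef struct toy_pool_t {
  toy_event_t events[TOY_QUERY_CAPACITY];
  toy_event_t* free_head;  // Head of the free list, NULL when exhausted.
} toy_pool_t;

// Links every event into a single free list.
static void toy_pool_init(toy_pool_t* pool) {
  for (size_t i = 0; i < TOY_QUERY_CAPACITY; ++i) {
    pool->events[i].query_id = (uint16_t)i;
    pool->events[i].next =
        (i + 1 < TOY_QUERY_CAPACITY) ? &pool->events[i + 1] : NULL;
  }
  pool->free_head = &pool->events[0];
}

// Pops one event off the free list.
// Returns 1 and writes the query id on success, or 0 if the pool is exhausted.
// Reporting exhaustion to the caller is the alternative to asserting.
static int toy_pool_acquire(toy_pool_t* pool, uint16_t* out_query_id) {
  if (pool->free_head == NULL) return 0;
  toy_event_t* event = pool->free_head;
  pool->free_head = event->next;
  event->next = NULL;
  *out_query_id = event->query_id;
  return 1;
}

// Returns an event to the free list so its query id can be reused.
static void toy_pool_release(toy_pool_t* pool, uint16_t query_id) {
  assert(query_id < TOY_QUERY_CAPACITY);
  toy_event_t* event = &pool->events[query_id];
  event->next = pool->free_head;
  pool->free_head = event;
}
```

In this model, raising the capacity only moves the point at which `toy_pool_acquire` starts returning 0; a workload that keeps more events outstanding than the pool holds would still exhaust it unless events are released along the way.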

aviator19941 commented 2 months ago

Looking into the dispatch counts and other stats in the Tracy profile to see whether anything substantially different is going on with 70B vs 8B.

aviator19941 commented 1 month ago

It seems there are ~14.6k command buffers that are initializer dispatches of slow_memcpy, plus more command buffers for the dispatches from prefill/decode.