Open aviator19941 opened 2 months ago
Increasing the value defined at https://github.com/iree-org/iree/blob/main/runtime/src/iree/hal/utils/stream_tracing.c#L14 to (32 * 1024 - 256) allows the trace to be generated without error.
I'm looking into the dispatch counts and stats in the Tracy profile to see whether something substantially different is going on with 70B vs 8B.
It seems there are ~14.6k command buffers that are initializer dispatches of slow_memcpy, plus additional command buffers for the prefill/decode dispatches.
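For a rough sense of scale, the numbers quoted in the comments above can be put side by side. This is only a back-of-the-envelope sketch: `remaining_entries` is a hypothetical helper, and the assumption that each command buffer consumes exactly one entry of the enlarged value is not confirmed anywhere in this thread.

```python
# Values taken from the comments above.
new_capacity = 32 * 1024 - 256        # enlarged value that let the trace complete
initializer_command_buffers = 14_600  # approximate count seen in the Tracy profile


def remaining_entries(capacity: int, used: int, entries_per_buffer: int = 1) -> int:
    """Hypothetical helper: entries left after the initializer command buffers.

    ASSUMPTION (unverified): each command buffer consumes `entries_per_buffer`
    entries; the real per-buffer cost in the runtime may differ.
    """
    return capacity - used * entries_per_buffer


print(new_capacity)                                                  # 32512
print(remaining_entries(new_capacity, initializer_command_buffers))  # 17912
```

Under that one-entry-per-buffer assumption, the initializer dispatches alone would take up a bit under half of the enlarged value, leaving the rest for the prefill/decode command buffers.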
What happened?
ROCR_VISIBLE_DEVICES=0 ../iree-build-trace/tools/iree-run-module --device=hip://0 --hip_use_streams=true --hip_allow_inline_execution=true --device_allocator=caching --module=./Llama-3.1-70B-q4_1.vmfb --parameters=model=../Llama-3.1-70B-q4_1.irpa --function=prefill_bs4 --input=4x16xsi64 --input=4xsi64 --input=4x1xsi64 --input=128x2621440xf16
Steps to reproduce your issue
../iree-build-trace/tools/iree-compile --iree-hal-target-backends=rocm --iree-hip-target=gfx942 --iree-hal-executable-debug-level=3 ../sharktank/Llama-3.1-70B-q4_1.mlir -o Llama-3.1-70B-q4_1.vmfb
ROCR_VISIBLE_DEVICES=0 ../iree-build-trace/tools/iree-run-module --device=hip://0 --hip_use_streams=true --hip_allow_inline_execution=true --device_allocator=caching --module=./Llama-3.1-70B-q4_1-trace.vmfb --parameters=model=../Llama-3.1-70B-q4_1.irpa --function=prefill_bs4 --input=4x16xsi64 --input=4xsi64 --input=4x1xsi64 --input=128x2621440xf16
What component(s) does this issue relate to?
Runtime
Version information
0242f6dfdab168e6a661162b739c08488958ec77
Additional context
No response