tenstorrent / tt-metal

TT-NN operator library, and TT-Metalium low-level kernel programming model.
Apache License 2.0

Llama3.1 prefill hang #13017

Closed by cglagovichTT 1 week ago

cglagovichTT commented 1 month ago

@tstescoTT reports that running continuous batching in a loop leads to hangs during prefill, usually at seqlen=256 or 512.

tstescoTT commented 1 month ago

Thanks for reporting the issue here. I'll add some logs:

2024-09-20:12:00:28,778 INFO     [lm_backend.py:455] Decoding batch with indices [237, 1541, 574, 691, 406, 733, 387, 767, 231, 596, 767, 618, 461, 461, 626, 758, 443, 212, 793, 1559, 666, 290, 375, 1045, 487, 619, 503, 669, 496, 694, 820, 257]
2024-09-20:12:00:28,897 INFO     [lm_backend.py:415] Filling kv cache for user_id:= 10, prefill_ids.shape:=torch.Size([1, 256]), seq_len=151
2024-09-20:12:00:29,333 INFO     [lm_backend.py:434] completed prefill user_id:= 10, next_token:=1271
2024-09-20:12:00:29,333 INFO     [lm_backend.py:455] Decoding batch with indices [238, 1542, 575, 692, 407, 734, 388, 768, 232, 597, 151, 619, 462, 462, 627, 759, 444, 213, 794, 1560, 667, 291, 376, 1046, 488, 620, 504, 670, 497, 695, 821, 258]
...
2024-09-20:12:00:30,390 INFO     [lm_backend.py:455] Decoding batch with indices [247, 1551, 584, 701, 416, 743, 397, 777, 241, 606, 160, 628, 471, 471, 636, 768, 453, 222, 803, 1569, 676, 300, 385, 1055, 497, 629, 513, 679, 506, 704, 830, 267]
2024-09-20:12:00:30,509 INFO     [lm_backend.py:415] Filling kv cache for user_id:= 18, prefill_ids.shape:=torch.Size([1, 256]), seq_len=241

... hang for 10+ minutes

The hang happens almost always at prefill length 256, not always on the same prompt, and typically after 1000+ prompts have completed.
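
For anyone trying to narrow this down from a long run, here is a minimal sketch of a log scan that reports prefills that started but never logged a completion. It assumes only the two message formats visible in the log lines quoted above. The function name `find_unfinished_prefills` is hypothetical and not part of `lm_backend.py`, and treating the second dimension of `prefill_ids.shape` as the padded prefill length is an assumption based on those lines.

```python
import re

# Patterns copied from the log lines quoted above (assumption: the format is stable).
START = re.compile(
    r"Filling kv cache for user_id:= (\d+), "
    r"prefill_ids\.shape:=torch\.Size\(\[1, (\d+)\]\), seq_len=(\d+)"
)
DONE = re.compile(r"completed prefill user_id:= (\d+)")


def find_unfinished_prefills(lines):
    """Return {user_id: (prefill_len, seq_len)} for prefills with no completion line.

    Hypothetical helper for triage; not part of lm_backend.py.
    """
    pending = {}
    for line in lines:
        started = START.search(line)
        if started:
            user_id = int(started.group(1))
            pending[user_id] = (int(started.group(2)), int(started.group(3)))
            continue
        finished = DONE.search(line)
        if finished:
            # A completion clears the matching pending prefill, if any.
            pending.pop(int(finished.group(1)), None)
    return pending


if __name__ == "__main__":
    sample = [
        "INFO [lm_backend.py:415] Filling kv cache for user_id:= 10, "
        "prefill_ids.shape:=torch.Size([1, 256]), seq_len=151",
        "INFO [lm_backend.py:434] completed prefill user_id:= 10, next_token:=1271",
        "INFO [lm_backend.py:415] Filling kv cache for user_id:= 18, "
        "prefill_ids.shape:=torch.Size([1, 256]), seq_len=241",
    ]
    print(find_unfinished_prefills(sample))  # {18: (256, 241)}
```

Running this over the full log after a hang should leave exactly the stuck prefill in the result, which makes it easy to tally which prefill lengths are affected across runs.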

cglagovichTT commented 1 week ago

Resolved this with PR https://github.com/tenstorrent/tt-metal/pull/13348