Closed: mtairum closed this issue 1 week ago.
This seems to be happening to Mixtral as well. Tested on branch mixtral-32k-demo
We see non-deterministic PCC (ND PCC) in our t3k Llama attention tests. We isolated it to this commit: https://github.com/tenstorrent/tt-metal/commit/7b8e627c078e8262a22df07627b00b2f1d645abb#diff-dcb2b8d4bed26e70b5f09bd34fcce0e81d7a8770a96ecf79ac753e4e27559af2
Probably the same: https://github.com/tenstorrent/tt-metal/issues/11438
For llama3.1-8B on branch aho/unpacker-delay
I'm not seeing variability anymore.
The test_model (length = 512) is showing PCC = 0.9632 (we had a cutoff at 0.94), so this was an improvement as well.
For a length of 4k the PCC is still 0.8667 (the same as https://github.com/tenstorrent/tt-metal/issues/11438), so no change there.
Correct me if I am wrong, but neither the commit that caused the non-determinism nor the fix is expected to change the math, so we should expect the same PCC as before (unless some other change in between changed the math). The fact that the PCC is not the same, even if it is 0.96, might point to a problem.
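For context, PCC here is presumably the Pearson correlation coefficient between the device output and a golden reference tensor; a minimal sketch of how such a comparison could be computed (the exact metric in the tt-metal test harness may differ):

```python
import numpy as np

def compute_pcc(golden, actual):
    """Pearson correlation coefficient between two flattened tensors."""
    g = np.asarray(golden, dtype=np.float64).ravel()
    a = np.asarray(actual, dtype=np.float64).ravel()
    # np.corrcoef returns the 2x2 correlation matrix; the off-diagonal
    # entry is the PCC between the two inputs
    return float(np.corrcoef(g, a)[0, 1])

print(compute_pcc([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]))  # 1.0
```

With a deterministic kernel, repeated runs against the same golden tensor should produce a bit-identical PCC; any run-to-run PCC drift indicates non-determinism upstream of the comparison.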
@uaydonat My bad on the previous comment. I should've added a 'probably' or double checked with older pipelines.
Although our PCC cutoff is 0.94, before this variability issue was introduced the PCC was already 0.9632 (checked runs from 1 and 2 weeks ago), so there wasn't any change there, as expected.
Is the fix in main? Should we close this?
Yes this is in main. Closed.
Describe the bug
Running the Llama3.1-8B demo with prefill results in different outputs on each run. All outputs are valid, but they are always different.
For now we've disabled the output token validation from main to avoid blocking CI.
To Reproduce
Tested on latest main
ce56b42712429416485b377d30f88500bf243dfa
Also tried the reset_seeds fixture, but no luck. We double-checked that we are doing argmax, so it's not clear where the variability is coming from.
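Argmax decoding is fully deterministic for identical logits, so with seeds pinned the remaining variability has to come from the logits themselves, e.g. non-deterministic device math flipping the winner between near-tied values. A toy illustration (not the demo's actual decode code):

```python
def greedy_token(logits):
    # greedy (argmax) decode: deterministic given identical logits
    return max(range(len(logits)), key=lambda i: logits[i])

# identical logits always give the same token
logits = [0.10, 0.50, 0.49]
print(greedy_token(logits))  # 1

# but a tiny device-side perturbation near a tie flips the choice
perturbed = [0.10, 0.50, 0.51]
print(greedy_token(perturbed))  # 2
```

This is why reseeding the host RNGs had no effect: the divergence happens in the forward pass, before any sampling.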
I tried with a more robust prompt
Can you describe and comment the following number sequence? 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163
, but as can be seen below, it still outputs a slightly different answer every time.

Bad output
I've run multiple times and got different variations of the output. Examples below.
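One lightweight way to surface this kind of run-to-run drift in CI, without asserting exact output text, is to fingerprint the generated token sequence and compare fingerprints across runs (a hypothetical helper, not part of the repo):

```python
import hashlib

def output_fingerprint(tokens):
    # stable short hash of a token sequence, for cheap cross-run comparison
    joined = ",".join(map(str, tokens))
    return hashlib.sha256(joined.encode()).hexdigest()[:12]

# two runs of a deterministic model should yield identical fingerprints
run_a = [128000, 791, 8668, 374]
run_b = [128000, 791, 8668, 374]
print(output_fingerprint(run_a) == output_fingerprint(run_b))  # True
```

If fingerprints differ between runs on the same prompt and seed, the non-determinism is confirmed without having to diff full transcripts by hand.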