Closed PiotrNawrot closed 2 months ago
Hi @PiotrNawrot, if you want to test PasskeyRetrieval as proposed by landmark attention, I think the task name should be niah_single_1. In your setting, niah_multikey_3 is a very hard task, since we use UUIDs as the needle keys and values and we have multiple distractor needles in the context.
Oh right, sorry, I think I got confused by the naming of all these tasks and was sure that passkey_retrieval = kv_retrieval.
Could you please confirm that you used the same model template for testing Llama 3.1?
Yes, your template is the same as the one I use to evaluate the Llama 3.1 series.
Thanks!
Hey,
Thanks for your amazing work - it's really really amazing.
I wanted to reproduce the results of meta-llama/Meta-Llama-3.1-8B-Instruct, reported in Table 10 in your paper. I believe it should be perfect accuracy for PasskeyRetrieval at 128k, but I get 62.5.
I did the following:
And I got
Could you please help me figure out what I should do here to get the reported perfect accuracy?
Many thanks!