Can you add the AOT steps for how you generate the .pte? I believe we are running tinyllama continuously in our CI to avoid breakage. Maybe the issue is that you are testing with an AOT path that is not covered by the CI?
Thanks @guangy10, that's right. The README command does not use the KV cache, while CI does. Updating the README to use the KV cache for now, which works: https://github.com/pytorch/executorch/pull/5460
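For reference, a sketch of an AOT export command that enables the KV cache at export time, assuming the stories110M checkpoint and the flag spellings used by the examples/models/llama2 export script (-c/-p for checkpoint and params, -kv for the KV cache, -X for the XNNPACK backend); verify the exact flags against your checkout:

python -m examples.models.llama2.export_llama -c stories110M.pt -p params.json -kv -X --output_name xnnpack_llama2.pte

The resulting xnnpack_llama2.pte is what gets passed to llama_main via --model_path.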
🐛 Describe the bug
When trying to run stories with
cmake-out/examples/models/llama2/llama_main --model_path=xnnpack_llama2.pte --tokenizer_path=tokenizer.bin --prompt=...
I get the following errors:

Versions