[Executorch] Add quantized kv cache to oss ci

kimishpatel commented 1 day ago

Stack from ghstack (oldest at bottom):

-> #6997
6996
6914
5715
5670

Fixes to make sure the quantized KV cache works in OSS.

Differential Revision: D66269487

pytorch-bot[bot] commented 1 day ago

:link: Helpful Links

:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/6997

:page_facing_up: Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

:heavy_exclamation_mark: 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

[DomainsOnly] Jobs fail with GLIBC version not found

:x: 6 New Failures, 1 Pending

As of commit a943088fb54766fb88e3f4393818b2ffb2418b0f with merge base 43555d21c289669ecacca5cd46d9a9018c6d7f7c:

NEW FAILURES - The following jobs have failed:

* [Lint / lintrunner / linux-job](https://hud.pytorch.org/pr/pytorch/executorch/6997#33356078308) ([gh](https://github.com/pytorch/executorch/actions/runs/11964189426/job/33356078308))
  `>>> Lint for examples/models/llama/source_transformation/quantized_kv_cache.py:`
* [pull / test-llama-runner-linux (fp32, xnnpack+custom+quantize_kv) / linux-job](https://hud.pytorch.org/pr/pytorch/executorch/6997#33356085188) ([gh](https://github.com/pytorch/executorch/actions/runs/11964189446/job/33356085188))
  `RuntimeError: Command docker exec -t e5abf7c4b22ff7d64dd4fd13cecaa77e88730b2985f17861cb19717c755bc923 /exec failed with exit code 1`
* [pull / test-llama-runner-linux (fp32, xnnpack+quantize_kv) / linux-job](https://hud.pytorch.org/pr/pytorch/executorch/6997#33356085598) ([gh](https://github.com/pytorch/executorch/actions/runs/11964189446/job/33356085598))
  `RuntimeError: Command docker exec -t 4a19cfc2f04099c6d720b1f5ec211191ec951e793524062e520ed241102cea3b /exec failed with exit code 1`
* [pull / test-llama-runner-qnn-linux (fp32, qnn) / linux-job](https://hud.pytorch.org/pr/pytorch/executorch/6997#33356087972) ([gh](https://github.com/pytorch/executorch/actions/runs/11964189446/job/33356087972))
  `AttributeError: '_OpNamespace' 'llama' object has no attribute 'sdpa_with_kv_cache'`
* [pull / test-llava-runner-linux / linux-job](https://hud.pytorch.org/pr/pytorch/executorch/6997#33356086086) ([gh](https://github.com/pytorch/executorch/actions/runs/11964189446/job/33356086086))
  `test_llava_export`
* [trunk / test-llama-runner-mac (fp32, xnnpack+custom+quantize_kv) / macos-job](https://hud.pytorch.org/pr/pytorch/executorch/6997#33356085891) ([gh](https://github.com/pytorch/executorch/actions/runs/11964189404/job/33356085891))
  `RuntimeError: Missing out variants: {'quantized_decomposed::dequantize_per_token', 'quantized_decomposed::choose_qparams_per_token_asymmetric', 'quantized_decomposed::quantize_per_token'}`

This comment was automatically generated by Dr. CI and updates every 15 minutes.