Open amqdn opened 1 week ago
This behavior (outputting only `Killed`) may indicate that an OOM error has occurred.
FYI, on my side it requires ~30 GB of RAM to export llama2-7B with the flags `-kv --use_sdpa_with_kv_cache -X -qmode 8da4w --group_size 128 -d fp32`.
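Given the ~30 GB figure above, one quick sanity check before exporting is to compare `MemAvailable` against that threshold. A minimal sketch, assuming a Linux host and treating the 30 GB number as anecdotal from this thread rather than a documented requirement:

```shell
# Warn if available RAM is below the ~30 GB reported in this thread.
# The threshold is an assumption taken from the comment above.
required_kb=$((30 * 1024 * 1024))
avail_kb=$(awk '/MemAvailable/ {print $2}' /proc/meminfo)
if [ "$avail_kb" -lt "$required_kb" ]; then
    echo "warning: only ~$((avail_kb / 1024 / 1024)) GB available; export may be Killed (OOM)"
else
    echo "ok: ~$((avail_kb / 1024 / 1024)) GB available"
fi
```

If the export still dies with only `Killed`, `dmesg` kernel logs are the usual place to confirm the OOM killer fired.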
@larryliu0820 https://github.com/pytorch/executorch/pull/3803/commits/2badd7643e248335cb26c310fa96472761c73d65 I guess this is your PR. Any chance you know what's going on here?
@cccclai can you take a look? It seems the PR causes an OOM.
Hello!
Commit 2badd76 appears to break `examples.models.llama2.export_llama`, specifically with Llama 3.

Expected Behavior
Current Behavior
Steps to Reproduce
git submodule sync
git submodule update --init
./install_requirements.sh --pybind xnnpack
./examples/models/llama2/install_requirements.sh
python -m examples.models.llama2.export_llama --checkpoint -p -kv --use_sdpa_with_kv_cache -X -qmode 8da4w --group_size 128 -d fp32 --metadata '{"get_bos_id":128000, "get_eos_id":128001}' --embedding-quantize 4,32 --output_name="llama3.pte"
git checkout fbbba34
Repeat export command
git checkout 2badd76
Repeat export command
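To compare the two commits directly, it may help to record the export's peak memory use on each checkout. A minimal sketch using Python's `resource` module as the measuring wrapper; the `peak_rss` helper name is my own, and the placeholder command should be replaced with the full `export_llama` invocation above:

```shell
# Run a command and report the peak RSS of the child process.
# On Linux, ru_maxrss is reported in kilobytes.
peak_rss() {
    python3 - "$@" <<'EOF'
import resource, subprocess, sys
subprocess.run(sys.argv[1:], check=False)
kb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
print(f"peak RSS: {kb / 1024 / 1024:.1f} GB")
EOF
}

# Placeholder; substitute the export_llama command from the steps above.
peak_rss echo "export run placeholder"
```

Running this once on fbbba34 and once on 2badd76 would show whether the peak actually grew past available RAM between the two commits.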