pytorch / executorch

On-device AI across mobile, embedded and edge for PyTorch
https://pytorch.org/executorch/

2badd76 breaks examples.models.llama2.export_llama #3983

amqdn commented 1 week ago

Hello!

Commit `2badd76` appears to break `examples.models.llama2.export_llama`, specifically with Llama 3.

### Expected Behavior

```
[INFO 2024-06-14 16:04:23,366 export_llama_lib.py:390] Applying quantizers: []
[INFO 2024-06-14 16:04:23,366 builder.py:91] Loading model with checkpoint=/home/user/.cache/meta/Meta-Llama-3-8B-Instruct/original/consolidated.00.pth, params=/home/user/.cache/meta/Meta-Llama-3-8B-Instruct/original/params.json, use_kv_cache=True, weight_type=WeightType.LLAMA
[INFO 2024-06-14 16:04:25,619 builder.py:112] Loaded model with dtype=torch.bfloat16
[INFO 2024-06-14 16:04:25,920 config.py:58] PyTorch version 2.4.0.dev20240507+cpu available.
linear: layers.0.attention.wq, in=4096, out=4096
linear: layers.0.attention.wk, in=4096, out=1024
linear: layers.0.attention.wv, in=4096, out=1024
linear: layers.0.attention.wo, in=4096, out=4096
linear: layers.0.feed_forward.w1, in=4096, out=14336
linear: layers.0.feed_forward.w2, in=14336, out=4096
linear: layers.0.feed_forward.w3, in=4096, out=14336

...

modelname: llama3
output_file: llama3.pte
[INFO 2024-06-14 16:10:11,610 utils.py:114] Saved exported program to llama3.pte
```

### Current Behavior

```
[INFO 2024-06-14 16:15:24,437 export_llama_lib.py:390] Applying quantizers: []
[INFO 2024-06-14 16:15:24,437 builder.py:91] Loading model with checkpoint=/home/user/.cache/meta/Meta-Llama-3-8B-Instruct/original/consolidated.00.pth, params=/home/user/.cache/meta/Meta-Llama-3-8B-Instruct/original/params.json, use_kv_cache=True, weight_type=WeightType.LLAMA
[INFO 2024-06-14 16:15:24,834 builder.py:112] Loaded model with dtype=torch.bfloat16
[INFO 2024-06-14 16:15:24,834 builder.py:197] model.to torch.float32
Killed
```
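
The new `builder.py:197` line shows the model being upcast to fp32 right before the process dies, which points at memory. As a rough, unmeasured sketch of the weight footprint (assuming ~8.0e9 parameters for Llama 3 8B; the real peak depends on what else is resident during the conversion):

```python
# Back-of-envelope weight-memory estimate for the "model.to torch.float32"
# step. The 8.0e9 parameter count is an assumption for Llama 3 8B.
N_PARAMS = 8.0e9

bf16_gb = N_PARAMS * 2 / 1e9  # bfloat16: 2 bytes/param -> ~16 GB
fp32_gb = N_PARAMS * 4 / 1e9  # float32:  4 bytes/param -> ~32 GB

print(f"bf16 checkpoint: ~{bf16_gb:.0f} GB of weights")
print(f"after upcast to fp32: ~{fp32_gb:.0f} GB of weights, "
      f"plus any bf16 tensors still alive during the conversion")
```

An fp32 copy of roughly 32 GB before quantization even starts could plausibly trip the kernel OOM killer on a 32 GB machine, which would match the bare `Killed` output.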

### Steps to Reproduce

1. Install ExecuTorch w/ XNNPACK from scratch:

   ```
   git clone --branch main https://github.com/pytorch/executorch.git
   cd executorch
   git submodule sync
   git submodule update --init
   ./install_requirements.sh --pybind xnnpack
   ```

2. Install `examples.models.llama2` dependencies:

   ```
   ./examples/models/llama2/install_requirements.sh
   ```

3. (Optional) Download Meta Llama 3, if you don't already have it.
4. Verify export fails @ `main`:

   ```
   python -m examples.models.llama2.export_llama --checkpoint -p -kv --use_sdpa_with_kv_cache -X -qmode 8da4w --group_size 128 -d fp32 --metadata '{"get_bos_id":128000, "get_eos_id":128001}' --embedding-quantize 4,32 --output_name="llama3.pte"
   ```

5. Verify export succeeds @ parent `fbbba34`:

   ```
   git checkout fbbba34
   ```

   Repeat the export command.

6. Verify export fails @ `2badd76`:

   ```
   git checkout 2badd76
   ```

   Repeat the export command.



### Request

Please examine `2badd76` and verify the problem. If confirmed, please fix. Thanks!
AgainstEntropy commented 1 week ago

This behavior (outputting only `Killed`) may indicate that an OOM error has occurred. FYI, exporting llama2-7B with the flags `-kv --use_sdpa_with_kv_cache -X -qmode 8da4w --group_size 128 -d fp32` requires ~30 GB of RAM on my machine.
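(One way to confirm the OOM killer was responsible, assuming Linux: check the kernel log right after the `Killed` exit, e.g. `sudo dmesg | grep -iE 'out of memory|killed process'`.)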

cbilgin commented 2 days ago

@larryliu0820 https://github.com/pytorch/executorch/pull/3803/commits/2badd7643e248335cb26c310fa96472761c73d65 I guess this is your PR. Any chance you know what's going on here?

larryliu0820 commented 4 hours ago

@cccclai can you take a look? Seems like the PR causes an OOM.