chunit-quic commented 2 days ago

* Enable bert mode
* Change input sequence of static_llama
* Tag bert output as uint8
* Unify both 1b and 3b in 1 runner
* Add hybrid IO memory for llama3_2 runner
* Align timer with llama

pytorch-bot[bot] commented 2 days ago

:link: Helpful Links

:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/6983

:page_facing_up: Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

:heavy_exclamation_mark: 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

[DomainsOnly] Jobs fail with GLIBC version not found

:x: 3 New Failures

As of commit fcc10de4065dd2cd6c0d498d172597cd988fd677 with merge base 2d51e63d90746381fa5007246071fcac36aa8982:

NEW FAILURES - The following jobs have failed:

* [Check Labels / Check labels](https://hud.pytorch.org/pr/pytorch/executorch/6983#33362267187) ([gh](https://github.com/pytorch/executorch/actions/runs/11966517275/job/33362267187))
* [Lint / lintrunner / linux-job](https://hud.pytorch.org/pr/pytorch/executorch/6983#33362267862) ([gh](https://github.com/pytorch/executorch/actions/runs/11966517466/job/33362267862))
  `>>> Lint for install_requirements.py:`
* [pull / test-llava-runner-linux / linux-job](https://hud.pytorch.org/pr/pytorch/executorch/6983#33362270757) ([gh](https://github.com/pytorch/executorch/actions/runs/11966517484/job/33362270757))
  `test_prefill_logits`

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions[bot] commented 2 days ago

This PR needs a `release notes:` label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example @pytorchbot label "topic: not user facing"

For more information, see https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

cccclai commented 1 day ago

Hey do you mind sharing the command for AoT and runtime so I can try on my end?

chunit-quic commented 7 hours ago

> Hey do you mind sharing the command for AoT and runtime so I can try on my end?

Sure! Switch between modes (kv or bert) by setting the `--model_mode` argument.

python examples/qualcomm/oss_scripts/llama3_2/llama.py -a ${ARCHIVE}/ -b build-android -H ${HOST} -s ${DEVICE} -m ${SOC} --checkpoint Llama3.2-1B-Instruct/consolidated.00.pth --params Llama3.2-1B-Instruct/params.json --tokenizer_model Llama3.2-1B-Instruct/tokenizer.model --prompt "<|start_header_id|>" --ptq 16a4w --temperature 0 --model_size 1B --seq_len 16 --model_mode bert

cccclai commented 7 hours ago

> > Hey do you mind sharing the command for AoT and runtime so I can try on my end?
>
> Sure! Switch between modes (kv or bert) by setting the `--model_mode` argument.
>
> python examples/qualcomm/oss_scripts/llama3_2/llama.py -a ${ARCHIVE}/ -b build-android -H ${HOST} -s ${DEVICE} -m ${SOC} --checkpoint Llama3.2-1B-Instruct/consolidated.00.pth --params Llama3.2-1B-Instruct/params.json --tokenizer_model Llama3.2-1B-Instruct/tokenizer.model --prompt "<|start_header_id|>" --ptq 16a4w --temperature 0 --model_size 1B --seq_len 16 --model_mode bert

Ah I see - do you mind renaming bert mode to batch_prefill? The context is that bert isn't a common name.

chunit-quic commented 6 hours ago

> Ah I see - do you mind renaming bert mode to batch_prefill? The context is that bert isn't a common name.

No problem. Let me change it.

facebook-github-bot commented 6 hours ago

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

cccclai commented 4 hours ago

There are some errors here:

executorch/examples/qualcomm/oss_scripts/llama3_2/runner/runner.cpp:50:7: error: field 'eval_mode_' will be initialized after field 'stats_' [-Werror,-Wreorder-ctor]
   50 |       eval_mode_(eval_mode),
      |       ^~~~~~~~~~~~~~~~~~~~~
      |       stats_({})
   51 |       stats_({}) {
      |       ~~~~~~~~~~
      |       eval_mode_(eval_mode)

chunit-quic commented 4 hours ago

> executorch/examples/qualcomm/oss_scripts/llama3_2/runner/runner.cpp:50:7: error: field 'eval_mode_' will be initialized after field 'stats_' [-Werror,-Wreorder-ctor]
>    50 |       eval_mode_(eval_mode),
>       |       ^~~~~~~~~~~~~~~~~~~~~
>       |       stats_({})
>    51 |       stats_({}) {
>       |       ~~~~~~~~~~
>       |       eval_mode_(eval_mode)

Thanks for pointing that out. Fixed.
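
For context on the warning above: C++ initializes non-static data members in the order they are declared in the class, not in the order they appear in the constructor's initializer list, and `-Wreorder-ctor` flags any mismatch between the two. The sketch below is not the actual runner code. `Runner`, `Stats`, `record`, and `init_order` here are hypothetical stand-ins, and only the member names `eval_mode_` and `stats_` are taken from the compiler output. It assumes `stats_` is declared before `eval_mode_`, which is what the diagnostic implies.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the runner's statistics struct.
struct Stats {
  int num_prompt_tokens = 0;
};

// Hypothetical stand-in for the runner class; only the member names
// eval_mode_ and stats_ come from the compiler output in this thread.
class Runner {
 public:
  Runner(int eval_mode, std::vector<std::string>& order)
      // Initializers are listed in declaration order, so -Wreorder-ctor
      // has nothing to report.
      : stats_(record<Stats>("stats_", order, Stats{})),
        eval_mode_(record<int>("eval_mode_", order, eval_mode)) {}

  int eval_mode() const { return eval_mode_; }
  const Stats& stats() const { return stats_; }

 private:
  // Records which member initializer ran, then passes the value through.
  template <typename T>
  static T record(const char* name, std::vector<std::string>& order, T value) {
    order.push_back(name);
    return value;
  }

  Stats stats_;    // declared first, so always initialized first
  int eval_mode_;  // declared second, so always initialized second
};

// Returns the member names in the order their initializers actually ran.
std::vector<std::string> init_order() {
  std::vector<std::string> order;
  Runner runner(1, order);
  assert(runner.eval_mode() == 1);
  return order;
}
```

Swapping the two entries in the initializer list would not change the order recorded by `init_order()`; it would only bring the warning back, which is why the fix-it hint simply reorders the list.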

facebook-github-bot commented 3 hours ago

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

pytorch / executorch

Qualcomm AI Engine Direct - Support batch prefill mode for llama3.2 #6983
