Adds the following LLaMA 2 models to the transformers corpus:
- 7b
- 13b
- 34b
- 70b
Each model runs one token in prefill mode and one token in token generation (KV cache) mode.

Note that individual users need to request permission from Meta to use the LLaMA 2 weights, then supply a path to those weights with `--script-args="--pretrained --model_path PATH/TO/MODEL"`. All other users can deploy `llama2_*.py` without any script args to get random weights for analysis and benchmarking purposes.
This PR also includes a fix to `common.build.get_shapes_and_dtypes()` to support double-nested tuples/lists.
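Double-nesting matters here because KV cache inputs typically arrive as a tuple of (key, value) tensor pairs, one pair per layer. The fix can be sketched as a recursive walk over the input structure; the function below is an illustrative reimplementation, not the actual `common.build` code, and uses a duck-typed `shape` check in place of a tensor type check:

```python
import numpy as np

def get_shapes_and_dtypes(inputs):
    """Collect the shape and dtype of every array-like in `inputs`,
    descending into nested tuples/lists of any depth (illustrative
    sketch of the fix, not the real common.build implementation)."""
    shapes, dtypes = {}, {}

    def visit(value, key):
        if isinstance(value, (tuple, list)):
            # Recurse, so double-nested (and deeper) containers are handled
            for i, item in enumerate(value):
                visit(item, f"{key}[{i}]")
        elif hasattr(value, "shape"):  # e.g. torch.Tensor or np.ndarray
            shapes[key] = tuple(value.shape)
            dtypes[key] = str(value.dtype)
        else:
            dtypes[key] = type(value).__name__

    for name, value in inputs.items():
        visit(value, name)
    return shapes, dtypes

# KV cache style input: a tuple of (key, value) arrays per layer
kv = tuple(
    (np.zeros((1, 8, 5, 64)), np.zeros((1, 8, 5, 64))) for _ in range(2)
)
shapes, dtypes = get_shapes_and_dtypes(
    {"input_ids": np.zeros((1, 1), dtype=np.int64), "past_key_values": kv}
)
```

With a non-recursive version, the inner (key, value) pairs would be treated as opaque objects; the recursion instead records an entry such as `past_key_values[0][1]` for each tensor.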