huggingface / optimum-nvidia

Apache License 2.0

RuntimeError: TRT Engine build failed... #58

Open yirunwang opened 8 months ago

yirunwang commented 8 months ago

Hi,

I ran the example on a 4xA100 system and encountered the following issue. Could you please advise how I can resolve it? Thanks.

python examples/text-generation/llama.py --hub-token ${HF_TOKEN} meta-llama/Llama-2-7b-chat-hf ./output

Login successful
[12/31/2023-12:31:34] Quantization descriptor was None, assuming no quantization will be applied. If you want to change this behaviour, please use TRTEngineBuilder.with_quantization_schema()
[12/31/2023-12:31:34] [TRT] [I] [MemUsageChange] Init CUDA: CPU +13, GPU +0, now: CPU 152, GPU 70212 (MiB)
[12/31/2023-12:31:39] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1973, GPU +350, now: CPU 2261, GPU 70562 (MiB)
[12/31/2023-12:32:13] [TRT-LLM] [I] Context FMHA Enabled
[12/31/2023-12:32:13] [TRT-LLM] [I] Remove Padding Enabled
[12/31/2023-12:32:13] [TRT] [W] A shape layer can only run in INT32 precision
[12/31/2023-12:32:13] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/vocab_embedding/GATHER_0_output_0 and LLaMAForCausalLM/layers/0/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
[12/31/2023-12:32:13] [TRT] [W] A shape layer can only run in INT32 precision
[12/31/2023-12:32:13] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/0/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/0/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
[12/31/2023-12:32:13] [TRT] [W] A shape layer can only run in INT32 precision
[12/31/2023-12:32:13] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/0/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/0/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
[12/31/2023-12:32:13] [TRT] [W] A shape layer can only run in INT32 precision
[12/31/2023-12:32:13] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/0/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/0/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
[12/31/2023-12:32:13] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:13] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/0/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/1/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:13] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:13] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/1/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/1/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:13] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:13] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/1/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/1/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:13] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:13] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/1/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/1/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:13] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:13] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/1/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/2/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:13] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:13] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/2/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/2/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. 
[12/31/2023-12:32:13] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:13] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/2/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/2/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:13] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:13] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/2/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/2/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:14] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:14] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/2/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/3/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:14] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:14] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/3/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/3/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:14] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:14] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/3/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/3/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:14] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:14] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/3/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/3/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. 
[12/31/2023-12:32:14] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:14] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/3/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/4/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:14] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:14] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/4/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/4/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:14] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:14] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/4/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/4/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:14] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:14] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/4/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/4/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:14] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:14] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/4/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/5/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:14] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:14] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/5/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/5/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. 
[12/31/2023-12:32:14] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:14] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/5/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/5/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:14] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:14] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/5/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/5/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:14] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:14] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/5/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/6/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:14] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:14] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/6/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/6/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:14] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:14] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/6/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/6/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:14] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:14] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/6/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/6/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. 
[12/31/2023-12:32:15] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:15] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/6/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/7/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:15] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:15] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/7/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/7/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:15] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:15] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/7/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/7/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:15] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:15] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/7/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/7/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:15] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:15] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/7/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/8/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:15] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:15] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/8/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/8/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. 
[12/31/2023-12:32:15] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:15] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/8/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/8/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:15] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:15] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/8/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/8/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:15] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:15] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/8/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/9/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:15] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:15] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/9/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/9/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:15] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:15] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/9/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/9/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:15] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:15] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/9/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/9/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. 
[12/31/2023-12:32:16] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:16] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/9/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/10/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:16] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:16] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/10/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/10/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:16] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:16] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/10/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/10/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:16] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:16] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/10/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/10/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:16] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:16] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/10/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/11/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:16] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:16] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/11/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/11/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. 
[12/31/2023-12:32:16] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:16] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/11/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/11/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:16] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:16] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/11/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/11/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:16] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:16] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/11/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/12/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:16] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:16] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/12/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/12/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:16] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:16] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/12/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/12/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:16] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:16] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/12/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/12/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. 
[12/31/2023-12:32:16] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:16] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/12/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/13/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:16] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:16] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/13/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/13/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:16] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:16] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/13/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/13/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:16] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:16] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/13/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/13/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:17] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:17] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/13/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/14/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:17] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:17] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/14/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/14/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. 
[12/31/2023-12:32:17] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:17] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/14/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/14/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:17] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:17] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/14/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/14/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:17] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:17] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/14/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/15/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:17] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:17] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/15/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/15/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:17] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:17] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/15/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/15/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:17] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:17] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/15/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/15/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. 
[12/31/2023-12:32:17] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:17] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/15/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/16/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:17] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:17] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/16/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/16/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:17] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:17] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/16/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/16/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:17] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:17] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/16/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/16/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:18] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/16/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/17/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:18] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/17/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/17/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. 
[12/31/2023-12:32:18] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/17/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/17/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:18] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/17/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/17/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:18] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/17/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/18/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:18] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/18/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/18/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:18] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/18/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/18/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:18] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/18/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/18/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. 
[12/31/2023-12:32:18] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/18/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/19/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:18] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/19/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/19/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:18] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/19/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/19/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:18] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/19/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/19/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:18] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/19/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/20/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:18] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/20/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/20/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. 
[12/31/2023-12:32:18] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/20/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/20/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:18] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/20/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/20/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:19] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/20/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/21/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:19] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/21/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/21/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:19] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/21/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/21/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:19] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/21/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/21/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. 
[12/31/2023-12:32:19] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/21/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/22/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:19] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/22/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/22/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:19] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/22/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/22/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:19] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/22/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/22/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:19] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/22/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/23/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:19] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/23/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/23/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. 
[12/31/2023-12:32:19] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/23/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/23/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:19] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/23/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/23/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:20] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:20] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/23/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/24/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:20] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:20] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/24/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/24/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:20] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:20] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/24/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/24/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:20] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:20] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/24/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/24/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. 
[12/31/2023-12:32:20] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:20] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/24/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/25/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:20] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:20] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/25/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/25/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:20] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:20] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/25/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/25/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:20] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:20] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/25/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/25/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:20] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:20] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/25/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/26/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:20] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:20] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/26/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/26/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. 
[12/31/2023-12:32:20] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:20] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/26/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/26/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:20] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:20] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/26/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/26/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:21] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:21] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/26/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/27/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:21] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:21] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/27/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/27/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:21] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:21] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/27/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/27/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:21] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:21] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/27/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/27/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. 
[12/31/2023-12:32:21] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:21] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/27/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/28/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:21] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:21] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/28/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/28/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:21] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:21] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/28/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/28/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:21] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:21] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/28/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/28/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:21] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:21] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/28/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/29/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:21] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:21] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/29/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/29/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. 
[12/31/2023-12:32:21] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:21] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/29/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/29/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:21] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:21] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/29/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/29/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:21] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:21] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/29/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/30/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:21] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:21] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/30/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/30/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:21] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:21] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/30/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/30/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:21] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:21] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/30/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/30/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. 
[12/31/2023-12:32:22] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:22] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/30/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/layers/31/input_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:22] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:22] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/31/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/31/input_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:22] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:22] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/31/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/layers/31/post_layernorm/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:22] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:22] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/31/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/layers/31/post_layernorm/SHUFFLE_1_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:22] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:22] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/layers/31/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/ln_f/SHUFFLE_0_output_0: first input has type Half but second input has type Float. [12/31/2023-12:32:22] [TRT] [W] A shape layer can only run in INT32 precision [12/31/2023-12:32:22] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/ln_f/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/ln_f/SHUFFLE_1_output_0: first input has type Half but second input has type Float. 
[12/31/2023-12:32:27] [TRT-LLM] [I] Build TensorRT engine llama_float16_tp1_rank0.engine
[12/31/2023-12:32:27] [TRT] [W] Unused Input: position_ids
[12/31/2023-12:32:27] [TRT] [W] Detected layernorm nodes in FP16.
[12/31/2023-12:32:27] [TRT] [W] Running layernorm after self-attention in FP16 may cause overflow. Exporting the model to the latest available ONNX opset (later than opset 17) to use the INormalizationLayer, or forcing layernorm layers to run in FP32 precision can help with preserving accuracy.
[12/31/2023-12:32:27] [TRT] [W] [RemoveDeadLayers] Input Tensor position_ids is unused or used only at compile-time, but is not being removed.
[12/31/2023-12:32:27] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 15250, GPU 71162 (MiB)
[12/31/2023-12:32:27] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 15252, GPU 71172 (MiB)
[12/31/2023-12:32:27] [TRT] [W] TensorRT was linked against cuDNN 8.9.6 but loaded cuDNN 8.9.4
[12/31/2023-12:32:27] [TRT] [I] Global timing cache in use. Profiling results in this builder pass will be stored.
[12/31/2023-12:32:37] [TRT] [I] [GraphReduction] The approximate region cut reduction algorithm is called.
[12/31/2023-12:32:37] [TRT] [I] Detected 73 inputs and 33 output network tensors.
[12/31/2023-12:32:41] [TRT] [E] 2: [virtualMemoryBuffer.cpp::resizePhysical::140] Error Code 2: OutOfMemory (no further information)
[12/31/2023-12:32:41] [TRT] [E] 2: [virtualMemoryBuffer.cpp::resizePhysical::140] Error Code 2: OutOfMemory (no further information)
[12/31/2023-12:32:41] [TRT] [W] Requested amount of GPU memory (10433331200 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[12/31/2023-12:32:42] [TRT] [E] 2:
[12/31/2023-12:32:42] [TRT] [E] 2: [globWriter.cpp::makeResizableGpuMemory::423] Error Code 2: OutOfMemory (no further information)
[12/31/2023-12:32:42] [TRT-LLM] [E] Engine building failed, please check the error log.
[12/31/2023-12:32:42] [TRT] [I] Serialized 59 bytes of code generator cache.
[12/31/2023-12:32:42] [TRT] [I] Serialized 156869 bytes of compilation cache.
[12/31/2023-12:32:42] [TRT] [I] Serialized 136 timing cache entries
[12/31/2023-12:32:42] [TRT-LLM] [I] Timing cache serialized to output/timings.cache
[12/31/2023-12:32:42] [TRT-LLM] [I] Config saved to output/build.json.
Traceback (most recent call last):
  File "/opt/optimum-nvidia/examples/text-generation/llama.py", line 101, in
    builder.build(args.output, args.optimization_level)
  File "/opt/optimum-nvidia/src/optimum/nvidia/builder.py", line 405, in build
    build_func(shards_info, files, output_path, optimization_level)
  File "/opt/optimum-nvidia/src/optimum/nvidia/builder.py", line 418, in _build_serial
    self._build_engine_for_rank(shard, weights, output_path, opt_level, is_parallel=False)
  File "/opt/optimum-nvidia/src/optimum/nvidia/builder.py", line 579, in _build_engine_for_rank
    raise RuntimeError("TRT Engine build failed... Please check the logs and open up an issue.")
RuntimeError: TRT Engine build failed... Please check the logs and open up an issue.

RomanKoshkin commented 8 months ago

Same problem with the TRT Engine.

  File "/opt/optimum-nvidia/src/optimum/nvidia/models/base.py", line 60, in _from_pretrained
    return model_class.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/hub_mixin.py", line 157, in from_pretrained
    return cls._from_pretrained(
  File "/opt/optimum-nvidia/src/optimum/nvidia/runtime.py", line 154, in _from_pretrained
    builder.build(engine_folder, optimization_level)
  File "/opt/optimum-nvidia/src/optimum/nvidia/builder.py", line 405, in build
    build_func(shards_info, files, output_path, optimization_level)
  File "/opt/optimum-nvidia/src/optimum/nvidia/builder.py", line 418, in _build_serial
    self._build_engine_for_rank(shard, weights, output_path, opt_level, is_parallel=False)
  File "/opt/optimum-nvidia/src/optimum/nvidia/builder.py", line 579, in _build_engine_for_rank
    raise RuntimeError("TRT Engine build failed... Please check the logs and open up an issue.")
RuntimeError: TRT Engine build failed... Please check the logs and open up an issue.