triton-inference-server / tensorrtllm_backend

The Triton TensorRT-LLM Backend
Apache License 2.0

Can't build GPT-J 6B #595

Open coppock opened 1 month ago

coppock commented 1 month ago

System Info

Who can help?

No response

Information

Tasks

Reproduction

  1. Check out the v0.11.0 tag
  2. Install the Python requirements
  3. Build the GPT-J 6B engine, following the example

Expected behavior

A successful build

Actual behavior

ubuntu$ python examples/gptj/convert_checkpoint.py --model_dir=gpt-j-6b --output_dir=gpt-j-6b/trt
[TensorRT-LLM] TensorRT-LLM version: 0.11.0
0.11.0
Weights loaded. Total time: 00:00:12
Traceback (most recent call last):
  File "/h/pcoppock/data/mlos/apps/triton/../../third-party/tensorrtllm_backend/tensorrt_llm/examples/gptj/convert_checkpoint.py", line 382, in <module>
    main()
  File "/h/pcoppock/data/mlos/apps/triton/../../third-party/tensorrtllm_backend/tensorrt_llm/examples/gptj/convert_checkpoint.py", line 358, in main
    covert_and_save(rank)
  File "/h/pcoppock/data/mlos/apps/triton/../../third-party/tensorrtllm_backend/tensorrt_llm/examples/gptj/convert_checkpoint.py", line 353, in covert_and_save
    safetensors.torch.save_file(
  File "/data/pcoppock/mlos/.venv/lib/python3.10/site-packages/safetensors/torch.py", line 286, in save_file
    serialize_file(_flatten(tensors), filename, metadata=metadata)
  File "/data/pcoppock/mlos/.venv/lib/python3.10/site-packages/safetensors/torch.py", line 496, in _flatten
    return {
  File "/data/pcoppock/mlos/.venv/lib/python3.10/site-packages/safetensors/torch.py", line 500, in <dictcomp>
    "data": _tobytes(v, k),
  File "/data/pcoppock/mlos/.venv/lib/python3.10/site-packages/safetensors/torch.py", line 414, in _tobytes
    raise ValueError(
ValueError: You are trying to save a non contiguous tensor: `lm_head.weight` which is not allowed. It either means you are trying to save tensors which are reference of each other in which case it's recommended to save only the full tensors, and reslice at load time, or simply call `.contiguous()` on your tensor to pack it before saving.
(.venv) ubuntu$

Checkpoint conversion fails with the error "You are trying to save a non contiguous tensor...."

Additional notes

Conversion of Llama weights succeeds without error.
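For context, contiguity is a property of the memory layout, not the values: a transposed or sliced view can reference the parent's buffer with out-of-order strides, and that is what safetensors rejects. A minimal sketch with NumPy (the PyTorch behavior is analogous; names here are illustrative only):

```python
import numpy as np

# A transpose returns a view over the same buffer with swapped strides,
# so the data is no longer laid out in row-major ("C-contiguous") order.
w = np.arange(6, dtype=np.float32).reshape(2, 3)
view = w.T
print(view.flags["C_CONTIGUOUS"])    # False

# ascontiguousarray() copies the data into a packed buffer,
# the NumPy analogue of torch's .contiguous().
packed = np.ascontiguousarray(view)
print(packed.flags["C_CONTIGUOUS"])  # True
```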

coppock commented 1 month ago

The failing tensor is lm_head.weight. The following patch fixes this issue:

diff --git a/examples/gptj/convert_checkpoint.py b/examples/gptj/convert_checkpoint.py
index 8c062bc4..f00f8f33 100644
--- a/examples/gptj/convert_checkpoint.py
+++ b/examples/gptj/convert_checkpoint.py
@@ -249,7 +249,7 @@ def convert_hf_gptj(hf_model: GPTJForCausalLM,
         weights['lm_head.weight'] = split_matrix(lm_head_w,
                                                  mapping.tp_size,
                                                  mapping.tp_rank,
-                                                 dim=0)
+                                                 dim=0).contiguous()
         weights['lm_head.bias'] = split_matrix(ln_head_bias,
                                                mapping.tp_size,
                                                mapping.tp_rank,