triton-inference-server / tensorrtllm_backend

The Triton TensorRT-LLM Backend
Apache License 2.0

Can't build GPT-J 6B #595

Open coppock opened 1 month ago

coppock commented 1 month ago

System Info

Who can help?

No response

Information

Tasks

Reproduction

  1. Check out the v0.11.0 tag
  2. Install the Python requirements
  3. Build the GPT-J 6B engine, following the example

Expected behavior

A successful build

Actual behavior

ubuntu$ python examples/gptj/convert_checkpoint.py --model_dir=gpt-j-6b --output_dir=gpt-j-6b/trt
[TensorRT-LLM] TensorRT-LLM version: 0.11.0
0.11.0
Weights loaded. Total time: 00:00:12
Traceback (most recent call last):
  File "/h/pcoppock/data/mlos/apps/triton/../../third-party/tensorrtllm_backend/tensorrt_llm/examples/gptj/convert_checkpoint.py", line 382, in <module>
    main()
  File "/h/pcoppock/data/mlos/apps/triton/../../third-party/tensorrtllm_backend/tensorrt_llm/examples/gptj/convert_checkpoint.py", line 358, in main
    covert_and_save(rank)
  File "/h/pcoppock/data/mlos/apps/triton/../../third-party/tensorrtllm_backend/tensorrt_llm/examples/gptj/convert_checkpoint.py", line 353, in covert_and_save
    safetensors.torch.save_file(
  File "/data/pcoppock/mlos/.venv/lib/python3.10/site-packages/safetensors/torch.py", line 286, in save_file
    serialize_file(_flatten(tensors), filename, metadata=metadata)
  File "/data/pcoppock/mlos/.venv/lib/python3.10/site-packages/safetensors/torch.py", line 496, in _flatten
    return {
  File "/data/pcoppock/mlos/.venv/lib/python3.10/site-packages/safetensors/torch.py", line 500, in <dictcomp>
    "data": _tobytes(v, k),
  File "/data/pcoppock/mlos/.venv/lib/python3.10/site-packages/safetensors/torch.py", line 414, in _tobytes
    raise ValueError(
ValueError: You are trying to save a non contiguous tensor: `lm_head.weight` which is not allowed. It either means you are trying to save tensors which are reference of each other in which case it's recommended to save only the full tensors, and reslice at load time, or simply call `.contiguous()` on your tensor to pack it before saving.
(.venv) ubuntu$

Checkpoint conversion fails with the error "You are trying to save a non contiguous tensor...."

Additional notes

Conversion of Llama weights succeeds without error.
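For context, contiguity is a property of the memory layout, not the values: a transposed or sliced view can reference the parent's buffer with out-of-order strides, and that is what safetensors rejects. A minimal sketch with NumPy (the PyTorch behavior is analogous; names here are illustrative only):

```python
import numpy as np

# A transpose returns a view over the same buffer with swapped strides,
# so the data is no longer laid out in row-major ("C-contiguous") order.
w = np.arange(6, dtype=np.float32).reshape(2, 3)
view = w.T
print(view.flags["C_CONTIGUOUS"])    # False

# ascontiguousarray() copies the data into a packed buffer,
# the NumPy analogue of torch's .contiguous().
packed = np.ascontiguousarray(view)
print(packed.flags["C_CONTIGUOUS"])  # True
```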

coppock commented 1 month ago

The failing tensor is lm_head.weight. The following patch fixes this issue:

diff --git a/examples/gptj/convert_checkpoint.py b/examples/gptj/convert_checkpoint.py
index 8c062bc4..f00f8f33 100644
--- a/examples/gptj/convert_checkpoint.py
+++ b/examples/gptj/convert_checkpoint.py
@@ -249,7 +249,7 @@ def convert_hf_gptj(hf_model: GPTJForCausalLM,
         weights['lm_head.weight'] = split_matrix(lm_head_w,
                                                  mapping.tp_size,
                                                  mapping.tp_rank,
-                                                 dim=0)
+                                                 dim=0).contiguous()
         weights['lm_head.bias'] = split_matrix(ln_head_bias,
                                                mapping.tp_size,
                                                mapping.tp_rank,