triton-inference-server / fastertransformer_backend

BSD 3-Clause "New" or "Revised" License

Convert nemo-megatron-mt5-3B to binary files of fastertransformer successfully, but tritonserver fails when loading models with unmatched bias.bin. #123

Closed · songkq closed this issue 1 year ago

songkq commented 1 year ago

@Thytu @rr0gi Hi, could you please give some advice for this issue?

The nemo_megatron_mt5_3b_bf16_tp2.nemo model (https://huggingface.co/nvidia/nemo-megatron-mt5-3B) was trained with `--tensor_model_parallel_size=2`.

I successfully converted nemo-megatron-mt5-3B to FasterTransformer binary files with:

```
python3 FasterTransformer/examples/pytorch/t5/utils/nemo_t5_ckpt_convert.py \
    -i nemo-megatron-mt5-3B/nemo_megatron_mt5_3b_bf16_tp2.nemo \
    -o ./models/nemo-megatron-mt5-3B/ \
    -m mt5-3B \
    -i_g 2
```

However, when I run tritonserver with:

```
CUDA_VISIBLE_DEVICES="0,1" /opt/tritonserver/bin/tritonserver --model-store=fastertransformer_backend/all_models/nemo-megatron-mt5-3B/
```

it fails to load the model because of unmatched weight shapes:

```
I0414 15:43:13.619001 934 libfastertransformer.cc:438] Before Loading Weights:
after allocation    : free: 14.14 GB, total: 44.56 GB, used: 30.43 GB
[FT][WARNING] file ./models/nemo-megatron-mt5-3B/2-gpu//decoder.final_layer_norm.bias.bin only has 4096, but request 8192, loading model fails!
[FT][WARNING] file ./models/nemo-megatron-mt5-3B/2-gpu//shared.bias.bin only has 500224, but request 1000448, loading model fails!
[FT][WARNING] file ./models/nemo-megatron-mt5-3B/2-gpu//decoder.final_layer_norm.bias.bin only has 4096, but request 8192, loading model fails!
[FT][WARNING] file ./models/nemo-megatron-mt5-3B/2-gpu//shared.bias.bin only has 500224, but request 1000448, loading model fails!
I0414 15:43:21.362566 934 libfastertransformer.cc:448] After Loading Weights:
```

Here are my config.pbtxt and config.ini. config.zip
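As a side note, mismatches like the ones in the warnings above can be caught before launching tritonserver by comparing each converted `.bin` file's on-disk size against the element count the loader will request. This is a minimal diagnostic sketch, not part of the FasterTransformer tooling; the helper name and the assumption that the `.bin` files are raw fixed-width dumps are mine:

```python
import os

def check_bin_size(path, expected_elems, elem_bytes=4):
    """Compare a raw .bin weight file's size (in bytes) against the
    number of elements a loader is expected to request.

    Assumes the file is a flat dump of fixed-width values
    (elem_bytes=4 for fp32, 2 for fp16/bf16)."""
    actual_bytes = os.path.getsize(path)
    expected_bytes = expected_elems * elem_bytes
    status = "OK" if actual_bytes == expected_bytes else "MISMATCH"
    print(f"{path}: {actual_bytes} bytes on disk, "
          f"{expected_bytes} bytes requested -> {status}")
    return actual_bytes == expected_bytes
```

Running such a check over every file in the `2-gpu/` directory against the shapes implied by the model config would have flagged `decoder.final_layer_norm.bias.bin` and `shared.bias.bin` before the server was even started.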

byshiue commented 1 year ago

@songkq Thank you. The reason is that we don't set the loading type for these two biases. It is fixed in this commit: https://github.com/NVIDIA/FasterTransformer/commit/0c128050e14b65d72b3c28c0324cc9db6f677be8.
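For context, the underlying issue is a general one with tensor parallelism: most large weight matrices are split across ranks, but layer-norm biases and some shared parameters are replicated, so each rank's file holds the full tensor and the loader must request the unpartitioned size for them. A hedged illustration of that distinction (the function and flag names are mine, not FasterTransformer's API):

```python
def expected_file_elems(full_elems, tensor_para_size, is_replicated):
    """Element count one rank's .bin file should contain under
    tensor parallelism.

    Replicated tensors (e.g. layer-norm biases) store the full tensor
    in every rank's file; split tensors store an even 1/tp slice."""
    if is_replicated:
        return full_elems
    assert full_elems % tensor_para_size == 0
    return full_elems // tensor_para_size

# With tp=2, a replicated bias of 4096 elements should be requested as
# 4096 per file; a loader that treats it as split (and so asks for 8192)
# produces exactly the kind of mismatch shown in the warnings above.
```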

songkq commented 1 year ago

Solved the problem with https://github.com/NVIDIA/FasterTransformer/issues/561.