Closed: ogis-uno closed this issue 1 year ago.
model_checkpoint_path is used to load the model. You cannot remove it.
Thanks for the reply!
> model_checkpoint_path is used to load the model. You cannot remove it.
Hmm, I checked libfastertransformer.cc. If I don't set both default_model_filename and model_checkpoint_path, model_dir will be ${repository_path}/${version}/${tensor_para_size}-gpu, and Triton tries to load from model_dir.
I think the cause of the warning is below; adding "/" to the head of "model.encoder.layer." seems to fix the problem. In T5, loading the model parameters looks like below, and I can load the T5 model without model_checkpoint_path. model_dir is obtained from model_checkpoint_path:
std::string model_dir =
    param_get("model_checkpoint_path") == ""
        ? JoinPath(
              {RepositoryPath(), std::to_string(Version()), model_filename})
        : param_get("model_checkpoint_path");
So you mean, if I want model_dir to come from JoinPath({RepositoryPath(), std::to_string(Version()), model_filename}), I should set model_checkpoint_path like below?
parameters {
  key: "model_checkpoint_path"
  value: {
    string_value: ""
  }
}
You should set it to the path of the checkpoint you placed, like:
parameters {
  key: "model_checkpoint_path"
  value: {
    string_value: "../all_models/bert/fastertransformer/1/2-gpu/"
  }
}
Thank you for the reply. So you mean I MUST set model_checkpoint_path to the directory which contains config.ini and the model.encoder.layer*.bin files, as a required parameter?
If so, one more question (sorry for bothering you): what is the ternary operator on line 264 for? Will something go wrong if line 265 is executed?
config.ini is necessary (it is used to set up the model hyper-parameters), but the model.encoder.layer*.bin files are not (they are weights; if the program does not find them, it will generate random weights automatically). On line 264, if you don't set model_checkpoint_path, it will try to load the model from the default path JoinPath({RepositoryPath(), std::to_string(Version()), model_filename}).
Hi, thank you for the answer, and sorry for the somewhat long reply.
config.ini is necessary (it is used to setup the model hyper-parameters), but model.encoder.layer*.bin are not (They are weights. If the program does not find them, it will generate random weights automatically).
I think that's what happened in my case, and randomly generated weights are useless for inference.
In line 264, if you don't set model_checkpoint_path, it will try to load model from a default path JoinPath({RepositoryPath(), std::to_string(Version()), model_filename}).
Yes, that's what I want to do. My intention is as below. In this configuration, I can't write model_checkpoint_path in config.pbtxt.
config.pbtxt
...
version_policy: { specific { versions : [1, 2] }}
...
model_dir
+ config.pbtxt
+ 1
  + 1-gpu
    + config.ini
    + model.encoder.layer.*.bin
+ 2
  + 1-gpu
    + config.ini
    + model.encoder.layer.*.bin
My current directory structure is below. I think config.ini and its weights exist in the correct place. And I ran three tests with / without model_checkpoint_path again.
# model-repository would be ...
root@7d6490ff95ca:/home/uno/fastertransformer_backend# echo ${WORKSPACE}/all_models/bert/
/home/uno/fastertransformer_backend/all_models/bert/
# config.pbtxt exists in fastertransformer under the model repository.
root@7d6490ff95ca:/home/uno/fastertransformer_backend# ls all_models/bert/fastertransformer/
1 config.pbtxt
# Version 1 of fastertransformer has its contents.
root@7d6490ff95ca:/home/uno/fastertransformer_backend# ls all_models/bert/fastertransformer/1
1-gpu
# Contents of version 1 of fastertransformer.
# It has config.ini and model parameters(model.encoder.layer.*.bin)
root@7d6490ff95ca:/home/uno/fastertransformer_backend# ls all_models/bert/fastertransformer/1/1-gpu/ | head -4
config.ini
model.encoder.layer.0.attention.output.LayerNorm.bias.bin
model.encoder.layer.0.attention.output.LayerNorm.weight.bin
model.encoder.layer.0.attention.output.dense.bias.bin
Case 0. Start up Triton with model_checkpoint_path, as you suggested.
Triton started up without warnings; it can find config.ini and the model weights.
root@7d6490ff95ca:/home/uno/fastertransformer_backend# tail -6 all_models/bert/fastertransformer/config.pbtxt
parameters {
key: "model_checkpoint_path"
value: {
string_value: "/home/uno/fastertransformer_backend/all_models/bert/fastertransformer/1/1-gpu/"
}
}
root@7d6490ff95ca:/home/uno/fastertransformer_backend# CUDA_VISIBLE_DEVICES=0,1 mpirun -n 1 --allow-run-as-root /opt/tritonserver/bin/tritonserver --model-repository=${WORKSPACE}/all_models/bert/
I1027 00:21:22.152144 169 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f733a000000' with size 268435456
I1027 00:21:22.154672 169 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I1027 00:21:22.161613 169 model_repository_manager.cc:1206] loading: fastertransformer:1
I1027 00:21:22.234861 169 libfastertransformer.cc:1478] TRITONBACKEND_Initialize: fastertransformer
I1027 00:21:22.234888 169 libfastertransformer.cc:1488] Triton TRITONBACKEND API version: 1.10
I1027 00:21:22.234900 169 libfastertransformer.cc:1494] 'fastertransformer' TRITONBACKEND API version: 1.10
I1027 00:21:22.234940 169 libfastertransformer.cc:1526] TRITONBACKEND_ModelInitialize: fastertransformer (version 1)
I1027 00:21:22.237616 169 libfastertransformer.cc:218] Instance group type: KIND_CPU count: 1
I1027 00:21:22.237650 169 libfastertransformer.cc:248] Sequence Batching: disabled
I1027 00:21:22.237748 169 libfastertransformer.cc:420] Before Loading Weights:
after allocation : free: 14.43 GB, total: 14.62 GB, used: 0.20 GB
I1027 00:21:24.061989 169 libfastertransformer.cc:430] After Loading Weights:
after allocation : free: 14.19 GB, total: 14.62 GB, used: 0.43 GB
...
Case 1. Start up Triton with model_checkpoint_path but without a trailing slash ("1-gpu", not "1-gpu/").
I got warnings. Triton can find config.ini but can't find the model weights.
root@7d6490ff95ca:/home/uno/fastertransformer_backend# tail -6 all_models/bert/fastertransformer/config.pbtxt
parameters {
key: "model_checkpoint_path"
value: {
string_value: "/home/uno/fastertransformer_backend/all_models/bert/fastertransformer/1/1-gpu"
}
}
root@7d6490ff95ca:/home/uno/fastertransformer_backend# CUDA_VISIBLE_DEVICES=0,1 mpirun -n 1 --allow-run-as-root /opt/tritonserver/bin/tritonserver --model-repository=${WORKSPACE}/all_models/bert/
I1027 00:24:54.048687 213 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7fb072000000' with size 268435456
I1027 00:24:54.051236 213 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I1027 00:24:54.058875 213 model_repository_manager.cc:1206] loading: fastertransformer:1
I1027 00:24:54.130787 213 libfastertransformer.cc:1478] TRITONBACKEND_Initialize: fastertransformer
I1027 00:24:54.130815 213 libfastertransformer.cc:1488] Triton TRITONBACKEND API version: 1.10
I1027 00:24:54.130829 213 libfastertransformer.cc:1494] 'fastertransformer' TRITONBACKEND API version: 1.10
I1027 00:24:54.130865 213 libfastertransformer.cc:1526] TRITONBACKEND_ModelInitialize: fastertransformer (version 1)
I1027 00:24:54.132790 213 libfastertransformer.cc:218] Instance group type: KIND_CPU count: 1
I1027 00:24:54.132813 213 libfastertransformer.cc:248] Sequence Batching: disabled
I1027 00:24:54.132906 213 libfastertransformer.cc:420] Before Loading Weights:
after allocation : free: 14.43 GB, total: 14.62 GB, used: 0.20 GB
[FT][WARNING] file /home/uno/fastertransformer_backend/all_models/bert/fastertransformer/1/1-gpumodel.encoder.layer.0.output.LayerNorm.weight.bin cannot be opened, loading model fails!
[FT][WARNING] file /home/uno/fastertransformer_backend/all_models/bert/fastertransformer/1/1-gpumodel.encoder.layer.0.attention.self.query.weight.0.bin cannot be opened, loading model fails!
...
Case 2. Start up Triton without model_checkpoint_path (what I want to do).
Same result as Case 1: I got warnings. Triton can find config.ini but can't find the model weights.
root@7d6490ff95ca:/home/uno/fastertransformer_backend# grep -e "model_checkpoint_path" all_models/bert/fastertransformer/config.pbtxt
root@7d6490ff95ca:/home/uno/fastertransformer_backend#
root@7d6490ff95ca:/home/uno/fastertransformer_backend# CUDA_VISIBLE_DEVICES=0,1 mpirun -n 1 --allow-run-as-root /opt/tritonserver/bin/tritonserver --model-repository=${WORKSPACE}/all_models/bert/
I1027 00:13:01.509585 118 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f40aa000000' with size 268435456
I1027 00:13:01.513728 118 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I1027 00:13:01.542854 118 model_repository_manager.cc:1206] loading: fastertransformer:1
I1027 00:13:02.022773 118 libfastertransformer.cc:1478] TRITONBACKEND_Initialize: fastertransformer
I1027 00:13:02.022823 118 libfastertransformer.cc:1488] Triton TRITONBACKEND API version: 1.10
I1027 00:13:02.022833 118 libfastertransformer.cc:1494] 'fastertransformer' TRITONBACKEND API version: 1.10
I1027 00:13:02.022949 118 libfastertransformer.cc:1526] TRITONBACKEND_ModelInitialize: fastertransformer (version 1)
I1027 00:13:02.028285 118 libfastertransformer.cc:218] Instance group type: KIND_CPU count: 1
I1027 00:13:02.028325 118 libfastertransformer.cc:248] Sequence Batching: disabled
I1027 00:13:02.031406 118 libfastertransformer.cc:420] Before Loading Weights:
after allocation : free: 14.43 GB, total: 14.62 GB, used: 0.20 GB
[FT][WARNING] file /home/uno/fastertransformer_backend/all_models/bert/fastertransformer/1/1-gpumodel.encoder.layer.0.output.LayerNorm.weight.bin cannot be opened, loading model fails!
[FT][WARNING] file /home/uno/fastertransformer_backend/all_models/bert/fastertransformer/1/1-gpumodel.encoder.layer.0.attention.self.query.weight.0.bin cannot be opened, loading model fails!
...
I think the cause of the warnings is the following:
getModelFileType() got .../fastertransformer/1/1-gpu//config.ini in Case 0, and it seems to handle "//" without problems. In Cases 1 and 2, it got .../fastertransformer/1/1-gpu/config.ini, which is quite normal.
bert_layer_weights[l].loadModel() got .../fastertransformer/1/1-gpu/model.encoder.layer... in Case 0, which is quite normal. But in Cases 1 and 2, it got .../fastertransformer/1/1-gpumodel.encoder.layer..., as the warnings said. This is very problematic.
Got it. I have updated the FT codes. Can you rebuild the docker again?
I have rebuilt the Docker image and re-run the tests. All of Cases 0, 1, and 2 work fine without "loading model fails!" warnings.
Thank you for your help!
Description
I tried to deploy multiple versions of BERT. For that, I removed "default_model_filename" and "model_checkpoint_path" from config.pbtxt. But when I started up Triton, I got many warning messages like the following.
Environments.
Reproduced Steps