rahuan opened 1 year ago
BTW, my model is BERT, any hints?
@rahuan can you try to run it with the latest Triton server (rebuild the image if you are not using the latest one), and enable verbose logging with --log-verbose 1?
Also, try without k8s first to see if it is reproducible.
I built the image using the latest code of fastertransformer_backend as of Nov. 24th, and I already used --log-verbose=2. I'll sync the latest code now and rebuild.
I just synced the latest code of fastertransformer_backend; it now fails even faster, at very low QPS. Below are the errors:
I1212 06:38:03.990948 1 libfastertransformer.cc:1022] get total batch_size = 1
I1212 06:38:03.990967 1 libfastertransformer.cc:1433] get input count = 2
I1212 06:38:03.991087 1 libfastertransformer.cc:1672] collect name: input_hidden_state size: 353280 bytes
I1212 06:38:03.991104 1 libfastertransformer.cc:1672] collect name: sequence_lengths size: 40 bytes
I1212 06:38:03.991113 1 libfastertransformer.cc:1683] the data is in CPU
I1212 06:38:03.991121 1 libfastertransformer.cc:1690] the data is in CPU
I1212 06:38:03.991145 1 libfastertransformer.cc:1380] before ThreadForward 0
I1212 06:38:03.991202 1 libfastertransformer.cc:1388] after ThreadForward 0
I1212 06:38:03.991222 1 libfastertransformer.cc:1226] Start to forward
terminate called after throwing an instance of 'std::runtime_error'
  what():  [FT][ERROR] Assertion fail: /workspace/build/fastertransformer_backend/build/_deps/repo-ft-src/src/fastertransformer/layers/attention_layers/FusedAttentionLayer.cu:178
Signal (6) received.
0# 0x000055C30FAAFC19 in tritonserver
1# 0x00007FB022A0E090 in /usr/lib/x86_64-linux-gnu/libc.so.6
2# gsignal in /usr/lib/x86_64-linux-gnu/libc.so.6
3# abort in /usr/lib/x86_64-linux-gnu/libc.so.6
4# 0x00007FB022DC7911 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
5# 0x00007FB022DD338C in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
6# 0x00007FB022DD33F7 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
7# 0x00007FB022DD36A9 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
8# fastertransformer::myAssert(bool, char const*, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
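For context on the "Signal (6)": FasterTransformer's assertion helper throws `std::runtime_error` with exactly the "[FT][ERROR] Assertion fail: file:line" message seen above; when the exception escapes the forward thread unhandled, `std::terminate()` runs and the process receives SIGABRT. A minimal sketch of that mechanism (paraphrased, not FT's exact code):

```cpp
#include <stdexcept>
#include <string>

// Simplified sketch of FT's assertion helper: on a failed check it
// throws std::runtime_error carrying "file:line". An uncaught throw
// inside the backend's forward thread calls std::terminate(), which
// raises SIGABRT -- the "Signal (6)" in the trace above.
inline void myAssert(bool result, const char* file, int line, const std::string& info = "")
{
    if (!result) {
        throw std::runtime_error(std::string("[FT][ERROR] Assertion fail: ")
                                 + file + ":" + std::to_string(line) + " " + info);
    }
}

#define FT_CHECK(val) myAssert(val, __FILE__, __LINE__)

int main()
{
    FT_CHECK(2 + 2 == 4);  // passes silently
    FT_CHECK(false);       // throws; uncaught -> terminate -> abort
}
```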
What seq length are you using?
Batch size is 10 or 20; seq length differs for each sentence in a batch, averaging about 50~60, but the max seq length is limited to 128.
Thanks. Can you also share the head size you are using? That will be helpful for us to reproduce.
The model settings are the same as bert-base-chinese: layer num is 12, head num is 12, hidden size is 768 = 64*12. BTW, data_type is fp16 and is_remove_padding is set to 1. Thanks!
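As a cross-check, the sizes in the verbose log line up with these settings: assuming `sequence_lengths` is int32 and activations are fp16, 40 bytes is 10 entries, and 353280 bytes of `input_hidden_state` is 230 tokens' worth of 768-wide hidden states, i.e. the failing request was a batch of 10 averaging about 23 tokens per sequence. A small sketch of that arithmetic (my reading of the log, not backend code):

```cpp
#include <cassert>
#include <cstdio>

int main()
{
    // Sizes reported in the verbose log above.
    const long seq_lengths_bytes  = 40;      // "sequence_lengths size: 40 bytes"
    const long hidden_state_bytes = 353280;  // "input_hidden_state size: 353280 bytes"

    // Settings from this thread: hidden size 768 (12 heads * 64), fp16.
    const long hidden_size = 768;
    const long fp16_bytes  = 2;
    const long int32_bytes = 4;  // assuming int32 sequence lengths

    const long batch  = seq_lengths_bytes / int32_bytes;                  // 10
    const long tokens = hidden_state_bytes / (fp16_bytes * hidden_size);  // 230
    std::printf("batch=%ld, total tokens=%ld, tokens/seq=%ld\n",
                batch, tokens, tokens / batch);
    assert(batch == 10 && tokens == 230);
    return 0;
}
```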
@PerkzZheng, may I ask if there are any findings about this issue?
@rahuan sorry for the late response.
You can print out the value before this line here. You might have changed how s is given, and we cannot find the corresponding kernel.
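In case it helps, here is the kind of debug print meant above, as a hedged sketch only: the variable names below are placeholders I made up, not the backend's actual identifiers, so substitute whatever actually feeds the failing check in your copy of FusedAttentionLayer.cu.

```cpp
// Hypothetical sketch: add just before the assertion at
// FusedAttentionLayer.cu:178. Print whatever the failing check
// consumes (typically the GPU SM version and the dispatched
// sequence length); sm_, seq_len, size_per_head_ are placeholders.
printf("[FT debug] sm=%d, seq_len=%d, size_per_head=%d\n", sm_, seq_len, size_per_head_);
```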