GokulNC opened this issue 3 years ago
Looks like we didn't add support for the quantized models to the old RNN models? How are you calling the decoder?
@emjotde I am calling it using marian-server with the -c decoder.yml config, after replacing the model name with the converted model.bin.
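For reference, a rough sketch of such a marian-server invocation (the port, thread count, and paths are illustrative, not the exact values used here):

```
# Serve translations using the decoder config;
# decoder.yml references the converted model.bin and the vocabularies.
./marian-server -c decoder.yml --port 8080 --cpu-threads 4
```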
Also, I just tried using marian-decoder too (with the intgemm8 model); it seems to be throwing a different error:
[2021-03-12 06:24:18] Error: Segmentation fault
[2021-03-12 06:24:18] Error: Aborted from setErrorHandlers()::<lambda(int, siginfo_t*, void*)> in /home/gokul/marian/src/common/logging.cpp:130
[CALL STACK]
[0x5649bf9aff6e] + 0x40ff6e
[0x5649bf9b01d9] + 0x4101d9
[0x7fe2dd653730] + 0x12730
[0x5649bfbbed9f] marian::cpu::Concatenate1(IntrusivePtr<marian::TensorBase>, std::vector<IntrusivePtr<marian::TensorBase>,std::allocator<IntrusivePtr<marian::TensorBase>>> const&) + 0x60f
[0x5649bfbc29e3] marian::cpu::Concatenate(IntrusivePtr<marian::TensorBase>, std::vector<IntrusivePtr<marian::TensorBase>,std::allocator<IntrusivePtr<marian::TensorBase>>> const&, int) + 0x73
[0x5649bfc376e5] marian::ConcatenateNodeOp::forward() + 0x175
[0x5649bfa72fba] marian::ExpressionGraph::forward(std::__cxx11::list<IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase>>>,std::allocator<IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase>>>>>&, bool) + 0x21a
[0x5649bfa746ae] marian::ExpressionGraph::forwardNext() + 0x23e
[0x5649bfad21d4] marian::BeamSearch::search(std::shared_ptr<marian::ExpressionGraph>, std::shared_ptr<marian::data::CorpusBatch>) + 0x4664
[0x5649bf96534b] marian::TranslateService<marian::BeamSearch>::run(std::__cxx11::basic_string<char,std::char_traits<char>,std::allocator<char>> const&)::{lambda(unsigned long)#1}::operator()(unsigned long) const + 0x13b
[0x5649bf965788] marian::ThreadPool::enqueue<marian::TranslateService<marian::BeamSearch>::run(std::__cxx11::basic_string<char,std::char_traits<char>,std::allocator<char>> const&)::{lambda(unsigned long)#1}&,unsigned long&>(std::result_of&&,(marian::TranslateService<marian::BeamSearch>::run(std::__cxx11::basic_string<char,std::char_traits<char>,std::allocator<char>> const&)::{lambda(unsigned long)#1}&)...)::{lambda()#1}::operator()() const + 0x18
[0x5649bf966210] std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base,std::__future_base::_Result_base::_Deleter> (),std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>,std::__future_base::_Result_base::_Deleter>,std::__future_base::_Task_state<marian::ThreadPool::enqueue<marian::TranslateService<marian::BeamSearch>::run(std::__cxx11::basic_string<char,std::char_traits<char>,std::allocator<char>> const&)::{lambda(unsigned long)#1}&,unsigned long&>(std::result_of&&,(marian::TranslateService<marian::BeamSearch>::run(std::__cxx11::basic_string<char,std::char_traits<char>,std::allocator<char>> const&)::{lambda(unsigned long)#1}&)...)::{lambda()#1},std::allocator<int>,void ()>::_M_run()::{lambda()#1},void>>::_M_invoke(std::_Any_data const&) + 0x20
[0x5649bf9021eb] std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base,std::__future_base::_Result_base::_Deleter> ()>*, bool*) + 0x1b
[0x7fe2dd650997] + 0xf997
[0x5649bf90761d] std::_Function_handler<void (),marian::ThreadPool::enqueue<marian::TranslateService<marian::BeamSearch>::run(std::__cxx11::basic_string<char,std::char_traits<char>,std::allocator<char>> const&)::{lambda(unsigned long)#1}&,unsigned long&>(std::result_of&&,(marian::TranslateService<marian::BeamSearch>::run(std::__cxx11::basic_string<char,std::char_traits<char>,std::allocator<char>> const&)::{lambda(unsigned long)#1}&)...)::{lambda()#3}>::_M_invoke(std::_Any_data const&) + 0x11d
[0x5649bf906d52] std::thread::_State_impl<std::thread::_Invoker<std::tuple<marian::ThreadPool::reserve(unsigned long)::{lambda()#1}>>>::_M_run() + 0x132
[0x7fe2c8924b2f] + 0xbbb2f
[0x7fe2dd648fa3] + 0x7fa3
[0x7fe2c86024cf] clone + 0x3f
Can you please post the exact command you used for marian-decoder and the contents of the decoder.yml?
Please find attached the log (which has the command and model config).
decoder.yml:
relative-paths: false
models:
- model.bin
vocabs:
- model/vocab.en.spm
- model/vocab.hi.spm
beam-size: 8
normalize: 0
word-penalty: 0
mini-batch: 128
maxi-batch: 100
maxi-batch-sort: src
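(The exact command is in the attached log; a plain marian-decoder run with this config would look roughly like the sketch below, with illustrative input/output paths.)

```
# Decode on CPU with the same config; stdin/stdout carry the text.
./marian-decoder -c decoder.yml --cpu-threads 4 < input.en > output.hi
```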
Thanks, and the non-quantized versions work?
Yes, the trained model.npz works perfectly.
I debugged a similar issue a while ago.
--- a/src/tensors/cpu/fbgemm/expression_graph_packable.h
+++ b/src/tensors/cpu/fbgemm/expression_graph_packable.h
@@ -39,14 +39,18 @@ public:
// int8 - all the weights used for affine op and dot op
// fp16 - all the weights used for affine op
if ((gemmElementType == Type::packed8avx2 || gemmElementType == Type::packed8avx512)
- && (pName.find("_W") == pName.length() - 3 || pName.find("_W") == pName.length() - 2)) {
+ && (pName.find("_W") == pName.length() - 3 || pName.find("_W") == pName.length() - 2)
+ && (pName.find("encoder") == std::string::npos)
+/* && (pName.find("decoder_ff_logit_l") == std::string::npos)
+ && (pName.find("decoder_ff_state_W") == std::string::npos) */
+ && (pName.find("decoder_cell") == std::string::npos)) {
#if USE_FBGEMM
using namespace marian::cpu::variant;
// packing information - size
int nrow;
int ncol;
uint64_t packsize;
-
+ std::cerr << "here " << pName << std::endl;
fbgemmPacked8PackInfo(val->shape(),
gemmElementType,
pName.find("Wemb") != std::string::npos,
@@ -153,4 +157,4 @@ public:
}
};
-} // namespace marian
\ No newline at end of file
+} // namespace marian
This patch seemed to fix it, but that was for a single-layer model, so it might not work for you.
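A minimal sketch for trying such a patch, assuming a standard CMake build with the CPU backend and FBGEMM enabled (the job count is illustrative); the model then has to be re-converted with marian-conv so the patched packing logic takes effect:

```
# Rebuild marian with FBGEMM so the patched expression_graph_packable.h is used.
cd marian && mkdir -p build && cd build
cmake .. -DCOMPILE_CPU=on -DUSE_FBGEMM=on
make -j8
```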
I will take a look. At the very least there should be a clean error message if there is no support.
Hi @emjotde,

> Looks like we didn't add support for the quantized models to the old RNN models

Any updates on this?
Bug description
I am using marian-conv to convert the npz model into different bin formats optimized for CPU inference. When decoding the models, I am facing different issues for each model-conversion type (see the conversion sketch after the list below). (Yes, I built with USE_FBGEMM on, and my CPU supports AVX2.)

packed16:
packed8avx2:
intgemm16: This one runs, but gives garbage results.
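For completeness, a sketch of the conversion commands assumed here, using marian-conv's -f/-t/--gemm-type options (output file names are illustrative):

```
# Convert the trained float32 model.npz into each CPU-optimized format;
# each --gemm-type value corresponds to one of the failure modes listed above.
./marian-conv -f model.npz -t model.packed16.bin --gemm-type packed16
./marian-conv -f model.npz -t model.packed8avx2.bin --gemm-type packed8avx2
./marian-conv -f model.npz -t model.intgemm16.bin --gemm-type intgemm16
```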
How to reproduce
I trained a sample model using --type s2s --best-deep to test quantized CPU inference.
Context