marian-nmt / marian-dev

Fast Neural Machine Translation in C++ - development repository
https://marian-nmt.github.io

Unable to use quantized s2s models on CPU using marian-conv #832

Open GokulNC opened 3 years ago

GokulNC commented 3 years ago

Bug description

I am using marian-conv to convert the npz model into different bin formats optimized for CPU inference. When decoding with the converted models, I hit a different issue for each conversion type. (Yes, I built with USE_FBGEMM on, and my CPU supports AVX2.)

packed16:

[2021-03-11 16:42:42] Error: Child 1 has different type (first: float32 != child: packed16)
[2021-03-11 16:42:42] Error: Aborted from static marian::Type marian::NaryNodeOp::commonType(const std::vector<IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase> > > >&) in /home/gokul/marian/src/graph/node.h:209

[CALL STACK]
[0x55b332962f2a]    marian::NaryNodeOp::  commonType  (std::vector<IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase>>>,std::allocator<IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase>>>>> const&) + 0x29a
[0x55b3329a1d00]    IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase>>> marian::  Expression  <marian::ConcatenateNodeOp,std::vector<IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase>>>,std::allocator<IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase>>>>> const&,int&>(std::vector<IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase>>>,std::allocator<IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase>>>>> const&,  int&) + 0x40
[0x55b33290be66]    marian::  concatenate  (std::vector<IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase>>>,std::allocator<IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase>>>>> const&,  int) + 0x16
[0x55b332a95faf]    marian::rnn::GRU::  GRU  (std::shared_ptr<marian::ExpressionGraph>,  std::shared_ptr<marian::Options>) + 0x63f
[0x55b332a96dc8]    std::shared_ptr<marian::rnn::GRU> marian::  New  <marian::rnn::GRU,std::shared_ptr<marian::ExpressionGraph>&,std::shared_ptr<marian::Options>&>(std::shared_ptr<marian::ExpressionGraph>&,  std::shared_ptr<marian::Options>&) + 0x78
[0x55b332ab040d]    marian::rnn::CellFactory::  construct  (std::shared_ptr<marian::ExpressionGraph>) + 0x5d
[0x55b332a8e33d]    marian::rnn::StackedCellFactory::  construct  (std::shared_ptr<marian::ExpressionGraph>) + 0x27d
[0x55b332a8f845]    marian::rnn::RNNFactory::  construct  (std::shared_ptr<marian::ExpressionGraph>) + 0x3a5
[0x55b332ac02a8]    marian::EncoderS2S::  applyEncoderRNN  (std::shared_ptr<marian::ExpressionGraph>,  IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase>>>,  IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase>>>,  std::__cxx11::basic_string<char,std::char_traits<char>,std::allocator<char>>) + 0x38d8
[0x55b332ac4339]    marian::EncoderS2S::  build  (std::shared_ptr<marian::ExpressionGraph>,  std::shared_ptr<marian::data::CorpusBatch>) + 0x189
[0x55b332af5fad]    marian::EncoderDecoder::  startState  (std::shared_ptr<marian::ExpressionGraph>,  std::shared_ptr<marian::data::CorpusBatch>) + 0xad
[0x55b332a439a8]    marian::models::Stepwise::  startState  (std::shared_ptr<marian::ExpressionGraph>,  std::shared_ptr<marian::data::CorpusBatch>) + 0x68
[0x55b332802c1d]    marian::ScorerWrapper::  startState  (std::shared_ptr<marian::ExpressionGraph>,  std::shared_ptr<marian::data::CorpusBatch>) + 0xcd
[0x55b3327e8178]    marian::BeamSearch::  search  (std::shared_ptr<marian::ExpressionGraph>,  std::shared_ptr<marian::data::CorpusBatch>) + 0x608
[0x55b33267f34b]    marian::TranslateService<marian::BeamSearch>::run(std::__cxx11::basic_string<char,std::char_traits<char>,std::allocator<char>> const&)::{lambda(unsigned long)#1}::  operator()  (unsigned long) const + 0x13b
[0x55b33267f788]    marian::ThreadPool::enqueue<marian::TranslateService<marian::BeamSearch>::run(std::__cxx11::basic_string<char,std::char_traits<char>,std::allocator<char>> const&)::{lambda(unsigned long)#1}&,unsigned long&>(std::result_of&&,(marian::TranslateService<marian::BeamSearch>::run(std::__cxx11::basic_string<char,std::char_traits<char>,std::allocator<char>> const&)::{lambda(unsigned long)#1}&)...)::{lambda()#1}::  operator()  () const + 0x18
[0x55b332680210]    std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base,std::__future_base::_Result_base::_Deleter> (),std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>,std::__future_base::_Result_base::_Deleter>,std::__future_base::_Task_state<marian::ThreadPool::enqueue<marian::TranslateService<marian::BeamSearch>::run(std::__cxx11::basic_string<char,std::char_traits<char>,std::allocator<char>> const&)::{lambda(unsigned long)#1}&,unsigned long&>(std::result_of&&,(marian::TranslateService<marian::BeamSearch>::run(std::__cxx11::basic_string<char,std::char_traits<char>,std::allocator<char>> const&)::{lambda(unsigned long)#1}&)...)::{lambda()#1},std::allocator<int>,void ()>::_M_run()::{lambda()#1},void>>::  _M_invoke  (std::_Any_data const&) + 0x20
[0x55b33261c1eb]    std::__future_base::_State_baseV2::  _M_do_set  (std::function<std::unique_ptr<std::__future_base::_Result_base,std::__future_base::_Result_base::_Deleter> ()>*,  bool*) + 0x1b
[0x7f41ee68c997]                                                       + 0xf997
[0x55b33262161d]    std::_Function_handler<void (),marian::ThreadPool::enqueue<marian::TranslateService<marian::BeamSearch>::run(std::__cxx11::basic_string<char,std::char_traits<char>,std::allocator<char>> const&)::{lambda(unsigned long)#1}&,unsigned long&>(std::result_of&&,(marian::TranslateService<marian::BeamSearch>::run(std::__cxx11::basic_string<char,std::char_traits<char>,std::allocator<char>> const&)::{lambda(unsigned long)#1}&)...)::{lambda()#3}>::  _M_invoke  (std::_Any_data const&) + 0x11d
[0x55b332620d52]    std::thread::_State_impl<std::thread::_Invoker<std::tuple<marian::ThreadPool::reserve(unsigned long)::{lambda()#1}>>>::  _M_run  () + 0x132
[0x7f41d9960b2f]                                                       + 0xbbb2f
[0x7f41ee684fa3]                                                       + 0x7fa3
[0x7f41d963e4cf]    clone                                              + 0x3f

packed8avx2:

[2021-03-12 07:00:24] Error: Segmentation fault
[2021-03-12 07:00:24] Error: Aborted from setErrorHandlers()::<lambda(int, siginfo_t*, void*)> in /home/gokul/marian/src/common/logging.cpp:130

[CALL STACK]
[0x560179161f6e]                                                       + 0x40ff6e
[0x5601791621d9]                                                       + 0x4101d9
[0x7f1c22990730]                                                       + 0x12730
[0x560179370d9f]    marian::cpu::  Concatenate1  (IntrusivePtr<marian::TensorBase>,  std::vector<IntrusivePtr<marian::TensorBase>,std::allocator<IntrusivePtr<marian::TensorBase>>> const&) + 0x60f
[0x5601793749e3]    marian::cpu::  Concatenate  (IntrusivePtr<marian::TensorBase>,  std::vector<IntrusivePtr<marian::TensorBase>,std::allocator<IntrusivePtr<marian::TensorBase>>> const&,  int) + 0x73
[0x5601793e96e5]    marian::ConcatenateNodeOp::  forward  ()           + 0x175
[0x560179224fba]    marian::ExpressionGraph::  forward  (std::__cxx11::list<IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase>>>,std::allocator<IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase>>>>>&,  bool) + 0x21a
[0x5601792266ae]    marian::ExpressionGraph::  forwardNext  ()         + 0x23e
[0x5601792841d4]    marian::BeamSearch::  search  (std::shared_ptr<marian::ExpressionGraph>,  std::shared_ptr<marian::data::CorpusBatch>) + 0x4664
[0x56017911734b]    marian::TranslateService<marian::BeamSearch>::run(std::__cxx11::basic_string<char,std::char_traits<char>,std::allocator<char>> const&)::{lambda(unsigned long)#1}::  operator()  (unsigned long) const + 0x13b
[0x560179117788]    marian::ThreadPool::enqueue<marian::TranslateService<marian::BeamSearch>::run(std::__cxx11::basic_string<char,std::char_traits<char>,std::allocator<char>> const&)::{lambda(unsigned long)#1}&,unsigned long&>(std::result_of&&,(marian::TranslateService<marian::BeamSearch>::run(std::__cxx11::basic_string<char,std::char_traits<char>,std::allocator<char>> const&)::{lambda(unsigned long)#1}&)...)::{lambda()#1}::  operator()  () const + 0x18
[0x560179118210]    std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base,std::__future_base::_Result_base::_Deleter> (),std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>,std::__future_base::_Result_base::_Deleter>,std::__future_base::_Task_state<marian::ThreadPool::enqueue<marian::TranslateService<marian::BeamSearch>::run(std::__cxx11::basic_string<char,std::char_traits<char>,std::allocator<char>> const&)::{lambda(unsigned long)#1}&,unsigned long&>(std::result_of&&,(marian::TranslateService<marian::BeamSearch>::run(std::__cxx11::basic_string<char,std::char_traits<char>,std::allocator<char>> const&)::{lambda(unsigned long)#1}&)...)::{lambda()#1},std::allocator<int>,void ()>::_M_run()::{lambda()#1},void>>::  _M_invoke  (std::_Any_data const&) + 0x20
[0x5601790b41eb]    std::__future_base::_State_baseV2::  _M_do_set  (std::function<std::unique_ptr<std::__future_base::_Result_base,std::__future_base::_Result_base::_Deleter> ()>*,  bool*) + 0x1b
[0x7f1c2298d997]                                                       + 0xf997
[0x5601790b961d]    std::_Function_handler<void (),marian::ThreadPool::enqueue<marian::TranslateService<marian::BeamSearch>::run(std::__cxx11::basic_string<char,std::char_traits<char>,std::allocator<char>> const&)::{lambda(unsigned long)#1}&,unsigned long&>(std::result_of&&,(marian::TranslateService<marian::BeamSearch>::run(std::__cxx11::basic_string<char,std::char_traits<char>,std::allocator<char>> const&)::{lambda(unsigned long)#1}&)...)::{lambda()#3}>::  _M_invoke  (std::_Any_data const&) + 0x11d
[0x5601790b8d52]    std::thread::_State_impl<std::thread::_Invoker<std::tuple<marian::ThreadPool::reserve(unsigned long)::{lambda()#1}>>>::  _M_run  () + 0x132
[0x7f1c0dc61b2f]                                                       + 0xbbb2f
[0x7f1c22985fa3]                                                       + 0x7fa3
[0x7f1c0d93f4cf]    clone                                              + 0x3f

intgemm16: This one runs, but gives garbage results.

How to reproduce

I trained a sample model using --type s2s --best-deep to test quantized CPU inference.

Context

emjotde commented 3 years ago

Looks like we didn't add support for the quantized models to the old RNN models? How are you calling the decoder?

GokulNC commented 3 years ago

@emjotde I am calling it using marian-server and the -c decoder.yml config after replacing the model name with the converted model.bin.

I also just tried marian-decoder (with an intgemm8 model), and it throws a different error:

[2021-03-12 06:24:18] Error: Segmentation fault
[2021-03-12 06:24:18] Error: Aborted from setErrorHandlers()::<lambda(int, siginfo_t*, void*)> in /home/gokul/marian/src/common/logging.cpp:130               

[CALL STACK]
[0x5649bf9aff6e]                                                       + 0x40ff6e                                                                             
[0x5649bf9b01d9]                                                       + 0x4101d9
[0x7fe2dd653730]                                                       + 0x12730
[0x5649bfbbed9f]    marian::cpu::  Concatenate1  (IntrusivePtr<marian::TensorBase>,  std::vector<IntrusivePtr<marian::TensorBase>,std::allocator<IntrusivePtr<marian::TensorBase>>> const&) + 0x60f
[0x5649bfbc29e3]    marian::cpu::  Concatenate  (IntrusivePtr<marian::TensorBase>,  std::vector<IntrusivePtr<marian::TensorBase>,std::allocator<IntrusivePtr<marian::TensorBase>>> const&,  int) + 0x73
[0x5649bfc376e5]    marian::ConcatenateNodeOp::  forward  ()           + 0x175
[0x5649bfa72fba]    marian::ExpressionGraph::  forward  (std::__cxx11::list<IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase>>>,std::allocator<IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase>>>>>&,  bool) + 0x21a                                                                            
[0x5649bfa746ae]    marian::ExpressionGraph::  forwardNext  ()         + 0x23e
[0x5649bfad21d4]    marian::BeamSearch::  search  (std::shared_ptr<marian::ExpressionGraph>,  std::shared_ptr<marian::data::CorpusBatch>) + 0x4664            
[0x5649bf96534b]    marian::TranslateService<marian::BeamSearch>::run(std::__cxx11::basic_string<char,std::char_traits<char>,std::allocator<char>> const&)::{lambda(unsigned long)#1}::  operator()  (unsigned long) const + 0x13b
[0x5649bf965788]    marian::ThreadPool::enqueue<marian::TranslateService<marian::BeamSearch>::run(std::__cxx11::basic_string<char,std::char_traits<char>,std::allocator<char>> const&)::{lambda(unsigned long)#1}&,unsigned long&>(std::result_of&&,(marian::TranslateService<marian::BeamSearch>::run(std::__cxx11::basic_string<char,std::char_traits<char>,std::allocator<char>> const&)::{lambda(unsigned long)#1}&)...)::{lambda()#1}::  operator()  () const + 0x18                   
[0x5649bf966210]    std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base,std::__future_base::_Result_base::_Deleter> (),std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>,std::__future_base::_Result_base::_Deleter>,std::__future_base::_Task_state<marian::ThreadPool::enqueue<marian::TranslateService<marian::BeamSearch>::run(std::__cxx11::basic_string<char,std::char_traits<char>,std::allocator<char>> const&)::{lambda(unsigned long)#1}&,unsigned long&>(std::result_of&&,(marian::TranslateService<marian::BeamSearch>::run(std::__cxx11::basic_string<char,std::char_traits<char>,std::allocator<char>> const&)::{lambda(unsigned long)#1}&)...)::{lambda()#1},std::allocator<int>,void ()>::_M_run()::{lambda()#1},void>>::  _M_invoke  (std::_Any_data const&) + 0x20
[0x5649bf9021eb]    std::__future_base::_State_baseV2::  _M_do_set  (std::function<std::unique_ptr<std::__future_base::_Result_base,std::__future_base::_Result_base::_Deleter> ()>*,  bool*) + 0x1b
[0x7fe2dd650997]                                                       + 0xf997                                                                               
[0x5649bf90761d]    std::_Function_handler<void (),marian::ThreadPool::enqueue<marian::TranslateService<marian::BeamSearch>::run(std::__cxx11::basic_string<char,std::char_traits<char>,std::allocator<char>> const&)::{lambda(unsigned long)#1}&,unsigned long&>(std::result_of&&,(marian::TranslateService<marian::BeamSearch>::run(std::__cxx11::basic_string<char,std::char_traits<char>,std::allocator<char>> const&)::{lambda(unsigned long)#1}&)...)::{lambda()#3}>::  _M_invoke  (std::_Any_data const&) + 0x11d
[0x5649bf906d52]    std::thread::_State_impl<std::thread::_Invoker<std::tuple<marian::ThreadPool::reserve(unsigned long)::{lambda()#1}>>>::  _M_run  () + 0x132
[0x7fe2c8924b2f]                                                       + 0xbbb2f                                                                              
[0x7fe2dd648fa3]                                                       + 0x7fa3                                                                               
[0x7fe2c86024cf]    clone                                              + 0x3f

emjotde commented 3 years ago

Can you please post the exact command you used for marian-decoder and the contents of the decoder.yml?

GokulNC commented 3 years ago

Please find attached the log (which has the command and model config)

server.log

decoder.yml:

relative-paths: false
models:
  - model.bin
vocabs:
  - model/vocab.en.spm
  - model/vocab.hi.spm
beam-size: 8
normalize: 0
word-penalty: 0
mini-batch: 128
maxi-batch: 100
maxi-batch-sort: src

emjotde commented 3 years ago

Thanks, and the non-quantized versions work?

GokulNC commented 3 years ago

Yes, the trained model.npz model works perfectly.

XapaJIaMnu commented 3 years ago

I debugged a similar issue a while ago.

+++ b/src/tensors/cpu/fbgemm/expression_graph_packable.h
@@ -39,14 +39,18 @@ public:
       // int8 - all the weights used for affine op and dot op
       // fp16 - all the weights used for affine op
       if ((gemmElementType == Type::packed8avx2 || gemmElementType == Type::packed8avx512)
-        && (pName.find("_W") == pName.length() - 3 || pName.find("_W") == pName.length() - 2)) {
+        && (pName.find("_W") == pName.length() - 3 || pName.find("_W") == pName.length() - 2) 
+       && (pName.find("encoder") == std::string::npos)
+/*     && (pName.find("decoder_ff_logit_l") ==std::string::npos)
+        && (pName.find("decoder_ff_state_W") ==std::string::npos)*/
+       && (pName.find("decoder_cell") ==std::string::npos)) {
 #if USE_FBGEMM
         using namespace marian::cpu::variant;
         // packing information - size
         int nrow;
         int ncol;
         uint64_t packsize;
-
+       std::cerr << "here " << pName <<std::endl;
         fbgemmPacked8PackInfo(val->shape(),
                               gemmElementType,
                               pName.find("Wemb") != std::string::npos,
@@ -153,4 +157,4 @@ public:
   }
 };

-}  // namespace marian
\ No newline at end of file
+}  // namespace marian

This seemed to fix it, but that was for a single-layer model. It might not work for you.

emjotde commented 3 years ago

I will take a look. At the least there should be a clean error message if there is no support.

GokulNC commented 3 years ago

Hi @emjotde

> Looks like we didn't add support for the quantized models to the old RNN models

Any updates on this?