vscentrum / vsc-software-stack

Central repository of easyconfigs used in the software installations on VSC clusters.

OpenNMT-py #108

Open boegel opened 1 year ago

boegel commented 1 year ago

Currently stuck on:

/tmp/vsc40023/easybuild_build/Tokenizer/1.37.1/foss-2022a/Tokenizer-1.37.1/src/SentencePiece.cc: In member function virtual void onmt::SentencePiece::set_vocabulary(const std::vector<std::__cxx11::basic_string<char> >&, const onmt::Tokenizer::Options*):
/tmp/vsc40023/easybuild_build/Tokenizer/1.37.1/foss-2022a/Tokenizer-1.37.1/src/SentencePiece.cc:57:45: error: cannot convert const std::vector<std::__cxx11::basic_string<char> > to const std::vector<std::basic_string_view<char> >&
   57 |     auto status = _processor->SetVocabulary(vocabulary);
      |                                             ^~~~~~~~~~
      |                                             |
      |                                             const std::vector<std::__cxx11::basic_string<char> >
In file included from /tmp/vsc40023/easybuild_build/Tokenizer/1.37.1/foss-2022a/Tokenizer-1.37.1/src/SentencePiece.cc:3:
/user/gent/400/vsc40023/eb_arcaninescratch/RHEL8/skylake-ib/software/SentencePiece/0.1.97-GCC-11.3.0/include/sentencepiece_processor.h:279:45: note:   initializing argument 1 of virtual sentencepiece::util::Status sentencepiece::SentencePieceProcessor::SetVocabulary(const std::vector<std::basic_string_view<char> >&)
  279 |       const std::vector<absl::string_view> &valid_vocab);
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~
make[2]: *** [CMakeFiles/OpenNMTTokenizer.dir/build.make:135: CMakeFiles/OpenNMTTokenizer.dir/src/SentencePiece.cc.o] Error 1
boegel commented 1 year ago

Compilation problem for Tokenizer is fixed with the patch included in 08dcce9.

see also https://github.com/OpenNMT/Tokenizer/issues/323

Micket commented 1 year ago

I think you can simplify the patch to just do SetVocabulary(ToPieceArray(vocabulary)); since that's what SentencePiece themselves do in this situation: https://github.com/google/sentencepiece/commit/631420b84be518c907060fd947aac01762d7fbb0#diff-77e6a3b3bfda73d84fe1fef8205f2a2ec1d46b8f232100041f7135505f8adcefR217 The function should be defined in the same header that's already included.

It's a bit better since it does the vector allocations up front.
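
For illustration, a minimal C++17 sketch of the conversion being discussed, not the actual patch. The names to_piece_array and set_vocabulary are hypothetical stand-ins for ToPieceArray and SentencePieceProcessor::SetVocabulary, so the snippet compiles without SentencePiece installed. The real helper's exact signature and namespace are not shown in this thread and may differ.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <type_traits>
#include <vector>

// A vector of strings does not convert implicitly to a vector of
// string_views, which is what the compiler error above reports.
static_assert(!std::is_convertible_v<const std::vector<std::string>&,
                                     const std::vector<std::string_view>&>);

// Hypothetical stand-in for the ToPieceArray helper mentioned above.
inline std::vector<std::string_view> to_piece_array(
    const std::vector<std::string>& pieces) {
  // Size the output once, up front, instead of growing it element by element.
  std::vector<std::string_view> out(pieces.size());
  for (std::size_t i = 0; i < pieces.size(); ++i) {
    out[i] = pieces[i];  // a view into the caller's string, no copy of the text
  }
  return out;
}

// Hypothetical stand-in for SetVocabulary: it takes the same parameter type
// as in the error message, but only counts the entries it receives.
inline std::size_t set_vocabulary(
    const std::vector<std::string_view>& valid_vocab) {
  return valid_vocab.size();
}
```

With these stand-ins, a call like set_vocabulary(to_piece_array(vocabulary)) compiles where set_vocabulary(vocabulary) does not. The views only stay valid while the original strings are alive, so the input vector has to outlive the call.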