Open boegel opened 1 year ago
Currently stuck on:
/tmp/vsc40023/easybuild_build/Tokenizer/1.37.1/foss-2022a/Tokenizer-1.37.1/src/SentencePiece.cc: In member function virtual void onmt::SentencePiece::set_vocabulary(const std::vector<std::__cxx11::basic_string<char> >&, const onmt::Tokenizer::Options*):
/tmp/vsc40023/easybuild_build/Tokenizer/1.37.1/foss-2022a/Tokenizer-1.37.1/src/SentencePiece.cc:57:45: error: cannot convert const std::vector<std::__cxx11::basic_string<char> > to const std::vector<std::basic_string_view<char> >&
57 | auto status = _processor->SetVocabulary(vocabulary);
| ^~~~~~~~~~
| |
| const std::vector<std::__cxx11::basic_string<char> >
In file included from /tmp/vsc40023/easybuild_build/Tokenizer/1.37.1/foss-2022a/Tokenizer-1.37.1/src/SentencePiece.cc:3:
/user/gent/400/vsc40023/eb_arcaninescratch/RHEL8/skylake-ib/software/SentencePiece/0.1.97-GCC-11.3.0/include/sentencepiece_processor.h:279:45: note: initializing argument 1 of virtual sentencepiece::util::Status sentencepiece::SentencePieceProcessor::SetVocabulary(const std::vector<std::basic_string_view<char> >&)
279 | const std::vector<absl::string_view> &valid_vocab);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~
make[2]: *** [CMakeFiles/OpenNMTTokenizer.dir/build.make:135: CMakeFiles/OpenNMTTokenizer.dir/src/SentencePiece.cc.o] Error 1
compiler problem for Tokenizer
fixed with patch included in 08dcce9
I think you can simplify the patch to just do SetVocabulary(ToPieceArray(vocabulary));
since that's what SentencePiece themselves do in this situation https://github.com/google/sentencepiece/commit/631420b84be518c907060fd947aac01762d7fbb0#diff-77e6a3b3bfda73d84fe1fef8205f2a2ec1d46b8f232100041f7135505f8adcefR217
The function should be defined in the same header that's already included.
It's a bit better since it does the vector allocations up front.
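For reference, here is a minimal sketch of what such a conversion helper might look like. The name to_piece_array is a hypothetical stand-in, not SentencePiece's actual ToPieceArray, whose implementation may differ. It assumes absl::string_view resolves to std::string_view, as the error message above suggests for this build.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for the ToPieceArray helper suggested above.
// Builds a vector of non-owning views over the given strings.
inline std::vector<std::string_view> to_piece_array(
    const std::vector<std::string>& pieces) {
  std::vector<std::string_view> views;
  views.reserve(pieces.size());  // one allocation up front, no regrowth
  for (const std::string& piece : pieces) {
    views.emplace_back(piece);  // view into the string's buffer, no copy
  }
  return views;
}
```

With a helper like this, the failing call would presumably become SetVocabulary(to_piece_array(vocabulary)). The views do not own their data, so the source strings have to outlive them; passing the result as a temporary argument, as here, satisfies that for the duration of the call.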
foss/2022a
PythonBundle