Open cs-jlopezr opened 7 months ago
I was able to solve the issue by compiling the sentencepiece tokenizer library separately and adding it as an explicit dependency. The usage instructions do not make this clear.
Now I am getting a segmentation fault when using the library, and I am not sure why. I am doing the same as in the example: initializing the tokenizer apparently works, but as soon as I try to encode, it crashes with a segmentation fault.
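One thing that may be worth ruling out, without any claim that it is the cause here, is the model blob handed to the tokenizer being empty or unreadable. Below is a minimal sketch of a defensive file loader, assuming the model is read from disk into a `std::string` before being passed to `FromBlobSentencePiece`. `LoadBytesFromFile` is a hypothetical helper written for illustration, not the library's code.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of the library): read a whole file as raw
// bytes and fail loudly instead of silently returning an empty blob.
std::string LoadBytesFromFile(const std::string& path) {
  // Binary mode so the serialized model is not altered by newline translation.
  std::ifstream fs(path, std::ios::in | std::ios::binary);
  if (!fs) {
    throw std::runtime_error("cannot open model file: " + path);
  }
  std::string data((std::istreambuf_iterator<char>(fs)),
                   std::istreambuf_iterator<char>());
  if (data.empty()) {
    throw std::runtime_error("model file is empty: " + path);
  }
  return data;
}
```

The returned string would then go to `FromBlobSentencePiece`, and checking that the returned tokenizer pointer is non-null before calling encode helps narrow down whether the crash comes from loading or from encoding.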
@cs-jlopezr Could you share the source code and config to reproduce?
I was able to compile the library successfully, but when I use it as shown in the example folder I get the following errors:
ld: error: undefined symbol: sentencepiece::SentencePieceProcessor::SentencePieceProcessor()
ld: error: undefined symbol: sentencepiece::SentencePieceProcessor::LoadFromSerializedProto(std::__ndk1::basic_string_view<char, std::__ndk1::char_traits<char> >)
ld: error: undefined symbol: sentencepiece::util::Status::~Status()
ld: error: undefined symbol: sentencepiece::SentencePieceProcessor::~SentencePieceProcessor()
ld: error: undefined symbol: sentencepiece::SentencePieceProcessor::Encode(std::__ndk1::basic_string_view<char, std::__ndk1::char_traits<char> >, std::__ndk1::vector<int, std::__ndk1::allocator<int> >*) const
ld: error: undefined symbol: sentencepiece::util::Status::IgnoreError()
ld: error: undefined symbol: sentencepiece::SentencePieceProcessor::Decode(std::__ndk1::vector<int, std::__ndk1::allocator<int> > const&, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >*) const
ld: error: undefined symbol: sentencepiece::SentencePieceProcessor::GetPieceSize() const
ld: error: undefined symbol: sentencepiece::SentencePieceProcessor::IdToPiece(int) const
ld: error: undefined symbol: sentencepiece::SentencePieceProcessor::PieceToId(std::__ndk1::basic_string_view<char, std::__ndk1::char_traits<char> >) const
When I check inside the library, the symbols are properly defined.
In my code I am doing the same as in the example folder, so I am not directly invoking the symbols that are reported as undefined. The ones I do use (FromBlobSentencePiece, for example) are resolved correctly. What could be causing this?
One thing I find curious: why is the linker for my program complaining about the src/sentencepiece_tokenizer.cc file if I am only using the static library (the .a file) through the tokenizers_cpp.h header provided by the library?