anjali-chadha opened this issue 3 years ago.
Hi @anjali-chadha, unfortunately I am not very familiar with Java.
Regarding Python: the library _torchtext.so must be loaded for its functionality to be available in the Python runtime. Importing torchtext loads this library implicitly, but you can also load it explicitly. Have a look here: https://github.com/pytorch/text/blob/760a625f8796293145c7a9bf4d5c710cfb0aabc8/torchtext/__init__.py#L24
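Loading the library explicitly before deserializing could look roughly like this. It is a minimal sketch, not torchtext's own code: it assumes the extension file is named `_torchtext` plus a platform extension suffix (e.g. `.so`) inside the torchtext package directory, and uses a `FileFinder` lookup similar in spirit to the linked `__init__.py`. `find_torchtext_extension` and `load_scripted_tokenizer` are hypothetical helper names; `torch.ops.load_library` and `torch.jit.load` are PyTorch functions.

```python
import importlib.machinery
from typing import Optional


def find_torchtext_extension(lib_dir: str) -> Optional[str]:
    """Return the path of the _torchtext native extension in lib_dir, or None.

    Only files named "_torchtext" + a native-extension suffix for this
    interpreter (e.g. ".so" on Linux/macOS) are considered.
    """
    loader_details = (
        importlib.machinery.ExtensionFileLoader,
        importlib.machinery.EXTENSION_SUFFIXES,
    )
    finder = importlib.machinery.FileFinder(lib_dir, loader_details)
    spec = finder.find_spec("_torchtext")
    return spec.origin if spec is not None else None


def load_scripted_tokenizer(model_path: str, lib_dir: str):
    """Load the torchtext native library explicitly, then the TorchScript file."""
    import torch  # imported lazily so the lookup above works without torch

    lib_path = find_torchtext_extension(lib_dir)
    if lib_path is None:
        raise ImportError(f"_torchtext extension not found in {lib_dir}")
    # Loading the shared library registers its custom ops/classes with the
    # TorchScript runtime, which is what `import torchtext` does implicitly.
    torch.ops.load_library(lib_path)
    return torch.jit.load(model_path)
```

For example, `load_scripted_tokenizer("spm-jit.pt", "/Library/Python/3.8/site-packages/torchtext")` would load the library from the install location mentioned later in this thread before deserializing the tokenizer, without importing torchtext itself.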
Regarding C++: you would need to link your application with _torchtext.so and the Python library. You can find more details here: https://github.com/pytorch/text/issues/1255#issuecomment-867821021. Unfortunately, the Python library still has to be linked because we do not yet have a runtime library that is independent of it. There is a plan to provide one, but I cannot say much about the timeline. cc: @mthrok, @hudeven
@anjali-chadha
You can try to load _torchtext.so manually:
System.load("/Library/Python/3.8/site-packages/torchtext/_torchtext.so");
With DJL 0.12.0, you can simply define an environment variable, and DJL will load the library for you:
export PYTORCH_EXTRA_LIBRARY_PATH=/Library/Python/3.8/site-packages/torchtext/_torchtext.so
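If the Java application is launched from a script rather than an interactive shell, the variable has to be in the environment before the JVM starts, since a running Java process cannot change its own environment. A minimal Python sketch of such a launcher follows; `djl_env` and `run_java_app` are hypothetical names, the jar in the usage note is a placeholder, and only the variable name `PYTORCH_EXTRA_LIBRARY_PATH` comes from the comment above.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional, Sequence


def djl_env(lib_path: str, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` (default: os.environ) with the DJL variable set."""
    env = dict(os.environ if base is None else base)
    # Per the comment above, DJL 0.12.0 loads the library named here.
    env["PYTORCH_EXTRA_LIBRARY_PATH"] = lib_path
    return env


def run_java_app(command: Sequence[str], lib_path: str) -> int:
    """Start a Java command whose environment points DJL at lib_path."""
    if not os.path.isfile(lib_path):
        raise FileNotFoundError(lib_path)
    completed = subprocess.run(list(command), env=djl_env(lib_path), check=False)
    return completed.returncode
```

For example, `run_java_app(["java", "-jar", "tokenizer-app.jar"], "/Library/Python/3.8/site-packages/torchtext/_torchtext.so")`, where `tokenizer-app.jar` stands in for your own DJL-based application. Copying the environment instead of mutating `os.environ` keeps the setting scoped to the child process.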
❓ Questions and Help
Description
I have a SentencePiece model which I serialized using sentencepiece_processor. My end goal is to use this TorchScript-serialized tokenizer in Java along with the DJL PyTorch dependency. I am looking for guidance on how to import the torchtext dependency in a Java environment.
Steps:
1. Serializing the SPM tokenizer using torchtext: the TorchScript-serialized file is saved as 'spm-jit.pt'.
2. Deserializing the SPM tokenizer in Python: loading spm-jit.pt without importing torchtext fails with an error.
After importing torchtext, I am able to load the tokenizer from the TorchScript file.
This led me to conclude that the serialized file depends on torchtext to load successfully in a Java, Python, or C++ environment.
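One rough way to see that dependency directly (an assumption on my part, not an official PyTorch tool) is to scan the serialized file: files written by `torch.jit.save` are zip archives holding serialized code and pickles, so references to classes registered by `_torchtext.so` typically show up there as plain text. `archive_entries_mentioning` is a hypothetical helper, and it reads each entry fully into memory, which is fine for a tokenizer but wasteful for large models.

```python
import zipfile
from typing import List


def archive_entries_mentioning(pt_path: str, marker: str = "torchtext") -> List[str]:
    """List the entries of a TorchScript zip archive whose bytes contain `marker`."""
    needle = marker.encode("utf-8")
    hits = []
    with zipfile.ZipFile(pt_path) as archive:
        for name in archive.namelist():
            # Serialized code and pickles store qualified class/op names as text.
            if needle in archive.read(name):
                hits.append(name)
    return hits
```

For example, a non-empty result from `archive_entries_mentioning("spm-jit.pt")` would be consistent with the conclusion above: whatever runtime loads the file must first have the torchtext classes registered.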
Any guidance on how I can use torchtext in Java and/or C++?
Thanks!