Ravikiran2611 closed this issue 4 years ago
Well, if your goal is to get the BPE-encoded version of your sentence, you could do it like this with laserembeddings:
from laserembeddings import Laser
from laserembeddings.preprocessing import Tokenizer, BPE

# Tokenize first, then apply BPE using the codes and vocab files bundled with the package
tokenizer = Tokenizer('en')
bpe = BPE(Laser.DEFAULT_BPE_CODES_FILE, Laser.DEFAULT_BPE_VOCAB_FILE)

print(bpe.encode_tokens(tokenizer.tokenize('He is inclined to be lazy.')))
# he is in@@ clin@@ ed to be la@@ zy .
But that's not really the point of this package.
Also note that for some languages you might, in some cases, get slightly different results than with Facebook's original implementation. Please refer to the README.
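In case it helps when reading the encoded output: judging from the example above, a trailing `@@` seems to mark a subword unit that continues into the next one. Under that assumption, here is a minimal sketch that merges the units back into tokens. `merge_bpe_units` is a hypothetical helper written for illustration; it is not part of laserembeddings.

```python
def merge_bpe_units(encoded, marker="@@"):
    """Merge BPE subword units back into whole tokens.

    Assumes a unit ending in `marker` continues into the next unit
    (hypothetical helper, not part of laserembeddings).
    """
    tokens = []
    current = ""
    for unit in encoded.split():
        if unit.endswith(marker):
            # Continuation unit: strip the marker and keep accumulating
            current += unit[:-len(marker)]
        else:
            # Last unit of a token: emit the accumulated prefix plus this unit
            tokens.append(current + unit)
            current = ""
    if current:
        # Input ended on a continuation unit; keep what was accumulated
        tokens.append(current)
    return tokens


print(merge_bpe_units("he is in@@ clin@@ ed to be la@@ zy ."))
# ['he', 'is', 'inclined', 'to', 'be', 'lazy', '.']
```

This only undoes the subword split. The lowercasing and the space before the final period come from the earlier preprocessing and are not reversed here.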
Got it, thanks! @yannvgn
Can you provide a solution for this issue? https://github.com/facebookresearch/LASER/issues/95