tokenize and apply bpe for one sentence

Ravikiran2611 commented 4 years ago

Can you provide a solution for this issue? https://github.com/facebookresearch/LASER/issues/95

yannvgn commented 4 years ago

Well, if your goal is to get the BPE-encoded version of your sentence, you could do it like this with laserembeddings:

from laserembeddings import Laser
from laserembeddings.preprocessing import Tokenizer, BPE

tokenizer = Tokenizer('en')
bpe = BPE(Laser.DEFAULT_BPE_CODES_FILE, Laser.DEFAULT_BPE_VOCAB_FILE)

bpe.encode_tokens(tokenizer.tokenize('He is inclined to be lazy.'))
# he is in@@ clin@@ ed to be la@@ zy .

But that's not really the point of this package.

Also note that for some languages you might, in some cases, get slightly different results from Facebook's original implementation. Please refer to the README for details.
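
For reading the output above: the `@@` suffix appears to mark a subword unit that continues into the next token, which is the usual convention in BPE tools such as subword-nmt. As a rough sketch under that assumption, the encoded string can be joined back into whole tokens like this. Note that `merge_bpe` is a hypothetical helper written for illustration, not part of laserembeddings:

```python
def merge_bpe(encoded: str, separator: str = "@@") -> str:
    """Join BPE subword units back into whole tokens.

    Assumes a piece ending with `separator` continues into the next piece.
    """
    words = []
    current = ""
    for piece in encoded.split():
        if piece.endswith(separator):
            # Continuation piece: strip the marker and keep accumulating.
            current += piece[: -len(separator)]
        else:
            # Final piece of a token: emit the accumulated word.
            words.append(current + piece)
            current = ""
    if current:
        # Dangling continuation at the end of the input.
        words.append(current)
    return " ".join(words)


restored = merge_bpe("he is in@@ clin@@ ed to be la@@ zy .")
print(restored)
# he is inclined to be lazy .
```

The result is still lowercased and tokenized (note the space before the period), so it is not identical to the original sentence.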

Ravikiran2611 commented 4 years ago

Got it, thanks! @yannvgn

yannvgn / laserembeddings

tokenize and apply bpe for one sentence #5