Closed: javithe7 closed this issue 4 years ago
Hi everyone,
Does anyone know which tokenizer MultiFiT uses (especially for Spanish texts), and the method used to vectorize the tokens? I'd like to be able to tokenize and vectorize texts the same way MultiFiT does internally.
Hello @javithe7, we use SentencePiece tokenization, which has been added to fastai directly. You can check the documentation at https://docs.fast.ai/text.data.html#TextList