Hello @moussaKam,

I can't find in the repository the code used to continue mBART pretraining to create mBARThez. Did you make it available somewhere?

More specifically, I'm interested in understanding how you adapted the mBART tokenizer. It looks like the checkpoint on Hugging Face uses the BARThez tokenizer, not the mBART tokenizer. So my question is: how did you align the pretrained mBART embeddings with the BARThez tokenizer vocabulary?
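To make the question concrete, here is the kind of vocabulary remapping I imagine could be involved. This is only a sketch of one common approach, not the actual mBARThez recipe: for each token in the new (BARThez-style) vocabulary, copy the pretrained vector if the token also exists in the old (mBART-style) vocabulary, and fall back to a random initialization otherwise. The function name, the toy vocabularies, and the random-init fallback are all my assumptions.

```python
import numpy as np

def remap_embeddings(old_emb, old_vocab, new_vocab, seed=0):
    """Build an embedding matrix for new_vocab from old_emb / old_vocab.

    Tokens present in both vocabularies keep their pretrained vector;
    tokens unseen in old_vocab get a small random initialization
    (hypothetical fallback -- other strategies average subword vectors).
    """
    rng = np.random.default_rng(seed)
    dim = old_emb.shape[1]
    new_emb = rng.normal(scale=0.02, size=(len(new_vocab), dim))
    for tok, new_id in new_vocab.items():
        old_id = old_vocab.get(tok)
        if old_id is not None:
            # Shared token: reuse the pretrained mBART-style embedding.
            new_emb[new_id] = old_emb[old_id]
    return new_emb

# Toy demonstration with hypothetical vocabularies.
old_vocab = {"<s>": 0, "bonjour": 1, "monde": 2}
new_vocab = {"<s>": 0, "bonjour": 1, "paris": 2}
old_emb = np.arange(12, dtype=np.float32).reshape(3, 4)
new_emb = remap_embeddings(old_emb, old_vocab, new_vocab)
```

Is something along these lines what you did, or did you retrain the embedding layer from scratch during continued pretraining?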