tinkerhub/malayalam-llm
A Malayalam large language model fine-tuned on top of open-source models
Train sentencepiece tokenizer on larger text
#1 (Open), opened by gksoriginals 10 months ago
gksoriginals commented 10 months ago
Refer to this Colab for building a SentencePiece tokenizer.
Evaluate the tokenizer using token fertility, based on this paper.
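
A rough sketch of the fertility check, assuming the common definition (average number of subword tokens per whitespace-separated word; lower is better, with 1.0 meaning every word maps to a single token). The linked paper may define it slightly differently. The names `token_fertility` and `char_chunk_tokenizer` are hypothetical, and the chunking tokenizer is only a stand-in so the snippet runs without a trained model.

```python
from typing import Callable, Iterable, List


def token_fertility(
    sentences: Iterable[str],
    tokenize: Callable[[str], List[str]],
) -> float:
    """Average number of tokens produced per whitespace-separated word."""
    total_words = 0
    total_tokens = 0
    for sentence in sentences:
        words = sentence.split()
        if not words:
            continue  # skip empty lines
        total_words += len(words)
        total_tokens += len(tokenize(sentence))
    if total_words == 0:
        raise ValueError("no words found in the evaluation text")
    return total_tokens / total_words


def char_chunk_tokenizer(text: str, size: int = 3) -> List[str]:
    """Hypothetical stand-in: split each word into chunks of `size` characters."""
    pieces: List[str] = []
    for word in text.split():
        pieces.extend(word[i:i + size] for i in range(0, len(word), size))
    return pieces


if __name__ == "__main__":
    sample = ["abcdef gh"]
    # 3 tokens ("abc", "def", "gh") over 2 words
    print(token_fertility(sample, char_chunk_tokenizer))
```

With a trained SentencePiece model, the `tokenize` argument could be something like `lambda s: sp.encode(s, out_type=str)`, where `sp` is a loaded `sentencepiece.SentencePieceProcessor`. Running this on held-out Malayalam text for tokenizers trained on different corpus sizes should show whether the larger training text actually lowers fertility.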