kjappelbaum opened this issue 5 months ago
in Meta's paper (/cc @smiret-intel)
For building the tokenizer we can do two routes:
The second approach will limit generalizability, while the first will give a very large vocabulary. Are there any other considerations that come to mind, @smiret-intel, @n0w0f?
I am looking at the Regression Transformer tokenizer implementation in this branch.
Pros:
Cons:
in https://arxiv.org/pdf/2305.05708.pdf
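For context, the Regression Transformer handles numbers by splitting them into digit-level tokens that also encode each digit's decimal place, which keeps the numeric vocabulary small and fixed. A minimal sketch of that idea is below; the exact token format (`_digit_place_`) is an assumption for illustration, not necessarily what the branch or the paper uses verbatim.

```python
# Hedged sketch of digit-level numeric tokenization in the style of the
# Regression Transformer. The token format "_d_p_" (digit d at decimal
# place p) is an assumption for illustration.

def tokenize_number(value: str) -> list[str]:
    """Split a numeric string into digit tokens annotated with their
    decimal place, e.g. "10.5" -> ["_1_1_", "_0_0_", "_._", "_5_-1_"]."""
    if "." in value:
        integer, fraction = value.split(".")
    else:
        integer, fraction = value, ""
    tokens = []
    for i, digit in enumerate(integer):
        place = len(integer) - 1 - i  # decimal place of this digit
        tokens.append(f"_{digit}_{place}_")
    if fraction:
        tokens.append("_._")
        for i, digit in enumerate(fraction):
            # fractional digits get negative decimal places
            tokens.append(f"_{digit}_{-(i + 1)}_")
    return tokens
```

With this scheme, the numeric vocabulary is bounded (10 digits times the range of decimal places, plus a decimal-point token), which sidesteps the very-large-vocab problem of treating each full number as its own token.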