Open iedmrc opened 4 years ago
Which methods does aitextgen support for tokenization (BPE, wordpiece etc..)? If only one, how can we expand to use others?
Thanks!
Custom tokenizers just use BPE, since that's what GPT-2 uses.
In theory you could plug in an arbitrary tokenizer (e.g. WordPiece), but I haven't tested it. I'm not sure there's much ROI in doing so, or whether GPT-2 would behave well with a different scheme.
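For context on what the BPE approach involves: BPE builds a vocabulary by repeatedly merging the most frequent adjacent symbol pair in the training corpus. Below is a minimal, self-contained sketch of that merge-learning loop in plain Python (not aitextgen's actual implementation, which delegates to the Hugging Face `tokenizers` library; the function name and toy corpus here are made up for illustration):

```python
from collections import Counter

def byte_pair_merges(word_freqs, num_merges):
    """Learn BPE merge rules from a word -> frequency dict.

    word_freqs: hypothetical toy corpus counts, e.g. {"low": 5, ...}.
    Returns the list of learned merges, most frequent first.
    """
    # Start with each word split into single characters.
    vocab = {tuple(w): f for w, f in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Merge the most frequent pair everywhere it occurs.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        merged = best[0] + best[1]
        new_vocab = {}
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] = freq
        vocab = new_vocab
    return merges

merges = byte_pair_merges({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 2)
print(merges)  # the two most frequent pairs get merged first
```

Swapping in WordPiece would mean changing the pair-selection criterion (WordPiece picks the merge that maximizes corpus likelihood rather than raw pair frequency), which is why it's not a drop-in change on the aitextgen side.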