yananchen1989 opened 2 years ago
Hello, I've noticed that some words appear in both cased and uncased forms, and each form has a different token ID in the GPT tokenizer's vocabulary.
What is the appropriate way to handle these words? Thanks.
There doesn't seem to be a better way to handle this than to include all of the casing variants in the bag of words.
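A minimal sketch of that bag-of-words approach, under stated assumptions: each word is expanded into its lowercase, capitalized, and uppercase forms, each with and without a leading space (GPT-2's byte-level BPE treats a leading space as part of the token, so "apple" and " apple" also get different IDs), and only variants that encode to a single token are kept. `TOY_VOCAB` and `toy_encode` are hypothetical stand-ins for a real tokenizer; with Hugging Face `transformers` you could pass `GPT2Tokenizer.from_pretrained("gpt2").encode` as `encode` instead.

```python
from typing import Callable, Dict, List, Set


def casing_variants(word: str) -> List[str]:
    """Return the lowercase, capitalized, and uppercase forms of a word,
    each with and without a leading space (GPT-2 style BPE folds the
    leading space into the token, so both need to be covered)."""
    forms = {word.lower(), word.capitalize(), word.upper()}
    variants: List[str] = []
    for form in sorted(forms):
        variants.append(form)
        variants.append(" " + form)
    return variants


def bag_of_word_ids(
    words: List[str], encode: Callable[[str], List[int]]
) -> Dict[str, Set[int]]:
    """Map each word to the set of token IDs of all its casing variants
    that the tokenizer encodes as exactly one token."""
    bag: Dict[str, Set[int]] = {}
    for word in words:
        ids: Set[int] = set()
        for variant in casing_variants(word):
            encoded = encode(variant)
            if len(encoded) == 1:  # skip variants split into several sub-tokens
                ids.add(encoded[0])
        bag[word] = ids
    return bag


# Hypothetical toy vocabulary standing in for a real GPT tokenizer.
TOY_VOCAB = {"apple": 0, " apple": 1, "Apple": 2, " Apple": 3, "APPLE": 4}


def toy_encode(text: str) -> List[int]:
    # Whole-string lookup only; out-of-vocabulary strings return two dummy
    # IDs to mimic a real tokenizer splitting them into sub-tokens.
    if text in TOY_VOCAB:
        return [TOY_VOCAB[text]]
    return [-1, -1]


if __name__ == "__main__":
    print(bag_of_word_ids(["apple"], toy_encode))
```

Here `bag_of_word_ids(["apple"], toy_encode)` collects five IDs; " APPLE" is dropped because the toy encoder splits it. Matching a text against the bag then amounts to checking whether any of its token IDs falls in a word's set, regardless of casing.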