uber-research / PPLM

Plug and Play Language Model implementation. Allows steering the topic and attributes of GPT-2 models.
Apache License 2.0

Are the bag of words case-sensitive? #42

Open yananchen1989 opened 2 years ago

yananchen1989 commented 2 years ago

Hello, I find that some words in the bag of words are cased while others are uncased. The cased and uncased forms have different token ids in the vocabulary of the GPT tokenizer.

What is the appropriate way to process these words? Thanks.

kizunasunhy commented 1 year ago

It seems there is no better way to solve this than to include every case variant of each word in the bag of words.
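
A minimal sketch of the "include them all" approach, in case it helps. The helper name `expand_case_variants` is hypothetical and not part of the PPLM code. It only expands a word list into its lowercase, Capitalized, and UPPERCASE forms, on the assumption that the expanded list is then written back to the bag-of-words file and tokenized as usual.

```python
def expand_case_variants(words):
    """Return each word plus its lowercase, Capitalized and UPPERCASE forms.

    Hypothetical helper, not part of PPLM. Order is preserved and
    duplicates (e.g. a word that is already lowercase) are dropped.
    """
    seen = set()
    expanded = []
    for word in words:
        word = word.strip()
        if not word:
            continue  # skip blank lines from a word-list file
        for variant in (word, word.lower(), word.capitalize(), word.upper()):
            if variant not in seen:
                seen.add(variant)
                expanded.append(variant)
    return expanded


if __name__ == "__main__":
    # Each printed line could go into the bag-of-words file as its own entry.
    for entry in expand_case_variants(["science", "NASA"]):
        print(entry)
```

Note that `str.capitalize()` lowercases everything after the first letter, so an acronym such as `NASA` also yields `Nasa`. Drop the variants you do not want before saving the list.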