uber-research / PPLM

Plug and Play Language Model implementation. Allows steering the topic and attributes of GPT-2 models.
Apache License 2.0

Doc the BOW approach #3

Open · arnicas opened this issue 4 years ago

arnicas commented 4 years ago

Hi, this looks great. I had to look at the code to get some insight into how to do a BOW approach of my own. Maybe you could add a few lines to the readme about that? The paper seems a little light on how the topic words were selected as well, unless I missed that? But awesome work!

dathath commented 4 years ago

Oops. The current draft seems to be missing a link to where we got the wordlists from: https://www.enchantedlearning.com/wordlist/. Will add this back into the paper! Thanks for catching this.

Aside: right now, the code only allows words that are a single BPE token long. Handling multi-token words would need a few minor changes. (There's a rough sketch at the end of this comment for checking your own wordlist against this constraint.)

Thanks for the suggestion; yes, I agree it would be a good idea to make it easier to use with your own BoW. Will consider incorporating this!
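In the meantime, here is a minimal sketch of how you could check which entries of your own wordlist satisfy the single-BPE-token constraint. It assumes the Hugging Face GPT-2 tokenizer that the script relies on; the wordlist and file name below are just placeholders, not anything shipped with the repo.

```python
# Minimal sketch (not part of the repo): check which words in a custom
# wordlist map to a single GPT-2 BPE token, so they can be used as a BoW.
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Placeholder wordlist; in practice, read one word per line from a text file.
words = ["science", "laboratory", "carpenter ant"]

for word in words:
    # Encode with a leading space so the word matches how it appears
    # mid-sentence in GPT-2's vocabulary.
    ids = tokenizer.encode(" " + word.strip())
    if len(ids) == 1:
        print(f"{word!r}: OK, single BPE token (id {ids[0]})")
    else:
        print(f"{word!r}: {len(ids)} BPE tokens, currently not supported")
```

If I am reading the argument parsing right, `-B` / `--bag_of_words` can also point at a plain-text file with one word per line (e.g. a hypothetical `-B my_words.txt` instead of a predefined topic like `-B military`), which is probably the easiest way to plug in a custom BoW once the words pass the check above.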

shamoons commented 4 years ago

So are you saying that every entry in a wordlist has to be a single word only? Not "carpenter ant"?