pteichman / cobe

A Markov chain based text generation library and MegaHAL style chatbot
http://teichman.org/blog/
MIT License

The rating algorithm really likes this line of text... #12

Open need4seed opened 10 years ago

need4seed commented 10 years ago

"The Jews steal our money through their Zionist occupied government and use the black man to bring drugs into our oppressed white minority communities." I've noticed that if you add this line to the corpus, this line will consistently show up around 85% of the time when the input line is short.

Not sure if this is a bug or just odd behavior. Anyway, does anyone have any idea why this line is rated so highly?

pteichman commented 10 years ago

Without knowing anything about the rest of your training data, I think this is a combination of:

1) None of the ngrams in that line are in common with other training data, so when candidate replies are generated the whole line tends to come back intact rather than being combined with other inputs.

2) Rare ngrams score higher, and their scores are summed together. So if that line is longer than your other training data (in ngram count--it has ~30), when it does show up it may win on length alone.
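To make the second point concrete, here is a minimal sketch of a scorer that sums the "surprise" of each ngram in a candidate reply. It is only an illustration of the idea and not cobe's actual scoring code: the trigram order, the `-log2` formula, the toy corpus, and the function names `ngrams`, `train` and `score` are all assumptions made for the example.

```python
import math
from collections import Counter


def ngrams(tokens, n=3):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def train(lines, n=3):
    """Count how often each ngram occurs across the whole corpus."""
    counts = Counter()
    for line in lines:
        counts.update(ngrams(line.split(), n))
    return counts


def score(reply, counts, n=3):
    """Sum -log2(relative frequency) over the reply's known ngrams.

    Rare ngrams contribute more, and every extra ngram adds to the total,
    so a long reply built from rare ngrams scores highest.
    (Hypothetical scorer for illustration; not cobe's implementation.)
    """
    total = sum(counts.values())
    return sum(
        -math.log2(counts[g] / total)
        for g in ngrams(reply.split(), n)
        if counts[g]
    )


# Toy corpus: three short lines that share ngrams, plus one long line
# whose ngrams appear nowhere else.
corpus = [
    "the cat sat on the mat",
    "the cat sat on the rug",
    "the cat sat on the sofa",
    "a long unusual sentence whose every word appears nowhere else in this tiny corpus",
]

counts = train(corpus)
short_score = score(corpus[0], counts)
long_score = score(corpus[3], counts)

print(f"short line: {short_score:.2f}")
print(f"long line:  {long_score:.2f}")
```

Under these assumptions the long, unique line outscores the short ones on both counts at once: each of its ngrams is rarer, and it has more of them to add up.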

The cobe command can be run with a --debug option to show all candidate replies. If you start the interactive console with that (cobe --debug --brain <brain file> console), it will show all the candidates in order of score. A score of -1 marks a duplicate of a reply that also appears elsewhere in the list with a proper score.

That may reassure you that your cobe thinks things that aren't bigoted; it just chooses not to say them.

"On two occasions I have been asked, — 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."

need4seed commented 10 years ago

Ah, so what you're saying is that if there's a line unique enough, cobe will be more apt to spit back that line (particularly, it seems, when the input line is shorter)?

(Oh, and not to worry, it's a Seinfeld quote.)

CrazyPython commented 8 years ago

@need4seed Are you making a SeinfeldBot?