ramblingjordan / AbBOT-python

Generating more varied prompts #8

Closed: TheLandfill closed this 3 years ago

TheLandfill commented 3 years ago

See here

I wrote some code to generate a random prompt from a Markov process. It's kind of like Mad Libs, except I wrote it so that almost all sentences make sense (the main exception is that it doesn't account for gender) and so that sentences are fairly realistic, in the sense that a live-in maid is less likely to be the accused than your son's teacher. Here are some sample prompts this algorithm generated:

My boyfriend's granddad is trying to have an abortion a week from now.
I am convinced that my boss's granddaughter disobeyed the new law.
I am certain that my employee's daughter has disregarded Texas's law.
I suspect my niece's neighbor has disobeyed Texas's law on abortion.
I have strong evidence that my son's music teacher has violated Texas's new abortion law.
I have evidence that my brother's calculus teacher disobeyed the new abortion law.
I am convinced that my younger sister's boyfriend has disregarded Texas law.
I have reason to believe that my son-in-law's ex disobeyed Texas's recent abortion law.
I believe that my sister's English teacher has disregarded Texas's restrictions on abortion.
My step-son's chemistry teacher had an abortion last weekend.
I am certain that my next-door neighbor disobeyed Texas's abortion restrictions.
I have strong evidence that my daughter's music teacher violated Texas's ban on abortion.
I have reason to believe my brother's neighbor disregarded the law.
I think that my dentist's grandfather disobeyed Texas's law.
My daughter's employee will try to get an abortion next Friday.
I have reason to suspect that my mother-in-law's deacon has violated Texas's abortion restrictions.
I have evidence my ex's grandson has disregarded the ban on abortion.

Right now, it's only using two patterns:

I (think) [that] my [possessive adjective] (relationship) (violated) (law).
My [possessive adjective] (relationship) (got) an abortion [time].

where (think) refers to a mandatory slot, here a verb that can replace "think", and [time] refers to an optional slot, here a point in time (e.g. next Friday). Even with just these two patterns, I'm getting around 800,000 unique prompts out of 2,000,000 generated prompts. If we added more patterns and more options, we could get even more prompts. To extend what I have done, though, we have to use either short or realistic phrases. "I have reason to believe" is such a common phrase that filtering it out could remove real submissions. Furthermore, we should make sure we have a large number of unique sentences for each position before we string them together. For example, if I have 10^27 possible first sentences and only 5 possible second sentences, someone could filter out every submission just by matching those 5 second sentences.
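
To make the slot notation concrete, here is a minimal sketch of how the first pattern could be expanded, with mandatory slots drawn by weighted choice and optional slots included at random. It is not the linked code: the filler lists, the weights, the 50% chance for optional slots, and the names `pick`, `maybe`, `generate`, and `unique_fraction` are all assumptions made up for illustration.

```python
import random

# Hypothetical slot fillers with made-up weights; the real lists and
# probabilities in the linked code may differ.
THINK = [("think", 3), ("believe", 3), ("suspect", 2), ("am certain", 1)]
OWNER = [("son's", 2), ("sister's", 2), ("boss's", 1)]
RELATIONSHIP = [("teacher", 3), ("neighbor", 2), ("live-in maid", 1)]
VIOLATED = [("violated", 1), ("disobeyed", 1), ("has disregarded", 1)]
LAW = [("the law", 1), ("Texas's law", 1), ("the new abortion law", 1)]


def pick(options, rng):
    """Mandatory slot: weighted choice of exactly one filler."""
    words, weights = zip(*options)
    return rng.choices(words, weights=weights, k=1)[0]


def maybe(text, rng, p=0.5):
    """Optional slot: keep the text with probability p (assumed 0.5)."""
    return [text] if rng.random() < p else []


def generate(rng):
    """Expand: I (think) [that] my [possessive adjective] (relationship) (violated) (law)."""
    parts = ["I", pick(THINK, rng)]
    parts += maybe("that", rng)
    parts.append("my")
    parts += maybe(pick(OWNER, rng), rng)
    parts += [pick(RELATIONSHIP, rng), pick(VIOLATED, rng), pick(LAW, rng)]
    return " ".join(parts) + "."


def unique_fraction(n, seed=0):
    """Generate n prompts and return the share that are distinct."""
    rng = random.Random(seed)
    prompts = [generate(rng) for _ in range(n)]
    return len(set(prompts)) / n
```

With lists this small there are only 4 × 2 × 4 × 3 × 3 × 3 = 864 distinct sentences, so `unique_fraction` drops quickly as `n` grows. Measuring it this way is one rough check on how much a new pattern or a longer filler list actually adds.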

I think for now, we can use this code as a prompt for GPT or just use it as the submission message every so often.

erikedlund commented 3 years ago

This looks excellent. If you'd like to integrate this into the prompt generation logic in data.py that'd be cool!