headchem / StoryGhostPlotter

Add more data, consider adding \n to JSONL #93

Open headchem opened 2 years ago

headchem commented 2 years ago

With the new expanded-summaries feature complete, try it out on the longer Save The Cat articles I skipped the first time for being too long.

Reconsider the JSON-structured prompt, or at least add line breaks. I think labeled sections like HERO: asdfsd\n\nHEADING: asdasd would help the model parse the structure. The double line break carries structural information the model can benefit from.
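A minimal sketch of the labeled, blank-line-separated layout described above. The `build_prompt` helper and its dict input are hypothetical; the labels and placeholder values are the ones from this comment.

```python
def build_prompt(sections: dict[str, str], separator: str = "\n\n") -> str:
    """Render labeled sections as 'LABEL: value' blocks joined by blank lines.

    Hypothetical helper: labels come from the caller; dict order is preserved.
    """
    return separator.join(
        f"{label}: {value.strip()}" for label, value in sections.items()
    )


# Placeholder values copied from the issue text.
prompt = build_prompt({"HERO": "asdfsd", "HEADING": "asdasd"})
print(prompt)
```

Each label starts its own paragraph, so the section boundaries are visible to the model as blank lines rather than relying on punctuation alone.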

Maybe post a question on the OpenAI forums about how to handle line breaks? Also ask whether people think they would improve fine-tuning.
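On the line-break question itself: a newline inside a JSONL string is fine as long as it is stored as the JSON escape \n rather than as a raw line break. A sketch using Python's standard `json` module; `to_jsonl` and `from_jsonl` are hypothetical helper names and the record is a placeholder.

```python
import json


def to_jsonl(records: list[dict]) -> str:
    """Serialize records as JSONL text: one JSON object per line."""
    # json.dumps encodes each real newline inside a string as the two-character
    # escape \n, so a multi-line prompt still occupies a single JSONL line.
    return "".join(json.dumps(record) + "\n" for record in records)


def from_jsonl(text: str) -> list[dict]:
    """Parse JSONL text; the \\n escapes decode back into real newlines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Placeholder record: the prompt contains real newline characters.
records = [
    {
        "prompt": "GENRES: Action, Adventure, Thriller.\nKEYWORDS: espionage, agent, mole\n->",
        "completion": " Placeholder summary.###",
    }
]
text = to_jsonl(records)
print(text, end="")  # one line, with the newlines shown as \n escapes
```

Writing the file through a JSON serializer, rather than by string concatenation, is what keeps the one-record-per-line structure intact.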

headchem commented 2 years ago

Include an example when asking the question on the OpenAI forums:

{ "prompt": "GENRES: Family, Adventure, Action, Drama. KEYWORDS: mistreats, hunter, wolf, yukon, rescues, friendship ->", "completion": " Jack London's classic adventure story about the friendship developed between a Yukon gold hunter and the mixed dog-wolf he rescues from the hands of a man who mistreats him.###"}
{ "prompt": "GENRES: Action, Adventure, Thriller. KEYWORDS: espionage, IA, agent, mole, ferret ->", "completion": " When Ethan Hunt, the leader of a crack espionage team whose perilous operation has gone awry with no explanation, discovers that a mole has penetrated the CIA, he's surprised to learn that he's the No. 1 suspect. To clear his name, Hunt now must ferret out the real double agent and, in the process, even the score.###"}

{ "prompt": "GENRES: Family, Adventure, Action, Drama.\nKEYWORDS: mistreats, hunter, wolf, yukon, rescues, friendship\n->", "completion": " Jack London's classic adventure story about the friendship developed between a Yukon gold hunter and the mixed dog-wolf he rescues from the hands of a man who mistreats him.###"}
{ "prompt": "GENRES: Action, Adventure, Thriller.\nKEYWORDS: espionage, IA, agent, mole, ferret\n->", "completion": " When Ethan Hunt, the leader of a crack espionage team whose perilous operation has gone awry with no explanation, discovers that a mole has penetrated the CIA, he's surprised to learn that he's the No. 1 suspect. To clear his name, Hunt now must ferret out the real double agent and, in the process, even the score.###"}

GENRES: Family, Adventure, Action, Drama. KEYWORDS: mistreats, hunter, wolf, yukon, rescues, friendship ->

GENRES: Action, Adventure, Thriller. KEYWORDS: espionage, IA, agent, mole, ferret ->

It feels like the line breaks carry useful information by separating each prompt section (GENRES vs. KEYWORDS).
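A sketch of generating records in the line-break format above. `make_record`, `PROMPT_SEPARATOR` and `STOP_SEQUENCE` are hypothetical names; the `\n->` separator, the leading space in the completion, and the `###` stop sequence are copied from the example records.

```python
# Hypothetical constants mirroring the example records above.
PROMPT_SEPARATOR = "\n->"  # marks where the prompt ends
STOP_SEQUENCE = "###"      # marks where the completion ends


def make_record(genres: list[str], keywords: list[str], summary: str) -> dict[str, str]:
    """Build one prompt/completion pair with each section on its own line."""
    prompt = (
        f"GENRES: {', '.join(genres)}.\n"
        f"KEYWORDS: {', '.join(keywords)}"
        f"{PROMPT_SEPARATOR}"
    )
    # Completion starts with a space and ends with the stop sequence, as in the examples.
    completion = f" {summary.strip()}{STOP_SEQUENCE}"
    return {"prompt": prompt, "completion": completion}


record = make_record(
    ["Family", "Adventure", "Action", "Drama"],
    ["mistreats", "hunter", "wolf", "yukon", "rescues", "friendship"],
    "Jack London's classic adventure story about the friendship developed between "
    "a Yukon gold hunter and the mixed dog-wolf he rescues from the hands of a man "
    "who mistreats him.",
)
print(record["prompt"])
```

Generating every record through one function keeps the separator and stop sequence identical across the dataset, which matters more than the exact characters chosen.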

headchem commented 2 years ago

ANSWER: Found an example in the OpenAI fine-tuning documentation that shows proper use of \n:

{"prompt":"Company: BHFF insurance\nProduct: allround insurance\nAd:One stop shop for all your insurance needs!\nSupported:", "completion":" yes"}
{"prompt":"Company: Loft conversion specialists\nProduct: -\nAd:Straight teeth in weeks!\nSupported:", "completion":" no"}
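A sketch for decoding those two records and sanity-checking them. `check_records` is a hypothetical helper; its two checks (each prompt ends with a fixed separator, each completion starts with whitespace) follow conventions from OpenAI's legacy prompt/completion fine-tuning guide as I understand them, so treat them as assumptions to verify.

```python
import json

# The two records quoted above from the OpenAI fine-tuning docs (raw string keeps \n escapes).
DOCS_EXAMPLE = r"""{"prompt":"Company: BHFF insurance\nProduct: allround insurance\nAd:One stop shop for all your insurance needs!\nSupported:", "completion":" yes"}
{"prompt":"Company: Loft conversion specialists\nProduct: -\nAd:Straight teeth in weeks!\nSupported:", "completion":" no"}"""


def check_records(jsonl_text: str, separator: str) -> list[str]:
    """Return the problems found; an empty list means every record passed.

    Hypothetical checks: prompt ends with `separator`, completion starts with whitespace.
    """
    problems = []
    for lineno, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue
        record = json.loads(line)
        if not record["prompt"].endswith(separator):
            problems.append(f"line {lineno}: prompt does not end with {separator!r}")
        if not record["completion"][:1].isspace():
            problems.append(f"line {lineno}: completion does not start with whitespace")
    return problems


records = [json.loads(line) for line in DOCS_EXAMPLE.splitlines()]
# After decoding, each \n escape is a real newline, so every label sits on its own line.
print(records[0]["prompt"].split("\n"))
print(check_records(DOCS_EXAMPLE, separator="\nSupported:"))
```

In the docs example the final labeled line (`Supported:`) doubles as the separator, the same role `->` plays in the GENRES/KEYWORDS records.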