headchem / StoryGhostPlotter


Finetune new models with new Sequence structure #23

Closed headchem closed 2 years ago

headchem commented 2 years ago

Since switching to the new Sequence-based approach, the prompts are all broken. - DONE

Update "public static string GetPrompt(Plot req)" to handle the new CompletionType values, which are now sequence names instead of orphanSummary, orphanFull, etc. - DONE

Generate new finetuning datasets based on data from Cosmos DB once #19 is complete. Add an admin-only ability to flag a Plot as "includeInFinetuning:true" for the Cosmos query. - DECIDED TO USE MY USERNAME AS "FLAG"

Dogfood by adding another 5 training data points. Try out other data sources like CliffsNotes. Keep an eye on the new sequence advice entries, and make them more generic if necessary.

Retrain all models on the new "instruct" model (sanity-check the first one before committing) - CANNOT. Instruct is itself already fine-tuned? It seems I can only fine-tune the classic models, which also means being stuck with the 1024 token limits...

--- NEW "Edits" model released, which lets you complete between two prompts that bookend the completion. When generating forward-only, use the finetuned models. Once other Sequences have values, use the new "Edits" model instead when re-generating a Sequence that is sandwiched between existing text. This solves the branching timelines problem, where you're stuck with some event early on because regenerating it would invalidate all later events. This is tracked in more detail in #45.
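A minimal sketch of that routing rule, in Python for illustration only. The function name, the return labels, and the choice to treat "any later Sequence has text" as "sandwiched" are all assumptions; none of this is taken from the project code.

```python
from typing import Optional, Sequence


def choose_model(sequences: Sequence[Optional[str]], target: int) -> str:
    """Pick a generation strategy for the Sequence at index `target`.

    `sequences` holds the current text of each Sequence in story order,
    with None or "" for Sequences that have not been written yet.
    Hypothetical helper: the labels "finetuned" and "edits" are
    placeholders, not real model names.
    """
    # Assumption: the target counts as sandwiched when any later Sequence
    # already has text that a regeneration would otherwise invalidate.
    has_later_text = any(text for text in sequences[target + 1:])
    return "edits" if has_later_text else "finetuned"


# Forward-only: nothing after index 1 has been written yet.
print(choose_model(["Opening image text", None, None], 1))
# Sandwiched: index 1 is regenerated while index 2 already has text.
print(choose_model(["Opening image text", "Old setup", "Theme stated"], 1))
```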

Use the new "Edits" model to ensure the specified keywords and genre are present in the log line. We can use the existing prompt to generate a good starting point, then chain that into the Edits model and instruct it to add missing concepts (only if not already present) and to ensure the text is sufficiently related to the genre, etc. - DONE
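The "only if not already present" check could be done before calling the Edits model, roughly like the sketch below. This is an illustration under assumptions: both helper names are hypothetical, the substring match is a deliberately crude presence test, and the instruction wording is made up rather than the prompt actually used.

```python
from typing import List


def missing_keywords(log_line: str, keywords: List[str]) -> List[str]:
    """Return the keywords that do not appear in the log line.

    Crude case-insensitive substring match, so "cowboy" is considered
    present when the text says "cowboys".
    """
    lowered = log_line.lower()
    return [k for k in keywords if k.lower() not in lowered]


def build_edit_instruction(log_line: str, keywords: List[str], genres: List[str]) -> str:
    """Build a natural-language instruction for an edit-style model (hypothetical wording)."""
    parts = []
    missing = missing_keywords(log_line, keywords)
    if missing:
        parts.append("Add these concepts: " + ", ".join(missing) + ".")
    parts.append("Make sure the text fits these genres: " + ", ".join(genres) + ".")
    return " ".join(parts)


log_line = "A boat full of tourists is kidnapped by a band of cowboys."
print(build_edit_instruction(log_line, ["boat", "cowboy", "bananas"], ["Comedy", "Mystery"]))
```

The generated log line would then be sent as the input and this string as the instruction.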

headchem commented 2 years ago

In progress. The following models have been trained so far:

headchem commented 2 years ago

Completed all remaining Sequence models. From email:

I just finished training all of the models to generate each “sequence” of the story, according to the story framework. This is using the test characters we created a few weeks ago. Everything here is 99% AI-generated. The remaining 1% was mostly me switching character names around, because the AI would sometimes forget who was in previous scenes. The story was built up by running 2-7 generations per sequence, then cherry-picking the best one before continuing on to the next sequence. This is in line with what I’ve heard others say about GPT-3 – that previous models would take several dozen tries before something usable came up, while this new model gives usable text within a single-digit number of tries. My training data includes 35 stories/movie plots, and OpenAI says that output quality roughly doubles when doubling the training data. It’s not a great final story, and it definitely strayed a bit from the log line, but I feel it shows enough promise to justify continuing to expand the training data. The Western theme resulted in the classic “white men” vs “Indian braves”, so the cultural biases in GPT-3’s training data are on full display.

Genres: Comedy, Mystery

Keywords: boat, cowboy, bananas

Log Line: A boat full of tourists is kidnapped by a band of cowboys and held hostage on a remote island. The pressure mounts as the group of strangers must deal with personalities and egos that clash. In the midst of it all, they must also figure out how to get back home.

Problem Template: Buddy Love

Dramatic Question: Bravery

Protagonist: Explorer-type Jamie is an imaginative brainstormer, disciplined and industrious, extroverted and gung-ho, empathetic and compassionate, unflappable and relaxed. He is a wannabe adventurer who fantasizes about exploring exotic lands, even though he is unwilling to leave his comfort zone. He has been at the same desk for years, drawing caricatures for parties, but wants to pursue his dream of exploring the jungle.

Jester-type John David is a closeminded fuddy duddy, spontaneous and sloppy, introverted and submissive, cold and unfeeling, anxious and vulnerable. He is apathetic and submissive, ashamed of his past failure to be promoted at work due to his boss's preferential treatment of those who look similar to the boss.

Wild horses run through the desert.

The Western Frontier is overrun with cowboys, who have been leading cattle on the California Trail. The cowboys are shown to be respectful, but are not above terrorizing travelers. Jamie is a talented caricaturist and is hired to entertain a group of travelers who arrive at a Western inn. The reckless gambler Michael O'Dowd is unimpressed with the sketches and refuses to pay, angering Jamie.

During an intense confrontation, O'Dowd tells Jamie, "Only a fool lets intimidation stop him," hinting at the theme of bravery.

Jamie loses his temper and throws O'Dowd out of the hotel.

Jamie's boss laughs, saying that Jamie has "Custer's disease," named after the man who lost his temper. Embarrassed, Jamie leaves. Jamie follows a woman he met, Katie, to the local saloon and learns that she is part of a wagon train heading to California. Jamie wants to sign on and see the world. Katie's fiancé, John David, repeatedly tries to talk her out of going and reveals that he works for the railroad. John David is cold and distant toward Jamie, causing him to wonder if he's doing the right thing by pursuing Katie.

Jamie's B-Story is his relationship with Katie as he learns to be brave when pursuing an idea.

Jamie signs up as part of the wagon train.

Jamie becomes close with Katie, telling her about his dreams of adventure and revealing his childish tendency to keep dogs as pets. Katie warns him that the wagon train is dangerous. John David sees Katie and Jamie spending time together and warns her not to trust Jamie. John David gets a job at a local mine and Katie convinces him to stay at the mine, allowing her and Jamie to spend time together. The reckless O'Dowd joins the wagon train. Meanwhile, the rest of the group feels safe because of the presence of Calamity Jane, the legendary scout known for her ability to scare off bandits. But the seemingly tough front masks Jane's true identity: a sensitive woman who longs for adventure. Jamie stops Katie from wandering off alone, but John David is deeply offended. Before they can resolve their dispute, they are confronted by an angry war party led by Crazy Horse. Calamity Jane easily scares off the warriors, but Katie and Jamie are taken hostage. O'Dowd offers to help exchange them for the return of $125 that he lost in a poker game.

The Native Americans set up camp and Jamie begins to bond with Katie. Katie and Jamie agree to each other that they should confess their feelings once they reach California. Katie gives Jamie a watch so that he can keep their promise to meet.

That night, the warriors prepare Jamie and Katie for their death, beginning a ceremony while the white men hide and watch. Jamie sneaks over to Katie and the two launch into fighting back against their captors. The warriors are shown to be close to breaking, but John David shows up in a horse drawn carriage. He offers to trade his cattle in exchange for Katie but he is ambushed as O'Dowd's ruse is exposed. The braves give Jamie and Katie an ultimatum: surrender or die.

Jamie and Katie surrender themselves. The braves set a curse upon the white people and begin to kill John David's cattle. In captivity, Jamie and Katie's feelings for each other are no longer a secret as they share a passionate kiss. An enraged John David tells Katie that he will marry her. She rejects John David and the death curse is lifted. The braves plan to burn the hostages alive at dawn, and John David comes to his senses and makes a run for it on foot.

Watching her fiancé flee, Katie looks to Jamie and mouthing "I'm sorry," she runs off after John David. The braves run after Katie and as Jamie tries to save her, he is captured as well. John David has made it to a riverbank but Katie keeps running after him, despite his hesitance to be helped. Katie sacrifices herself by pushing John David into the river, saving his life.

Jamie arrives and shoves the surviving Katie back into the water to keep her safe. John David and Katie swim to safety while Jamie watches helplessly as the wagon train arrives, followed by the warriors. All of John David's cattle have been killed, save one cow and a calf.

Misery confronts Jamie with Calamity Jane as they set up a trap to kill the braves. Calamity Jane tells Jamie they aren't at their best but he counters that they are being brave because they are not surrendering. John David and Katie arrive and offer to help in any way they can. The braves arrive and Jamie takes the last calf with him as he separates from the group and goes back to find Katie. When the warriors chase him down, Jamie sacrifices himself by shooting his gun but the gun is empty. The braves stop to laugh at how silly he is but when they turn around they are surrounded by cowboys that were hidden from view.

Katie and John David are married and Jamie leaves in search of adventure and a new life. Jamie heads west with Calamity Jane.

headchem commented 2 years ago

NEW APPROACH!

Instead of fine-tuning a dedicated model per task like you would in traditional ML, you can train a single model with multiple prompt separators. OpenAI recommends a prompt separator of "->" but apparently you can use any natural language you want. Currently, I have 17 different models, which are a hassle to update and maintain. Instead, create a fine-tuning training file like:

"prompt language for SETUP:":"ideal completion"
"prompt language for MIDPOINT:":"ideal completion"
"prompt language for OPENING IMAGE:":"ideal completion"
"prompt language for ALL HOPE IS LOST:":"ideal completion"

This would allow a single model to "see" many more domain-specific examples, in my case "story-like" language, even though that language might belong to many types of tasks. Instead of a dedicated model for "SETUP", I could mix those rows in alongside "ALL HOPE IS LOST", and the model can use the prompt separator (just the last part of the prompt) as a flag to switch between task types.

With my 35 training stories, that changes from each model seeing 35 data points to a single model seeing ~500 data points of story-like language. Supposedly, it'll be smart enough to pick up on the task switching.
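As a rough sketch of how such a combined file could be assembled, the Python below flattens every story's sequences into one list of rows in the prompt/completion JSONL shape that the legacy OpenAI fine-tuning endpoint accepted. The story structure (`context` and `text` keys), the function names, and the exact separator wording are assumptions for illustration, not the project's real data model.

```python
import json
from typing import Dict, List

# Hypothetical shape: one dict per story, keyed by sequence name.
Story = Dict[str, Dict[str, str]]


def build_rows(stories: List[Story]) -> List[Dict[str, str]]:
    """Flatten all sequences of all stories into prompt/completion rows."""
    rows = []
    for story in stories:
        for sequence_name, parts in story.items():
            rows.append({
                # The sequence name at the end of the prompt is the
                # task-switching separator described above.
                "prompt": f"{parts['context']}\n{sequence_name}:",
                # Leading space on the completion: a common convention for
                # GPT-3 fine-tuning data, assumed here.
                "completion": " " + parts["text"],
            })
    return rows


def to_jsonl(rows: List[Dict[str, str]]) -> str:
    """Serialize rows as JSON Lines, one object per line."""
    return "".join(json.dumps(row) + "\n" for row in rows)


stories = [{
    "OPENING IMAGE": {"context": "Genres: Comedy, Mystery", "text": "Wild horses run through the desert."},
    "SETUP": {"context": "Genres: Comedy, Mystery", "text": "Jamie is a talented caricaturist."},
}]
print(to_jsonl(build_rows(stories)), end="")
```

Writing the returned string to a file such as ALL.jsonl gives a single training file covering every sequence type.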

Be sure to create a new git branch before making a big change like this...

headchem commented 2 years ago

This issue is complete with #46. I trained with the following hyperparameters, and it appears to have worked well!

openai api fine_tunes.create -t "ALL.jsonl" -m davinci --n_epochs 2 --learning_rate_multiplier 0.07