Train reverse admin helper models

headchem commented 2 years ago

PROBLEM: Summarizing each scene is time-consuming; it takes multiple days to get through a single screenplay.

SOLUTION: Using the summarized screenplays of Big Fish and CODA (and maybe Fight Club?), finetune a model where the prompt is an anonymized full screenplay scene and the ideal completion is the anonymized scene summary.

Create a UI where I can paste in the full Fountain screenplay with Sequence and Scene dividers, and it will automatically split the text into the corresponding sections in the UI. Then, with the fully trained helper model described below, I can click "brainstorm" to generate some scene summaries to copy/edit. Might as well add a Brainstorm All button that fills in the first brainstorm for each Scene, so I can scan through and make edits more quickly.
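A rough sketch of the splitting step, in Python for illustration only (the project itself is C#). The issue doesn't say what the Sequence and Scene dividers look like, so this assumes Fountain section headings: `# Title` for a Sequence and `## Title` for a Scene. `Sequence`, `Scene`, and `split_screenplay` are hypothetical names, not existing project code.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Scene:
    title: str
    lines: list = field(default_factory=list)

@dataclass
class Sequence:
    title: str
    scenes: list = field(default_factory=list)

# Assumed divider convention (not confirmed by the issue):
SEQUENCE_RE = re.compile(r"^#\s+(.*)$")   # "# Title"  starts a Sequence
SCENE_RE = re.compile(r"^##\s+(.*)$")     # "## Title" starts a Scene

def split_screenplay(text):
    """Split pasted Fountain text into Sequences, each holding its Scenes."""
    sequences = []
    current_scene = None
    for line in text.splitlines():
        seq_match = SEQUENCE_RE.match(line)
        scene_match = SCENE_RE.match(line)
        if seq_match:
            sequences.append(Sequence(seq_match.group(1).strip()))
            current_scene = None
        elif scene_match:
            if not sequences:
                # A Scene before any Sequence divider gets a placeholder parent
                sequences.append(Sequence("Untitled"))
            current_scene = Scene(scene_match.group(1).strip())
            sequences[-1].scenes.append(current_scene)
        elif current_scene is not None:
            current_scene.lines.append(line)
        # Lines that appear before the first Scene divider are ignored
    return sequences
```

Each `Scene.lines` list can then be joined back into the text shown in that Scene's section of the UI.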

DONE - given a training data full screenplay scene, use C# spaCy to extract the character names, and replace them with CHARACTER1, CHARACTER2...
DONE - given the corresponding training data scene summary, replace character names with the extracted CHARACTER1, CHARACTER2, etc from above
Train a finetuned model on the anonymized scene -> anonymized summary pairs
For inference, first run the same character extraction/anonymization step on the full scene
Send the anonymized full scene to the model for inference
DONE - get back anonymized scene summary, then replace CHARACTER1, CHARACTER2 with the original character names extracted before
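The steps above can be sketched roughly like this. It's Python for illustration, not the actual C# implementation, and it takes the character names as an input list instead of calling spaCy. `build_mapping`, `anonymize`, and `deanonymize` are hypothetical helper names, and matching both the name as given and its upper-case form (for dialogue cues) is an assumption about how names show up in a scene.

```python
import re

def build_mapping(names):
    """Map each distinct name to CHARACTER1, CHARACTER2, ... in first-seen order."""
    mapping = {}
    for name in names:
        if name not in mapping:
            mapping[name] = f"CHARACTER{len(mapping) + 1}"
    return mapping

def anonymize(text, mapping):
    """Replace every known character name with its placeholder."""
    # Longest names first, so a full name is replaced before a shorter
    # name contained inside it.
    for name in sorted(mapping, key=len, reverse=True):
        # Match the name as given and its upper-case form (dialogue cues).
        pattern = re.compile(
            r"\b(?:" + re.escape(name) + "|" + re.escape(name.upper()) + r")\b"
        )
        text = pattern.sub(mapping[name], text)
    return text

def deanonymize(text, mapping):
    """Put the original names back into a model response."""
    reverse = {placeholder: name for name, placeholder in mapping.items()}
    # A single regex pass keeps CHARACTER10 from being mistaken for CHARACTER1.
    return re.sub(
        r"CHARACTER\d+",
        lambda m: reverse.get(m.group(0), m.group(0)),
        text,
    )
```

The same mapping is used for the scene, its training summary, and the inference response, which is what keeps the placeholders consistent across all three.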

I can start scaffolding the UI before the finetuned model exists by hardcoding a response of "TODO response here for CHARACTER0, CHARACTER1, CHARACTER2", with one placeholder per detected character. Then I can confirm that the full round trip keeps track of all the names.
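A minimal sketch of that stub and the round-trip check, in Python for illustration only. `stub_model`, `restore_names`, and `round_trip_ok` are hypothetical names. Placeholders are numbered from 0 here to match the hardcoded response quoted above; shift the start index if the real pipeline numbers from 1.

```python
import re

def stub_model(placeholders):
    """Stand-in for the finetuned model: mentions every placeholder once."""
    return "TODO response here for " + ", ".join(placeholders)

def restore_names(text, placeholder_to_name):
    """Swap CHARACTERn placeholders back to the original names."""
    return re.sub(
        r"CHARACTER\d+",
        lambda m: placeholder_to_name.get(m.group(0), m.group(0)),
        text,
    )

def round_trip_ok(names):
    """True if every detected name survives anonymize -> stub -> restore."""
    placeholder_to_name = {f"CHARACTER{i}": n for i, n in enumerate(names)}
    response = stub_model(list(placeholder_to_name))
    restored = restore_names(response, placeholder_to_name)
    no_leftovers = "CHARACTER" not in restored
    return no_leftovers and all(name in restored for name in names)
```

Once this passes in the UI, swapping `stub_model` for the real inference call shouldn't need any change to the name handling.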

headchem commented 2 years ago

I now have 2 full stories with scenes. Next, code the finetune data functions for:

screenplay scene -> scene summary (have ~200 scenes so far)
concatenated scene summaries -> expanded summary (only have ~30 examples, use above trained model and add at least 2 more full screenplays before training this model)
expanded summary -> blurb (have ~500 examples, but should wait before training because I may drastically update these once I have full scenes to incorporate)
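All three data functions could share one record builder, sketched here in Python for illustration. This assumes a JSONL file with one prompt/completion pair per line. The separator and stop strings, the leading space on the completion, and the function names are placeholders, not something the issue specifies.

```python
import json

# Assumed markers; use whatever the finetuning setup actually expects.
PROMPT_SUFFIX = "\n\n###\n\n"   # tells the model where the prompt ends
COMPLETION_SUFFIX = " END"      # stop sequence for generated text

def make_record(source_text, target_text):
    """Build one training record from a (source, target) text pair."""
    return {
        "prompt": source_text.strip() + PROMPT_SUFFIX,
        "completion": " " + target_text.strip() + COMPLETION_SUFFIX,
    }

def to_jsonl(pairs):
    """Serialize (source, target) pairs as JSONL, one record per line."""
    return "\n".join(json.dumps(make_record(s, t)) for s, t in pairs)

def concat_summaries(scene_summaries):
    """Join scene summaries into the prompt for the expanded-summary model."""
    return "\n".join(s.strip() for s in scene_summaries)
```

The scene -> summary model would pass (anonymized scene, anonymized summary) pairs, the expanded-summary model would pass (`concat_summaries(...)`, expanded summary) pairs, and the blurb model would pass (expanded summary, blurb) pairs.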

headchem commented 2 years ago

This has been completed and deployed. The training data only includes 2 full screenplays (CODA, Big Fish) so far, but the models are already working well enough to use. After I've added more full screenplays using this new tool, I'll retrain the models on the expanded data to further improve quality.

headchem / StoryGhostPlotter

Train reverse admin helper models #101