Screenplay import process:
Create a new plot
Fill out log line
Run the VS Code Fountain analyzer to identify the most important characters, then enter them
Add scene groupings with optional summary (may be empty like "= ")
Add Sequence headings (example: "# COOLDOWN")
Copy/paste the screenplay, with scene groupings and Sequence headings, then in the Scenes tab, click Import Scenes to populate each text area
Write/generate scene summaries, if not already imported
Write/generate sequence expanded text
Write/generate sequence blurbs
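For reference, a minimal Fountain fragment showing a Sequence heading ("#") and a scene grouping synopsis ("="); the scene content here is illustrative, not from a real script:
# COOLDOWN
= The crew regroups and tends their wounds.
INT. SAFEHOUSE - NIGHT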
Git process:
git checkout -b feat-my-cool-feature
git commit -m "your message"
git push origin feat-my-cool-feature --- this pushes the branch to the remote, but doesn't set the upstream as the default, because it's a short-lived branch
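Optional, for longer-lived branches: set the upstream so later git push/git pull default to this remote branch:
git push -u origin feat-my-cool-feature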
Either:
in the browser, navigate to GitHub -> Pull Requests -> New Pull Request
change the second dropdown so that the arrow points from the new branch on the right into the "main" branch on the left
OR
On the repo main page, click "Compare & pull request" which is a shortcut to the above manual approach
Click "Create pull request" and fill out the details, then click "Create pull request"
This immediately kicks off a GitHub Action to deploy a new Azure environment for this branch. Optionally, test on the live site: in the Azure portal, navigate to the Static Web App -> Environments. You should see a new item under "Preview Deployments" with your branch name. NOTE: to log in, you will need to add the newly generated URL to the authentication providers as a redirect URL.
Click "Merge pull request" fill out any comments, then "Confirm merge"
This immediately kicks off a GitHub action that DELETES the "Preview Deployment" from Azure
Click the button "Delete branch" on the success message
This immediately kicks off a GitHub Action to deploy this to PROD Azure. TODO: confirm this - why would deleting a branch kick this off? More likely the PROD deploy is triggered by the merge commit landing on main, which happens at nearly the same time.
In the Terminal, do some clean up:
git switch main
git pull --- brings down changes, including the newly merged branch
git branch -d feat-my-cool-feature --- this deletes the local branch, bringing the local branch list back in sync with GitHub's
TODO: Tags - watch IAmTimCorey video "Intro to GitHub" minute 57
Known working with VS Code extension "Azure Static Web Apps" v0.9.1; upgrading to v0.10.0 broke the local React dev server for debugging
Debug -> Attach to .NET Functions -> click play button
npm run build
cd .\build\
swa start
navigate to http://localhost:4280
-change code-
npm run build
swa start
-refresh-
You can only install one version of the Azure Functions runtime per computer, so you may have to switch back to v4 here:
https://github.com/Azure/azure-functions-core-tools/releases (download the x64 .msi)
Unclear if the below is needed... To fix a Functions version mismatch when v3 is installed, run the following:
npm i -g azure-functions-core-tools@4 --unsafe-perm true
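To confirm which Core Tools version is active afterwards:
func --version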
LOCALHOST
In VSCode menu "Run and Debug"
Debug "Attach to .NET Functions"
Once it starts, select "SWA: Run..." and Debug
To build for prod and see if there are any linting errors before deploying, run the following:
npm run build
Workflow for new work:
git checkout -b my-new-feature
... do code changes ...
git add .
git commit -m "your message"
git push origin my-new-feature
If a staging env already exists, this immediately kicks off the deployment pipeline for this branch (I think the steps below are required only when pushing a new branch for the first time)
... in browser, go to GitHub, click the "Compare & pull request" button, then, assuming no conflicts, click the "Create Pull Request" button. This kicks off a GitHub action that deploys a new site, visible in the Azure portal under "Environments"
... if all looks good, back in GitHub, click the "Merge pull request" button to merge the pull request into the main branch to deploy the changes to PROD. You can delete the feature branch once it has been merged.
... after the merge and deployment to prod completes, the GitHub action automatically deletes the static web app of the feature branch
git checkout main
git pull
Use Microsoft for login, with these as the "User's roles":
anonymous
authenticated
customer
admin
"id": "43de282f30cf52c2ba73f71a4f28712a",
"displayName": "testuser1",
LOCALHOST
"id": "ef1494647e3f4fe69890dfb8b41431a1",
"displayName": "jdparsons.dev@gmail.com",
AAD
"id": "f98f654a-f5fb-4a33-84d3-2498b8d4d348",
"displayName": "jdparsons.dev@gmail.com",
GOOGLE
"id": "6ba6b68eb0294af39340f256ac0bea3d",
"displayName": "headchem@gmail.com",
ROLES:
authenticated
anonymous
customer
admin
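For reference, roles are enforced per route in staticwebapp.config.json; a minimal sketch (these route paths are placeholders, not the app's actual routes):
{
  "routes": [
    { "route": "/api/admin/*", "allowedRoles": ["admin"] },
    { "route": "/api/*", "allowedRoles": ["authenticated", "customer", "admin"] }
  ]
}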
FINETUNING
* from Google Sheets, export file as csv
* Go to Admin page, click the button
* For each textarea, copy and paste into Notepad++ and save as .jsonl
* Open GIT BASH
* export OPENAI_API_KEY="get_key_from_OpenAI_portal"
* Run this tool as a sanity check on data formatting
** openai tools fine_tunes.prepare_data -f "logline.jsonl"
* To kick off a finetune job:
** SUMMARY: openai api fine_tunes.create -t "sg_finetune\orphanSummary.jsonl" -m davinci --n_epochs 3 --learning_rate_multiplier 0.03
** FULL: openai api fine_tunes.create -t "sg_finetune\orphanFull.jsonl" -m davinci --n_epochs 3 --learning_rate_multiplier 0.035
** LOGLINE: openai api fine_tunes.create -t "logline.jsonl" -m babbage --n_epochs 2 --learning_rate_multiplier 0.02
----- LEFT OFF: try another finetune with the upper limit of learning rate, same epochs
NEW RUN:
openai api fine_tunes.create -t "logline.jsonl" -m babbage --n_epochs 2 --batch_size 64 --learning_rate_multiplier 0.1
openai api fine_tunes.create -t "logline.jsonl" -m babbage --n_epochs 1 --batch_size 64 --learning_rate_multiplier 0.2
openai api fine_tunes.create -t "logline.jsonl" -m curie --n_epochs 1 --batch_size 64 --learning_rate_multiplier 0.02
ABOVE did not stay on topic... maybe it didn't learn enough
openai api fine_tunes.create -t "logline.jsonl" -m curie --n_epochs 2 --batch_size 64 --learning_rate_multiplier 0.08
** CHARACTERS: openai api fine_tunes.create -t "sg_finetune\characters.jsonl" -m davinci --n_epochs 3 --learning_rate_multiplier 0.035
** SEQUENCES:
openai api fine_tunes.create -t "OpeningImage.jsonl" -m davinci --n_epochs 3 --learning_rate_multiplier 0.04
*** "Using Lower learning rate and only 1-2 epochs tends to work better for these use cases"
*** "Aim for at least ~500 examples"
*** default n_epochs=4, default learning_rate_multiplier=0.05
*** experiment with values in the range 0.02 to 0.2 to see what produces the best results
*** you'll get a response like: Created fine-tune: ft-aySH26zbI46aMKvL5OxWQJ4h
*** if disconnected, run: openai api fine_tunes.follow -i ft-aySH26zbI46aMKvL5OxWQJ4h
* in the OpenAI portal, you'll see under "Fine-tune training" a model name like "davinci:ft-personal-2022-01-07-04-27-42". Plug this value into the dictionary in Generate.cs.
* When calling via Postman about 10 min after the finetune job reported success, I initially got an HTTP 429 response (Too Many Requests)
** After 15 min, the request succeeded. So the C# needs to check for HTTP 429 and surface a message telling the user to wait a bit and retry (see the sketch after this list)
* “A Curie fine-tuned on 100 examples may have similar results to a Babbage fine-tuned on 2,000 examples. The larger models can do remarkable things with very little data.” - https://bdtechtalks.com/2021/11/29/gpt-3-application-development-tips/
* The documentation also states for conditional generation: "aim for at least ~500 examples" and "Using Lower learning rate and only 1-2 epochs tends to work better for these use cases"
* My prompts all start with "Here is a summary of an award winning story: " When fine tuning, is that style of prompt still useful to start every row of example data? ANSWER: with a small number of examples, the repeated prompt language is useful, but as I get closer to 100 examples, it may no longer be necessary
openai api fine_tunes.create -t "blurbs.jsonl" -m davinci --n_epochs 2 --learning_rate_multiplier 0.04
* DELETE A MODEL: openai api models.delete -i davinci:ft-personal-2022-04-08-23-41-48
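A minimal sketch of the HTTP 429 retry handling noted above (the attempt count, delays, and request-factory helper are assumptions, not existing app code):
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Retries the OpenAI call when it returns 429 (Too Many Requests); a final 429
// should surface to the user as a "wait a bit and retry" message.
static async Task<HttpResponseMessage> SendWithRetryAsync(
    HttpClient client, Func<HttpRequestMessage> buildRequest, int maxAttempts = 3)
{
    for (int attempt = 1; ; attempt++)
    {
        var response = await client.SendAsync(buildRequest()); // fresh request each attempt
        if (response.StatusCode != HttpStatusCode.TooManyRequests || attempt == maxAttempts)
            return response;
        await Task.Delay(TimeSpan.FromSeconds(30 * attempt)); // crude linear backoff
    }
}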
IDEAS:
* Hype/marketing - every time someone completes a full story, update a global NoSQL container counter. This metric is more valuable than anything from Google Analytics for measuring success
* After GPT-3 fills a page with an idea, now encourage the author to make tweaks like a co-author brainstorm. Human needs to be part of the process like an editor forging the ideas into something even better. AI can still assist with this process, ex: "Prompt: Given the previous sequence of events, we see the following symbolism is present. " Encourage the author to layer in more theme/symbolism/nuance.
* We can use DaVinci (highest quality) to generate yet more unique training samples for later finetuning, with a human curating the examples to pick out "good" stories.
* Larger models require less data for fine-tuning. https://thenextweb.com/news/building-apps-gpt-3-what-devs-need-know-cost-performance-syndication
** “For many tasks, you can think of increasing base model size as a way to reduce how much data you’ll need to fine-tune a quality model,” Shumer said. “A Curie fine-tuned on 100 examples may have similar results to a Babbage fine-tuned on 2,000 examples. The larger models can do remarkable things with very little data.”
** "Some tasks (i.e., multi-step generation) are too complex for a vanilla model, even Davinci, to complete with high accuracy," Shumer said. "In cases like this, you have two options: 1) create a prompt chain that feeds outputs from one prompt into another prompt, or 2) fine-tune a model. I typically first try to create a prompt chain, and if that doesn't work, I then move to fine-tuning."
* Big picture direction of generative media: https://arr.am/2020/09/15/the-generative-age/
* at each completion, we can use the GPT-3 Intents model for additional stylistic editing, like "Prompt: . Write a more exciting and dramatic version of these events." OR "make this sequence of events more romantic/scifi/magical/humorous"
TODO:
* (ongoing) continue adding to finetuning dataset
* add a text area input length limit to prevent maliciously long inputs from using up all my prompt tokens (see the sketch after this list)
* Review keywords for all Log Line objects, maybe cut back on some examples to save on GPT-3 tokens
* spend time in playground refining prompts. Look for list of prompts online for inspiration.
* apply prompt lessons to "full" prompt, then finetune orphanFull to look for problems in practice
* Set up KeyVault integration for OpenAI key and db connection strings, log GPT responses to a db. This db can be used to manually review and use to further finetune the model to improve future output. Bool cols for IsGenerated and IsGoodForFinetuning (default false for generated text, once I manually review I can choose to flip it)
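A minimal sketch of the input-length guard from the TODO above, run inside the Azure Function before calling OpenAI (the 2,000-character cap and variable name are assumptions):
// Reject oversized inputs before they consume prompt tokens (cap value is an assumption)
const int MaxPromptChars = 2000;
if (string.IsNullOrWhiteSpace(userText) || userText.Length > MaxPromptChars)
    return new BadRequestObjectResult($"Input must be 1-{MaxPromptChars} characters.");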
PROMPT DESIGN
* GPT-3 prompt examples: https://beta.openai.com/examples
** Summarize for a 2nd grader
** Micro horror story creator
** Essay outline
* Ensure that the prompt + completion doesn't exceed 2048 tokens, including the separator
* OpenAI advises not to use the format "Param1=Value1". Instead, convert it to short natural sentences like "Param1 is a Value1. Param2 is a Value2."
* It is important that every log line input param have some presence in the completion log line summary text to demonstrate a connection
* https://bmk.sh/2019/10/27/The-Difficulties-of-Text-Generation-with-Autoregressive-Language-Models/
** "One major problem with maximum-likelihood training of autoregressive models is exposure bias (Ranzato et al., 2015). Autoregressive models are only trained and evaluated on samples drawn from the target language distribution, but at evaluation time are fed samples that are themselves generated by the model. This error compounds extremely quickly and it has been observed, though admittedly anecdotally, that GPT-2 exhibits a sharp drop-off in quality after a certain number of steps."
** IDEA: does this mean I shouldn't directly feed completions back into the next prompt? Do some light manipulation first? Ideally, the author will introduce their own entropy into the system to modify each output before requesting the next completion
* https://www.gwern.net/GPT-3#quality
** For fiction, I treat it as a curation problem: how many samples do I have to read to get one worth showing off? [...] A Markov chain text generator trained on a small corpus represents a huge leap over randomness: instead of having to generate countless quadrillions of samples, one might only have to generate millions of samples to get a few coherent pages; this can be improved to hundreds or tens of thousands by increasing the depth of the n of its n-grams. […] But for GPT-3, once the prompt is dialed in, the ratio appears to have dropped to closer to 1:5—maybe even as low as 1:3!
* The OpenAI example for micro-horror: https://beta.openai.com/examples/default-micro-horror has hyper params Temperature=0.5 and Frequency Penalty=0.5
PROMPT IDEAS
* from the forums: "keep it simple, less words is better, and give it a very good thorough example - just one really good one should do for what you want it to do"
* which leads me back to experimenting with the prompts in Playground before I attempt any finetuning
-------- (FROM OPENAI EXAMPLES)
Topic: Breakfast
Two-Sentence Horror Story: He always stops crying when I pour the milk on his cereal. I just have to remember not to let him see his face on the carton.
###
Topic: Wind
Two-Sentence Horror Story:
--------
* The above is a small version of what I'm after. Perhaps I should consider the style "Genre: scifi" instead of "The genre is Scifi"?
--------
Here’s a short story by Terry Pratchett.
Barry
By Terry Pratchett
Death looked at the man and said ‘HELLO.’
----------
Here is an award winning short story:
They Come From The Earth
By John Vickersonik
----------------
Here is an award winning short story:
----------------
Here is a short story:
* quality of output degrades when you remove "award winning"
---------------
The following is an author's summary of a story involving [log line description]. The author's summary is concise and only covers the very beginning of the story.
---------------
[summary here]
The following is how a skilled author would expand the above summary into more detailed story beats:
--------------
Here's a three-sentence summary of the plot so far:
--------------
Write a novel with the following description
Genre: Epic science fiction space opera
Style: Mythic, like Frank Herbert's Dune or Tolkien's Silmarillion
Premise: An object, the Obelisk, has been found in deep space on a route between the Milky Way and Andromeda galaxies. The object is a giant diamond in shape, but of unknown material and origin. This story follows several perspectives as they wrangle with the truth of the Obelisk. Religious orders claim it, as do scientific and governmental agencies.
The story so far: Beginning
--------------
* I like the hint words in the prompt above of "Premise" and "The story so far: Beginning"
--------------
Continue writing a novel based on the summary and last chunk.
Example 1:
Summary: <>
Last chunk: <>
Continuation: <>
Example 2:
Summary: <>
Last chunk: <>
Continuation: <>
--------------
The following is a summary of a novel so far. Read the summary and continue the story.
Summary:
<>
Last few lines:
<>
Write a long continuation of the above story:
--------------
Write a concise summary of the following excerpt:
<>
Concise summary:
--------------
* I like the hint word of "excerpt" and "concise"
--------------
[full summary here]
Rephrase this to be more dramatic and emotionally gripping.
--------------
* append emojis after every sentence to communicate emotion and other actions/nouns in that sentence. The emojis act as miniature summaries of each sentence to reinforce the underlying meaning of the words.
* use Plutchik's wheel of emotions along with emojis to label sentences. Are there distinct emojis for each level? Maybe pair with a qualifier word like: (mild 😒, intense 😠)
-------------
* use parentheses to evoke internal monologue about the intent behind the output: (this is symbolic of jealousy)
* "Prompt: given everything that has happened to the main character, this is their internal monologue:"
------------
[summary]
The following is a sequence of movie scenes (story beats) of an award winning plotline that expands upon the summary above. Each story beat is connected to the other scenes either overtly or through symbolism.
[full]
------------
* when generating the full prompt, search for all-caps sequence tags, like THEME STATED:, and inject the Sequence-specific advice to convert it into something like "THEME STATED: (demonstrates the main question or lesson the main character will face)" - see the sketch below
* edge case is when encountering "(CONTINUED)". Need special language to indicate that it builds off the previous original instance of that sequence type.
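A sketch of that tag-injection step (the advice dictionary contents are placeholders; "(CONTINUED)" needs its own handling and falls through unchanged here):
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Finds all-caps sequence tags at the start of a line, e.g. "THEME STATED:",
// and appends the sequence-specific advice in parentheses.
static string InjectSequenceAdvice(string full, Dictionary<string, string> advice)
{
    var tag = new Regex(@"^([A-Z][A-Z ]+?):", RegexOptions.Multiline);
    return tag.Replace(full, m =>
        advice.TryGetValue(m.Groups[1].Value, out var tip)
            ? $"{m.Groups[1].Value}: ({tip})"
            : m.Value); // unknown tags and "(CONTINUED)" variants pass through unchanged
}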
------------
* just for the orphanSummary, start the prompt with "Once upon a time"
------------
* to explore GPT-3's capabilities, what if I start at the very lowest level, and ask it things like "list a sequence of events that logically depend on each other"
* if the completion makes sense like a dependency graph, then guide it with more emotion/story language
------------
Write a short summary of a story for kids/teens/adults about keyword1, keyword2, and keyword3 (NOTE: we need to inject the "and" at the end of the keyword list)
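A sketch of that keyword join (a hypothetical helper, not existing app code):
using System.Collections.Generic;
using System.Linq;

// "keyword1, keyword2, and keyword3" - injects the "and" before the last keyword
static string JoinKeywords(IReadOnlyList<string> kws) => kws.Count switch
{
    0 => "",
    1 => kws[0],
    2 => $"{kws[0]} and {kws[1]}",
    _ => string.Join(", ", kws.Take(kws.Count - 1)) + ", and " + kws[kws.Count - 1]
};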
-----------
* https://beta.openai.com/docs/api-reference/completions/create#completions/create-logit_bias
* use logit_bias to increase the chances of user-entered keywords and logline words appearing. Could also add "hero name" to the UI, and crank up the likelihood of that name appearing along with a prompt of "the main character's name is: John" - see the sketch below
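A sketch of the request body (the token IDs here are placeholders - logit_bias keys are tokenizer token IDs, not words, and bias values range from -100 to 100):
{
  "model": "davinci",
  "prompt": "The main character's name is: John. ...",
  "max_tokens": 150,
  "logit_bias": { "2215": 5, "7554": 5 }
}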
-----------
After reading the following sequence of events, write a summary of what happens next:
[full]
Now write a concise summary of what happens next.
<>
-----------
Express Rate Limit - OpenAI suggests a max of 6 requests per minute (per user?)
LIMITATIONS AND RESTRICTIONS
High-level guidelines... not requirements?
We generally do not permit tools that generate a paragraph or more of natural language or many lines of code, unless the output is of a very specific structure that couldn't be re-purposed for general blog or article generation (e.g., a cover letter, a recipe, song lyrics).
For more scoped use-cases, we tend to recommend an output of around 150 tokens (~6-8 sentences), but it will depend on the specific use-case.
For generative use-cases where the user has considerable control in directing the output, you should generally use the OpenAI Content Filter to prevent 'Unsafe' (CF=2) content.
Rate-limiting end-users’ access to your application is always recommended to prevent automated usage, and to control your costs; there will be more specific guidelines by use-case.
FINETUNING
* YouTube video on fine-tuning by "Bugout dev" suggests 100-200 example rows are sufficient for initial fine-tuning for generative use-cases
* "Right now, you can fine-tune up to 10 models per month and each dataset can be up to 2.5M tokens or 80-100MB in size"
Out of the Bottle
Aladin
LiarLiar
Fantasia
Soul
Monster in the House
Whiplash
TheCraft
JurassicWorld
Golden Fleece
StarWarsEp4
TheWizardOfOz
TheMitchellsVsTheMachines
Finding Nemo
Up!
Superhero
IronMan
KungFuPanda
Ratatouille
Rites of Passage
HowToTrainYourDragon
Float
ACharlieBrownThanksgiving
Fool Triumphant
Elf
Moneyball
TheKingsSpeech
Buddy Love
MyOctopusTeacher
BeautyAndTheBeast
ET
Whydunnit
CaptainMarvel
TheBigLebowski
TheConversation
Zootopia
Institutionalized
Sicario
DrStrangelove
FreeSolo
Encanto
Unexpected Problem
TheLegoMovie
Taken
DontLookUp
HERO
Caregiver
ET
Finding Nemo
Creator
IronMan
HowToTrainYourDragon
TheMitchellsVsTheMachines
Soul
Explorer
Whiplash
MyOctopusTeacher
BeautyAndTheBeast
Ratatouille
FreeSolo
Innocent
Elf
Sicario
TheLegoMovie
TheCraft
TheWizardOfOz
KungFuPanda
TheBigLebowski
DrStrangelove
ACharlieBrownThanksgiving
Zootopia
Jester
LiarLiar
Lover
Encanto
Magician
Fantasia
Orphan
Aladin
Outlaw
CaptainMarvel
Taken
Moneyball
Ruler
JurassicWorld
TheKingsSpeech
Sage
DontLookUp
TheConversation
Up!
Warrior
StarWarsEp4
ENEMY
Caregiver
Float
LiarLiar
FreeSolo
Encanto
Creator
Explorer
Finding Nemo
Up!
Innocent
ET
Soul
Jester
Taken
DrStrangelove
TheKingsSpeech
Lover
TheBigLebowski
Magician
Aladin
TheCraft
TheWizardOfOz
Orphan
MyOctopusTeacher
TheConversation
Zootopia
Outlaw
StarWarsEp4
Sicario
JurassicWorld
Ruler
Whiplash
Elf
TheLegoMovie
Moneyball
BeautyAndTheBeast
DontLookUp
TheMitchellsVsTheMachines
Ratatouille
ACharlieBrownThanksgiving
Sage
IronMan
Fantasia
Warrior
HowToTrainYourDragon
CaptainMarvel
KungFuPanda
Exact Revenge
CaptainMarvel
Sicario
TheCraft
Find Connection
Aladin
Elf
Float
LiarLiar
BeautyAndTheBeast
TheMitchellsVsTheMachines
ACharlieBrownThanksgiving
ET
Up!
Soul
Protect Family
StarWarsEp4
HowToTrainYourDragon
MyOctopusTeacher
Taken
DontLookUp
Finding Nemo
Encanto
Protect Possession
IronMan
KungFuPanda
DrStrangelove
TheConversation
Survive
Whiplash
TheLegoMovie
TheWizardOfOz
Moneyball
TheBigLebowski
Fantasia
JurassicWorld
Ratatouille
TheKingsSpeech
FreeSolo
32854 words total after 30 stories
Related to object properties
Interest, curiosity, enthusiasm - Indifference, habituation, boredom
Attraction, desire, admiration - Aversion, disgust, revulsion
Surprise, amusement - Alarm, panic
Future appraisal
Hope, excitement - Fear, anxiety, dread
Event-related
Gratitude, thankfulness - Anger, rage
Joy, elation, triumph, jubilation - Sorrow, grief
Patience - Frustration, restlessness
Contentment - Discontentment, disappointment
Self-appraisal
Humility, modesty - Pride, arrogance
Social
Charity - Avarice, greed, miserliness, envy, jealousy
Sympathy - Cruelty
Negative and forceful
Anger
Annoyance
Contempt
Disgust
Irritation
Negative and not in control
Anxiety
Embarrassment
Fear
Helplessness
Powerlessness
Worry
Negative thoughts
Pride
Doubt
Envy
Frustration
Guilt
Shame
Negative and passive
Boredom
Despair
Disappointment
Hurt
Sadness
Agitation
Stress
Shock
Tension
Positive and lively
Amusement
Delight
Elation
Excitement
Happiness
Joy
Pleasure
Caring
Affection
Empathy
Friendliness
Love
Positive thoughts
Courage
Hope
Humility
Satisfaction
Trust
Quiet positive
Calmness
Contentment
Relaxation
Relief
Serenity
Reactive
Interest
Politeness
Surprise
Top level, clockwise-ish:
LEFT OFF: go through list, don't worry about synonyms for now, we can use the vectors to identify them later in a data-driven way. Once all emotions are added, return to labeling scene emotions. ONLY mark as DONE when
1. Joy
B4. adoration - DONE
B7. entrancement
E2. Satisfaction
E3. Courage
E6. Pleasure
E8. Amusement
E9. Delight
E10. Excitement
E11. Elation
E35. Pride
C1. curiosity
C2. enthusiasm
C13. triumph
C14. jubilation
C21. arrogance
P3. Cheerfulness
P35. Morbidness
P36. Derisiveness
P39. Victorious
P42. Bittersweetness
Caring
Easiness
Comfort
Confident
Intrigue
Insightful
Enlightenment
Epiphany
Thrilled
Pleased (Chuffed, with oneself, not the same as pleasure)
Satisfied
2. Love
L5. compassion
B8. craving
B9. sexual desire
B10. romance
B11. nostalgia
B12. empathic pain (Sympathy)
B13. satisfaction
E14. Affection
E15. Empathy
E16. Friendliness
E17. Calmness
E19. Contentment
E20. Relaxation
E21. Relief
E48. Politeness
C4. habituation
C5. Attraction
C17. Patience
C22. Charity
P6. Tolerance
Grasping
longing
Docile
Deference
Pity
3. Fear
B18. horror
E23. Helplessness
E24. Worry
E25. Anxiety
E29. Tension
E30. Stress
E36. Doubt
C8. Alarm
C9. Panic
C10. dread
P16. Expectancy
P30. Modesty
P45. Frozenness (fear+anger)
Hysteria
insecurity
4. Anger
E33. Irritation
E37. Frustration
E38. Envy (less negative than jealousy)
C24. greed
C25. miserliness
C26. Hate
P11. Hostility
P13. Fury
P24. Outrage
P38. Vengeance
P41. Dominance
Possessive
Demanding
5. Disgust
C3. Indifference
C6. Aversion
C7. revulsion (same as contempt?)
C18. restlessness
C19. Discontentment
C23. Avarice
C26. Cruelty
P7. Dislike
P31. Scorn
P32. Cynicism
P43. Ambivalence
Criticism
Distaste
Condescension
Discomfort
Ennui
6. Sadness
L13. depression
B23. awkwardness
E26. Embarrassment
E27. Powerlessness
L14. shame (Shame is a feeling that your whole self is wrong, and it may not be related to a specific behavior or event.)
L14a. guilt (Guilt is a feeling you get when you did something wrong, or perceived you did something wrong.)
E42. Hurt
E44. Despair
E45. Disappointment
P2. Gloominess
P4. Dejection
P10. Dismay
P22. Fatalism
P26. Misery
P27. Sullenness (like sadness + anger)
P28. Pessimism (sad + anticipation)
P33. Sentimentality
P34. Resignation
P40. Prudishness
Regret
Dispirited
Dissatisfied
Embarrassed
Self-conscious
7. Surprise
L16. gratitude
B24. confusion (surprise+anticipation)
B27. aesthetic appreciation
E1. Humility
E28. Shock
C11. thankfulness
C15. Sorrow
C20. modesty
P14. Attentiveness
P18. Astonishment
P23. Unbelief
Impressed
Inspiration
Puzzlement
IDEA: The nearest points via cosine similarity give the ordered ranking. Will need a way to ensure I don't have near-duplicate vectors for different words; near-duplicates are also a way to identify synonyms. See the sketch below.
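A sketch of the cosine ranking and the near-duplicate/synonym check (the ~0.95 threshold is a guess to tune):
using System;

// Cosine similarity between two embedding vectors; values near 1.0 mean near-duplicates
static double Cosine(double[] a, double[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}
// Rank candidate emotions by Cosine(sceneVector, emotionVector) descending;
// flag emotion pairs scoring above ~0.95 against each other as likely synonyms.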