Seb Olsen, Piter Nicolaas, Caspar de Jong
Creating spoiler-free summaries of sitcom television screenplays
This project aims to create spoiler-free summaries of television episodes from the scripts of the sitcom Friends. By developing a model that identifies and omits the climax and resolution stages of each episode, we aim to generate concise, engaging episode previews that do not reveal key plot points. This preserves the viewing experience: readers get a glimpse of an episode without learning how it ends.
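The project's climax-analysis model lives in crop_script.py and is not described here. As a rough illustration of the cropping idea only, the sketch below treats an episode as a list of scene strings and scores each scene with a hypothetical `tension_score` heuristic (exclamation and question marks per line of dialogue, a stand-in for the real model). It takes the highest-scoring scene as the climax and drops that scene together with everything after it, i.e. the resolution. The names `tension_score` and `crop_before_climax` and the `min_keep` parameter are assumptions for illustration, not the project's actual code.

```python
from typing import Callable, List


def tension_score(scene: str) -> float:
    """HYPOTHETICAL stand-in for the climax-analysis model:
    exclamation and question marks per non-empty line."""
    lines = [ln for ln in scene.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    marks = scene.count("!") + scene.count("?")
    return marks / len(lines)


def crop_before_climax(
    scenes: List[str],
    score: Callable[[str], float] = tension_score,
    min_keep: int = 1,
) -> List[str]:
    """Keep only the scenes that precede the predicted climax.

    The climax is the highest-scoring scene at index >= min_keep (the first
    one wins on ties); it and every later scene (the resolution) are dropped,
    so at least `min_keep` set-up scenes always survive.
    """
    if len(scenes) <= min_keep:
        return list(scenes)
    candidates = range(min_keep, len(scenes))
    climax = max(candidates, key=lambda i: score(scenes[i]))
    return scenes[:climax]


# Usage with a toy four-scene episode.
episode = [
    "Monica: There's nothing to tell.\nJoey: C'mon, you're going out with the guy!",
    "Ross: Hi.\nChandler: This guy says hello, I wanna kill myself.",
    "Rachel: I can't believe it!\nRoss: What?!\nRachel: He left!",
    "Monica: So, are we okay?\nRachel: Yeah.",
]
preview = crop_before_climax(episode)
print(len(preview))  # 2
```

Passing a different `score` function lets a trained classifier replace the heuristic without changing the cropping logic, and `min_keep` guarantees that a preview always retains some set-up.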
Repository structure:
data: contains the raw script data
data_processed: contains the preprocessed script data
results: contains the model runs
climax_analysis.py: integrates calls to the climax-analysis model and the summarisation model with preprocessing
crop_script.py: contains the climax-analysis model
imdb_scraper.py: contains the IMDb web scraper
main.py: integrates preprocessing and model training
model_predictions.py: uses the pretrained model to predict descriptions from script data
model_training.py: fine-tunes a T5-small model
preprocessing.py: preprocesses the data
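The layout above suggests a pipeline in which preprocessing.py turns raw scripts into model inputs and model_training.py fine-tunes T5-small on them. The sketch below shows one way that step could look, assuming transcripts in the common fan-transcript style: scene headings in square brackets such as `[Scene: Central Perk]`, dialogue as `Speaker: line`, and stage directions in parentheses. The regular expressions, `parse_script`, `to_t5_example` and the `max_words` budget are hypothetical and are not the project's code. The `summarize: ` prefix is the task prefix T5 uses for summarisation.

```python
import re
from typing import List, Tuple

# HYPOTHETICAL transcript conventions (assumed, not taken from the project).
SCENE_RE = re.compile(r"^\[Scene", re.IGNORECASE)          # "[Scene: ...]" starts a scene
LINE_RE = re.compile(r"^([A-Z][A-Za-z .']*?):\s*(.+)$")     # "Speaker: utterance"
DIRECTION_RE = re.compile(r"\([^)]*\)")                     # "(stage direction)"


def parse_script(raw: str) -> List[List[Tuple[str, str]]]:
    """Split a raw transcript into scenes of (speaker, utterance) pairs."""
    scenes: List[List[Tuple[str, str]]] = []
    current: List[Tuple[str, str]] = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        if SCENE_RE.match(line):
            # A new scene heading closes the previous scene.
            if current:
                scenes.append(current)
            current = []
            continue
        # Drop stage directions; lines that were only directions become empty.
        line = DIRECTION_RE.sub("", line).strip()
        match = LINE_RE.match(line)
        if match:
            current.append((match.group(1).strip(), match.group(2).strip()))
    if current:
        scenes.append(current)
    return scenes


def to_t5_example(
    scenes: List[List[Tuple[str, str]]], summary: str, max_words: int = 400
) -> Tuple[str, str]:
    """Build a (source, target) pair for seq2seq fine-tuning.

    The word cap is a crude stand-in for the tokenizer's length limit.
    """
    dialogue = " ".join(f"{speaker}: {text}" for scene in scenes for speaker, text in scene)
    words = dialogue.split()[:max_words]
    return "summarize: " + " ".join(words), summary
```

A real pipeline would more likely truncate with the T5 tokenizer itself rather than by word count; the word cap only keeps this sketch free of external dependencies.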