piternicolaas / Text-Mining-Project


Authors

Seb Olsen, Piter Nicolaas, Caspar de Jong

Title

Creating spoiler-free summaries of sitcom television screenplays

Abstract

A description of at most 150 words covering the project question or idea, its goals, and the dataset used. What story would you like to tell, and why? What is the motivation behind your project?

This project aims to create spoiler-free summaries of TV show episodes using scripts from the sitcom Friends. By developing a model that identifies and omits the climax and resolution stages of the episodes, we seek to generate concise and engaging episode previews without revealing key plot points. This approach preserves the viewing experience, providing a glimpse into episodes without spoiling their outcomes.

Research questions

A list of research questions you would like to address during the project.

Dataset

List the dataset(s) you want to use, along with some ideas on how you expect to get, manage, process, and enrich them. Show that you have read the docs, are familiar with some examples, and have a clear idea of what to expect. Discuss data size and format if relevant.

A tentative list of milestones for the project

Add a sketch of your plan for the coming weeks. Please mention who does what.

Documentation

data: contains raw script data

data_processed: contains preprocessed script data

results: contains model runs

climax_analysis.py: combines preprocessing with calls to the climax analysis model and the summarisation model

crop_script.py: contains climax analysis model

imdb_scraper.py: contains the IMDb web scraper

main.py: integrates preprocessing and model training

model_predictions.py: uses pretrained model to predict descriptions from script data

model_training.py: fine-tunes T5-small model

preprocessing.py: preprocesses data
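
The idea behind the cropping step (omit the climax and resolution before summarising) can be illustrated with a minimal sketch. This is not the code in crop_script.py: the function name `crop_before_climax`, the `keep_fraction` parameter, and the fixed-fraction cut-off are all assumptions made for illustration, standing in for the project's climax analysis model.

```python
from typing import List


def crop_before_climax(lines: List[str], keep_fraction: float = 0.5) -> List[str]:
    """Keep only the opening part of a script.

    Hypothetical stand-in: a fixed fraction replaces a learned climax detector.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    # Drop blank lines so the cut point is based on actual content.
    content = [line for line in lines if line.strip()]
    # Everything after the cut-off is treated as climax and resolution.
    cutoff = int(len(content) * keep_fraction)
    return content[:cutoff]


# Toy script; real input would come from the preprocessed script data.
script = [
    "Scene 1: setup",
    "Scene 2: complication",
    "",
    "Scene 3: climax",
    "Scene 4: resolution",
]
preview_input = crop_before_climax(script, keep_fraction=0.5)
print(preview_input)
```

In a pipeline like the one described above, the cropped lines would then be passed to the summarisation model, so the generated description can only draw on material from before the assumed climax.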