headchem / StoryGhostPlotter

add full screenplay data #98

Closed headchem closed 2 years ago

headchem commented 2 years ago

Add more full screenplays, with all levels of summaries, from very different genres. Now that the new admin-only reverse summarization models speed up data entry, I can focus my time on converting screenplays to the Fountain format and then complete all summary levels more quickly.

Maybe this ticket should be used for a Colab notebook that does OCR on screenplay PDFs, removing page numbers, converting bold/italic to markdown, etc.?

headchem commented 2 years ago

Colab notebook for OCR and basic line break cleanup complete: https://colab.research.google.com/drive/1VhCx7kgj8j5Nraww6TFmJ1sdFXVszDuE?usp=sharing
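
For anyone reading this later without opening the notebook: below is a minimal sketch of the kind of post-OCR cleanup discussed in this ticket (dropping page-number lines and tidying line breaks). It is not the notebook's actual code. The function names, the `PAGE_NUMBER` pattern, and the assumption that a page number sits alone on its own line (for example `12.`) are all hypothetical, and the OCR step itself is left out.

```python
import re

# Assumption: a page number appears alone on a line, as 1-3 digits with an
# optional trailing period (e.g. "12."). Real screenplays may differ.
PAGE_NUMBER = re.compile(r"^\s*\d{1,3}\.?\s*$")


def strip_page_numbers(text: str) -> str:
    """Drop every line that contains nothing but a page number."""
    kept = [line for line in text.splitlines() if not PAGE_NUMBER.match(line)]
    return "\n".join(kept)


def collapse_blank_lines(text: str) -> str:
    """Trim trailing whitespace and reduce runs of blank lines to one."""
    lines = [line.rstrip() for line in text.splitlines()]
    joined = "\n".join(lines)
    # Three or more newlines in a row means two or more blank lines.
    return re.sub(r"\n{3,}", "\n\n", joined).strip("\n")


def clean_ocr_text(text: str) -> str:
    """Run both cleanup passes on raw OCR output (hypothetical helper)."""
    return collapse_blank_lines(strip_page_numbers(text))


if __name__ == "__main__":
    raw = "INT. KITCHEN - DAY\n\n\n\nAnna pours coffee.   \n\n12.\n\nANNA\nMorning.\n"
    print(clean_ocr_text(raw))
```

Single blank lines are kept because Fountain uses them to separate elements such as scene headings, action, and dialogue. One known limitation of this sketch: a line of dialogue or action consisting only of a short number would also be removed.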