mytakedotorg / mtdo

The code and tooling which runs mytake.org
https://mytake.org
GNU Affero General Public License v3.0

Sync transcripts for all videos #178

Closed nedtwigg closed 4 years ago

nedtwigg commented 6 years ago

We'll do these on the feature/syncTranscripts branch. If it helps to merge that into the branch currently under active development, it's always easy to do so.

When you start transcribing a video, check its box to make sure that nobody else starts duplicating your work. For info on how to do this, see foundation-gen/TRANSCRIPT_SYNC_HOW_TO.md.

1960 - Kennedy, Nixon

1976 - Carter, Ford

1980 - Anderson, Carter, Reagan

1984 - Mondale, Reagan

1988 - Bush, Dukakis

1992 - Bush, Clinton, Perot

1996 - Clinton, Dole

2000 - Bush, Gore

2004 - Bush, Kerry

2008 - McCain, Obama

2012 - Obama, Romney

2016 - Clinton, Trump

WebsByTodd commented 6 years ago

I started on Obama-Romney 1 and ran into a few issues:

WebsByTodd commented 6 years ago

More notes:

nedtwigg commented 6 years ago

I fixed the exceptions you found. Regarding punctuation:

Words should always be separated by spaces, and the only punctuation allowed to stand on its own (not touching a word) is -.

I added Spotless rules to enforce this, and updated the transcripts that were already in there to comply. We can use the rules to make sure all the transcripts stay consistent.
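For anyone who wants to check a line by hand, here is a rough sketch of the punctuation rule as I read it. This is not the actual Spotless rule in the repo; the class name `PunctuationCheck` and the method `violations` are made up for illustration, and it assumes that "touching a word" means the space-separated token contains at least one letter or digit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the mtdo codebase.
public class PunctuationCheck {
    /** Returns the tokens that break the spacing rule; an empty list means the line passes. */
    public static List<String> violations(String line) {
        List<String> bad = new ArrayList<>();
        if (line.isEmpty()) {
            return bad; // blank lines have nothing to check
        }
        // Split on single spaces and keep empty tokens, so doubled,
        // leading, or trailing spaces show up as violations too.
        for (String token : line.split(" ", -1)) {
            if (token.equals("-")) {
                continue; // a free-standing dash is the one permitted exception
            }
            // Assumption: a token "touches a word" if it has any letter or digit.
            boolean touchesWord = token.chars().anyMatch(Character::isLetterOrDigit);
            if (!touchesWord) {
                bad.add(token);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Stray comma and period are reported; the lone dash is not.
        System.out.println(violations("Thank you , Jim ."));
        System.out.println(violations("Well - thank you, Jim."));
    }
}
```

Returning the offending tokens instead of a plain boolean makes it easier to see what needs fixing in a long transcript line.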

WebsByTodd commented 6 years ago

Notes for Obama/Romney 1

WebsByTodd commented 6 years ago

I noticed a trend toward middle initials and I vaguely remember a conversation. Is there a guideline to follow here? What about for folks like Bill Clinton (William J Clinton) and Bob Dole (Robert J Dole) and Mitt Romney (Willard M Romney) whose full names might be less recognizable than the names they are known by?

Edit: I've updated NAME_AND_TITLE_DESIGN.md with this info.

nedtwigg commented 4 years ago

The checkmarks above were all formally synced. In addition, the following had been synced at one time (which means the .said is very reliable), but they need to be re-synced against their new sources. It is possible there are problems in the .vtt for these new sources:

And for all of the rest, the .said was taken straight from debates.org, and autosynced by SetVttToSaid.java in #269. In the future, we will need to revisit these.