mytakedotorg / mtdo

The code and tooling which runs mytake.org
https://mytake.org
GNU Affero General Public License v3.0

How to help with transcribing #240

Open nedtwigg opened 4 years ago

nedtwigg commented 4 years ago

This video shows how to do it. It's 14 minutes long, but the first 4 minutes just pitch MyTake.org in general - you can skip to 4:00 and just watch the last 10 minutes if all you care about is the technical how-to.

If you want to do this work, you'll pick a video, crunch on it for 1-4 hours, and then you'll be done! You'll learn a lot about what the hot topics were at that point in time, and you'll learn even more about the ambiguities in human language - how we naturally hear pronouns and filler words even if they aren't actually enunciated, and lots of other interesting quirks. It's fun!

Here's how to do it:

If you get that far, then you will be able to see which videos are ready to be transcribed. If you want to know how long it will take, it's pretty conservative to just double the running time - e.g. a 1 hr debate will take ~2 hrs to fix up the transcript. Call Ned before you get too invested, and he will make sure you're not working on the same one as someone else (by updating the "Dibs" column in this spreadsheet).
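For anyone who wants to budget their time, the doubling rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the function name and the `factor` parameter are made up for this example and are not part of the mtdo tooling.

```python
def estimate_transcription_minutes(video_minutes: float, factor: float = 2.0) -> float:
    """Estimate the time needed to fix up a transcript.

    Assumption: the default factor of 2.0 is the "double the running time"
    rule of thumb from this issue, described there as pretty conservative.
    """
    if video_minutes < 0:
        raise ValueError("video_minutes must be non-negative")
    return video_minutes * factor


# A 1 hr debate: 60 minutes of video, about 120 minutes of fix-up work.
print(estimate_transcription_minutes(60))
```

If your own pace turns out faster or slower, pass a different `factor` rather than changing the default.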

The GUI saves as you go, so you don't have to worry much about losing your work. If you already know git, feel free to make commits as you go and push up a PR. If you don't know git, you don't have to learn it. Just do this:

And that's all there is to it! Three major caveats:

nedtwigg commented 4 years ago

As of Mar 31 2020, this is broken on Windows (Mac & Linux only) - will have a fix soon!

nedtwigg commented 4 years ago

Fixed for Windows users as of Apr 2 2020. If you were having problems, run git pull and that will install the fix.

nedtwigg commented 4 years ago

@ftwigg is our first volunteer, and he raised a great point which is worth documenting here.

People naturally pause and um, ah, er, etc. The YouTube transcript usually does not have these ums, and the Newspaper often does, but not always. Our goal is for the transcript to match what was spoken as exactly as possible. That means that we want to keep the ums.

There is also the matter of "word- word" versus "word - word". The hyphen-pause should have a space on each side: "word - word", not "word- word". It is also okay to use commas: "I need, uh, a small pause." Compare: "But this idea - a big idea - it needs a big pause." It's your call!
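The spacing rule for the hyphen-pause can also be checked mechanically. Below is a minimal sketch, assuming you have the transcript text as a plain Python string. The `normalize_pauses` helper is hypothetical and is not part of the mtdo tooling. It only fixes the "word- word" case and deliberately leaves ordinary hyphenated words alone.

```python
import re

# A hyphen glued to the preceding word and followed by spaces or tabs: "word- word".
# Hyphenated words such as "well-known" have no space after the hyphen, so they
# do not match. Newlines are excluded so that separate lines are never joined.
_GLUED_PAUSE = re.compile(r"(?<=\w)-[ \t]+(?=\w)")


def normalize_pauses(text: str) -> str:
    """Rewrite "word- word" as "word - word" (hypothetical helper)."""
    return _GLUED_PAUSE.sub(" - ", text)


print(normalize_pauses("But this idea- a big idea - it needs a big pause."))
```

Text that already follows the convention passes through unchanged, so it should be safe to run more than once. Whether a given pause deserves a hyphen or a comma is still a judgment call that no script can make for you.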

Minor caveat: You may find that you hear an "um" which is in neither the Newspaper nor the YouTube. Most such "ums" will probably not be found - the tooling doesn't help at all, so it's just luck if you find one. Ideally we would get them all, but for now it's okay if we miss them, so don't worry about it.