Open nedtwigg opened 4 years ago
As of Mar 31 2020, this is broken on Windows (it works on Mac & Linux only) - will have a fix soon!
Fixed for Windows users as of Apr 2 2020. If you were having problems, run `git pull` and that will install the fix.
@ftwigg is our first volunteer, and he raised a great point which is worth documenting here.
People naturally pause and `um`, `ah`, `er`, etc. The YouTube transcript usually does not have these `um`s, and the Newspaper often does, but not always. Our goal is for the transcript to match what was spoken as exactly as possible. That means that we want to keep the `um`s.
There is also the matter of `word- word` and `word - word`. The hyphen-pause should have a space on each side: `word - word`, not `word- word`. It is also okay to use commas. I need, uh, a small pause. But this idea - a big idea - it needs a big pause. It's your call!
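If you want to double-check your hyphen-pauses after editing, here is a minimal sketch in Python. It is not part of the MyTake.org tooling; the function names are made up for illustration, and it assumes a lopsided hyphen (a space on only one side) is always a mistyped pause.

```python
import re

# Hypothetical helper, not part of the MyTake.org tooling.
# Matches a hyphen with a word character on one side and a space on the other,
# e.g. "word- word" or "word -word". Hyphenated words such as "well-known"
# and correctly spaced pauses such as "word - word" are left alone.
LOPSIDED_HYPHEN = re.compile(r"(?<=\w)- | -(?=\w)")


def find_lopsided_hyphens(text: str) -> list[int]:
    """Return the character offsets where a lopsided hyphen starts."""
    return [m.start() for m in LOPSIDED_HYPHEN.finditer(text)]


def fix_lopsided_hyphens(text: str) -> str:
    """Rewrite each lopsided hyphen as a pause with a space on each side."""
    return LOPSIDED_HYPHEN.sub(" - ", text)


if __name__ == "__main__":
    print(find_lopsided_hyphens("word- word"))
    print(fix_lopsided_hyphens("But this idea- a big idea -it needs a big pause."))
```

Treat anything it flags as a prompt to listen again, not as an automatic fix: whether a pause deserves a comma or a hyphen is still your call.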
Minor caveat: You may find that you hear an "um" which is in neither the Newspaper nor the YouTube. Most such "ums" will probably not be found - the tooling doesn't help at all, so it's just luck if you find one. Ideally we would get them all, but for now it's okay if we miss them, so don't worry about it.
This video shows how to do it. It's 14 minutes long, but the first 4 minutes just pitch MyTake.org in general - you can skip to 4:00 and just watch the last 10 minutes if all you care about is the technical how-to.
If you want to do this work, you'll pick a video, crunch on it for 1-4 hours, and then you'll be done! You'll learn a lot about what the hot topics were at that point in time, and you'll learn even more about the ambiguities in human language - how we naturally hear pronouns and filler words even if they aren't actually enunciated, and lots of other interesting quirks. It's fun!
Here's how to do it:
1. Run `git clone https://github.com/mytakedotorg/mytakedotorg`
2. That creates a `mytakedotorg` directory, which you can `cd` into
3. Run `./gradlew transcriptGui` (on Windows, leave off the `./`)

If you get that far, then you will be able to see which videos are ready to be transcribed. If you want to know how long it will take, it's pretty conservative to just double the running time - e.g. a 1 hr debate will take ~2 hrs to fix up the transcript. Call Ned before you get too invested, and he will make sure you're not working on the same one as someone else (by updating the "Dibs" column in this spreadsheet).
The GUI is saving as you go, so you don't have to worry much about losing your work. If you already know git, feel free to make commits as you go and push up a PR. If you don't know git, you don't have to learn it, just do this:
Run `git status`; it will show you the file you have been changing.

And that's all there is to it! Three major caveats:
- `50 percent`, not `50%`
- `950 dollars`, not `$950`
- `one trillion`, not `1,000,000,000,000`
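
The three examples above can be turned into a rough self-check. This is only a sketch, not something the transcript GUI does: the function name and the patterns are assumptions drawn from these three examples alone, and "large number" is arbitrarily taken to mean digits with two or more comma groups (a million and up).

```python
import re

# Hypothetical checks inferred from the three examples in this issue;
# the real rules may be broader than these patterns.
CHECKS = {
    "percent": re.compile(r"\d\s*%"),                      # 50% -> 50 percent
    "dollar": re.compile(r"\$\s*\d"),                      # $950 -> 950 dollars
    "large_number": re.compile(r"\d{1,3}(?:,\d{3}){2,}"),  # 1,000,000,000,000 -> one trillion
}


def flag_written_numbers(line: str) -> list[str]:
    """Return the names of the checks that a transcript line trips."""
    return [name for name, pattern in CHECKS.items() if pattern.search(line)]


if __name__ == "__main__":
    for line in ["taxes rose 50%", "it cost $950", "a debt of 1,000,000,000,000", "50 percent"]:
        print(line, "->", flag_written_numbers(line))
```

A line that comes back with an empty list passes these three checks, but that says nothing about cases the examples don't cover.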