Label audio clips: speaker, transcription, etc.

See #2

[x] Handle data modelling of speakers. Should I just save a name in a column, or do they deserve a table?
[x] Add script to build the speakers table.
[x] Add a script to extract the transcription from the dialogue texts of dialogue entries.
[x] Handle transcription for thoughts
[x] Handle transcription for joke endings
[x] Add a script to check the correctness of the extracted transcriptions

Checklist for reviewing the extracted and labelled data:

[x] 1) Transcription of characters other than Harry should not be wrapped in quotes and should not have additional descriptions of how they said their lines (the text in-between quotes).
[ ] All audio clips left without transcription either 2) ~~correspond to conversation nodes that require some conditions to be fulfilled and have a predecessor node in the path that contains the transcription, or~~
[ ] 3) ~~those conversations no longer exist within the extracted data~~
[x] 4) Audio clips whose corresponding dialogue entry has non-empty dialogue text should have a non-empty transcription.
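
A minimal sketch of how item 1 could be spot-checked. The `LabelledClip` record, its field names, and the function name are hypothetical and not taken from this repository; the only assumption is that each labelled clip carries a speaker name and a transcription string. It flags any transcription by a speaker other than Harry that still contains a double quote character, which would suggest that the quotes or the surrounding description were not stripped.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LabelledClip:
    # Hypothetical record layout; the real schema may differ.
    clip_id: str
    speaker: str
    transcription: str


# Straight and curly double quotes that should not survive extraction.
QUOTE_CHARS = ('"', "\u201c", "\u201d")


def find_quoted_transcriptions(
    clips: List[LabelledClip], exempt_speaker: str = "Harry"
) -> List[str]:
    """Return ids of clips by other speakers whose transcription contains a quote."""
    return [
        clip.clip_id
        for clip in clips
        if clip.speaker != exempt_speaker
        and any(q in clip.transcription for q in QUOTE_CHARS)
    ]
```

An empty result does not prove that item 1 holds (a description could remain after the quotes were removed), so this only narrows down what has to be reviewed by hand.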

I need 2, 3, and 4 to be fulfilled, as I plan to simply ignore audio clips without transcription in the task used for generating data for ESP. If any of these criteria does not hold, ignoring audio clips without transcription means missing valid data.

I realized that writing a script for 2 and 3 is not necessary, as ensuring 4 is enough for the current implementation. I will need to add a script for 3 if I stop ignoring audio clips without a corresponding dialogue entry, as I do right now. For 2 I would need to actually implement graph navigation based on the code fragments in dialogue entries. Moreover, none of these dialogue entries currently has any text.
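
A minimal sketch of the check for 4, under stated assumptions: the `ClipRecord` layout and the function names are hypothetical and not the actual script in this repository. Each record is assumed to pair an audio clip with the dialogue text of its dialogue entry (`None` when there is no entry) and the extracted transcription.

```python
from typing import List, NamedTuple, Optional


class ClipRecord(NamedTuple):
    # Hypothetical layout; the real tables may use different columns.
    clip_id: str
    dialogue_text: Optional[str]  # None when the clip has no dialogue entry
    transcription: Optional[str]


def is_blank(text: Optional[str]) -> bool:
    """Treat None, empty, and whitespace-only strings as empty."""
    return text is None or not text.strip()


def find_missing_transcriptions(records: List[ClipRecord]) -> List[str]:
    """Return ids of clips that violate 4: dialogue text present, transcription empty."""
    return [
        record.clip_id
        for record in records
        if not is_blank(record.dialogue_text) and is_blank(record.transcription)
    ]
```

Clips without a dialogue entry, or whose entry has no text, are deliberately not reported, which matches the decision above to ignore them for now.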

ngtban / wavenet_de_data_prep