ngtban / wavenet_de_data_prep

GNU Affero General Public License v3.0

Build a database of the audio clips for later packaging into the Kaldi data structure used to train the vocoder #2

Closed: ngtban closed this issue 2 years ago

ngtban commented 3 years ago

I need to build a database of the audio clips and figure out the speaker of each clip.

Generally, the name of each asset follows this format:

alternative-(Alternative Number)-(Character Name/Skill/Object under Examination)-(Location of Conversation) (Point of Focus) -- (Alternative Marking)-(Conversation Node)

The "alternative" prefix, the alternative marking, and the location of the conversation are optional.
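A minimal sketch of how this naming scheme could be parsed with a regular expression. It assumes the speaker field contains no hyphens, the node is an integer, and the name is passed without its file extension. Because the location and the point of focus are separated only by a space (and the location is optional), the sketch keeps them together in one hypothetical `conversation` field instead of guessing where to split. `parse_asset_name` and `ClipName` are illustrative names, not part of the repository.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern for the naming scheme above.
# Assumes the speaker field has no hyphens and the node id is an integer.
ASSET_NAME = re.compile(
    r"^(?:alternative-(?P<alt_num>\d+)-)?"  # optional "alternative-N-" prefix
    r"(?P<speaker>[^-]+)-"                  # character name, skill or examined object
    r"(?P<conversation>.+?)"                # "(Location) (Point of Focus)", kept as one field
    r"(?:\s*--\s*(?P<alt_mark>.+?))?"       # optional "-- (Alternative Marking)"
    r"-(?P<node>\d+)$"                      # conversation node id
)


@dataclass
class ClipName:
    speaker: str
    conversation: str
    node: int
    alt_num: Optional[int] = None
    alt_mark: Optional[str] = None


def parse_asset_name(name: str) -> Optional[ClipName]:
    """Parse an asset name (without extension); return None if it doesn't fit the scheme."""
    m = ASSET_NAME.match(name)
    if m is None:
        return None
    return ClipName(
        speaker=m.group("speaker").strip(),
        conversation=m.group("conversation").strip(),
        node=int(m.group("node")),
        alt_num=int(m.group("alt_num")) if m.group("alt_num") else None,
        alt_mark=m.group("alt_mark"),
    )
```

For example, `parse_asset_name("Kim Kitsuragi-YARD TRASH-390")` yields speaker `"Kim Kitsuragi"`, conversation `"YARD TRASH"` and node `390`. Names that don't match return `None`, so they can be collected and inspected by hand rather than silently mis-parsed.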

ngtban commented 3 years ago

Some observations about the dataset:

ngtban commented 3 years ago

More observations after extracting data from the dialogue bundle:

Dialogue entries:

Actor IDs

The audio clips themselves

ngtban commented 3 years ago

Audio clips whose corresponding dialogue entries have no dialogue text

There are around 640 dialogue entries that should have dialogue text, since each of them has an audio file associated with it, but their text is empty. To recover the transcriptions for these entries, I would need to traverse the conversation graph and follow the condition-checking logic at each node, which is rather complicated and not worth the time. I believe the remaining audio clips are enough to train the vocoder.
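One way to separate the voiced entries that have a transcript from the ones that don't, so that only the former go into the training data. The `DialogueEntry` record and its field names are assumptions for illustration; the actual fields extracted from the dialogue bundle may be named differently.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class DialogueEntry:
    # Hypothetical record shape; real field names in the dialogue bundle may differ.
    conversation_id: int
    entry_id: int
    actor_id: int
    text: str        # dialogue text, possibly empty
    audio_clip: str  # asset name of the voiced line, "" if the entry is not voiced


def split_voiced_entries(
    entries: Iterable[DialogueEntry],
) -> Tuple[List[DialogueEntry], List[DialogueEntry]]:
    """Split voiced entries into (has transcript, missing transcript).

    Entries without an audio clip are skipped entirely.
    """
    with_text: List[DialogueEntry] = []
    without_text: List[DialogueEntry] = []
    for entry in entries:
        if not entry.audio_clip:
            continue
        if entry.text.strip():
            with_text.append(entry)
        else:
            without_text.append(entry)
    return with_text, without_text
```

Keeping the `without_text` list around (rather than discarding it) makes it easy to check its size against the roughly 640 entries mentioned above, and to revisit them later if walking the conversation graph turns out to be worthwhile.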

ngtban commented 3 years ago

I'm seeing cases where audio clips are attributed to the wrong actor. One example is the audio clip named Kim Kitsuragi-YARD TRASH-390. Node 390 in that conversation has actor ID 215 (the Trash Container), but the actual speaker is Kim!
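A rough way to flag such cases is to compare the speaker written in the clip name with the display name of the actor recorded on the dialogue entry. This sketch assumes the speaker is the text before the first hyphen (after any `alternative-N-` prefix) and that an actor-ID-to-name mapping has already been extracted; `find_actor_mismatches` and `speaker_from_clip_name` are hypothetical helpers. Skill and object names may not match actor names exactly, so the output is a list to review by hand, not an automatic correction.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

# Optional "alternative-N-" prefix in front of the speaker field (assumed format).
_PREFIX = re.compile(r"^alternative-\d+-")


def speaker_from_clip_name(clip_name: str) -> str:
    """Return the text before the first hyphen, after stripping any alternative prefix."""
    return _PREFIX.sub("", clip_name, count=1).split("-", 1)[0].strip()


def find_actor_mismatches(
    entries: Iterable[Tuple[str, int]],
    actor_names: Dict[int, str],
) -> List[Tuple[str, int, str, Optional[str]]]:
    """entries: (clip_name, actor_id) pairs.

    Returns (clip_name, actor_id, speaker_in_name, recorded_actor_name) for every
    clip whose name disagrees with the recorded actor, or whose actor is unknown.
    """
    mismatches = []
    for clip_name, actor_id in entries:
        named = speaker_from_clip_name(clip_name)
        recorded = actor_names.get(actor_id)
        if recorded is None or named.casefold() != recorded.casefold():
            mismatches.append((clip_name, actor_id, named, recorded))
    return mismatches
```

With the example above, `find_actor_mismatches([("Kim Kitsuragi-YARD TRASH-390", 215)], {215: "Trash Container"})` would report the clip with speaker `"Kim Kitsuragi"` against recorded actor `"Trash Container"`. In that case the clip name is the correct one, which suggests preferring the name over the actor ID when building the speaker labels.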

ngtban commented 2 years ago

It looks like the writers forgot to add matching quotation marks in some dialogue entries. One example is the dialogue entry with conversation ID 23 and dialogue entry ID 662.
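A simple heuristic for finding these entries before they end up in the transcripts: count straight double quotes (which should come in pairs) and compare the number of typographic opening and closing double quotes. The entry tuple shape is an assumption. Single quotes are deliberately ignored because they double as apostrophes, and nested or multi-paragraph quotes can still produce false positives, so flagged entries need a manual look.

```python
from typing import Iterable, List, Tuple


def has_unbalanced_quotes(text: str) -> bool:
    """Heuristic check for a missing opening or closing double quotation mark."""
    # Straight double quotes must come in pairs.
    if text.count('"') % 2 != 0:
        return True
    # Typographic double quotes: every opening mark needs a closing one.
    return text.count("\u201c") != text.count("\u201d")


def find_unbalanced(entries: Iterable[Tuple[int, int, str]]) -> List[Tuple[int, int]]:
    """entries: (conversation_id, entry_id, text). Returns the IDs of suspicious entries."""
    return [(conv_id, entry_id) for conv_id, entry_id, text in entries if has_unbalanced_quotes(text)]
```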

ngtban commented 2 years ago

I found where the joke endings are stored: they are all in the conversation with ID 1427. None of its dialogue entries have text; the content is stored in scripts instead, so I might need a mini parser.