pyannote / pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
http://pyannote.github.io
MIT License

Create Annotation object from bazinga transcript #1262

Closed · Sammi-Smith closed this 1 year ago

Sammi-Smith commented 1 year ago

Hi pyannote team,

First of all, this suite of tools is amazing - kudos to the team for putting these awesome tools together and for continuing to improve them.

I am looking to use some of the tools within pyannote-audio, along with the bazinga dataset, to generate embeddings for a particular speaker and then identify that same speaker within other audio files (i.e., generate an embedding from a clip where only Sheldon is speaking, then find segments within other audio files that are likely to also be Sheldon).
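
For the second half of that plan, the matching step I have in mind looks roughly like the sketch below (untested; it assumes pyannote.audio 2.x's Inference class with the pyannote/embedding model, which may require a Hugging Face access token; the file names, segment boundaries, and 0.5 distance threshold are placeholders):

import numpy as np
from scipy.spatial.distance import cdist
from pyannote.audio import Inference
from pyannote.core import Segment

# Pre-trained speaker embedding model; window="whole" yields one
# embedding per excerpt.
inference = Inference("pyannote/embedding", window="whole")

# Reference embedding from a clip where only Sheldon speaks
# (file name and times are placeholders).
sheldon = inference.crop("sheldon_only.wav", Segment(1.49, 12.61))

# Embedding for a candidate segment from another audio file.
candidate = inference.crop("other_episode.wav", Segment(42.0, 45.0))

# Cosine distance between the two embeddings; smaller means more similar.
distance = cdist(np.atleast_2d(sheldon), np.atleast_2d(candidate),
                 metric="cosine")[0, 0]

# The 0.5 threshold is purely illustrative.
if distance < 0.5:
    print("likely sheldon_cooper")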

My question is this: the bazinga dataset contains a "transcript" of each audio file with details about the speaker, start and end times, etc., for each word spoken. How can I convert this word-by-word transcript, which is structured as a list of dictionaries (one dictionary per word), into an Annotation object that is summarized at the speaker level?

For example, for "TheBigBangTheory.Season01.Episode01", the first few items in the transcript list look like this (some items replaced with ... below, for brevity):

[{'token': 'So',
  'speaker': 'sheldon_cooper',
  'forced_alignment': {'start_time': 1.4900000095367432,
   'end_time': 1.6100000143051147,
   'confidence': 0.9900000095367432},
  'addressee': 'leonard_hofstadter',
  'named_entity': None,
  'entity_linking': None},
 {'token': 'if',
  'speaker': 'sheldon_cooper',
  'forced_alignment': {'start_time': 1.659999966621399,
   'end_time': 1.7300000190734863,
   'confidence': 0.9900000095367432},
  'addressee': 'leonard_hofstadter',
  'named_entity': None,
  'entity_linking': None},
 {'token': 'a',
  'speaker': 'sheldon_cooper',
  'forced_alignment': {'start_time': 1.7400000095367432,
   'end_time': 1.7899999618530273,
   'confidence': 0.9900000095367432},
  'addressee': 'leonard_hofstadter',
  'named_entity': None,
  'entity_linking': None},
 {'token': 'photon',
  'speaker': 'sheldon_cooper',
  'forced_alignment': {'start_time': 1.7999999523162842,
   'end_time': 2.190000057220459,
   'confidence': 0.9900000095367432},
  'addressee': 'leonard_hofstadter',
  'named_entity': None,
  'entity_linking': None},
...
 {'token': 'slits',
  'speaker': 'sheldon_cooper',
  'forced_alignment': {'start_time': 12.119999885559082,
   'end_time': 12.609999656677246,
   'confidence': 0.9900000095367432},
  'addressee': 'leonard_hofstadter',
  'named_entity': None,
  'entity_linking': None},
 {'token': '.',
  'speaker': 'sheldon_cooper',
  'forced_alignment': {'start_time': 12.609999656677246,
   'end_time': 12.609999656677246,
   'confidence': 0.949999988079071},
  'addressee': 'leonard_hofstadter',
  'named_entity': None,
  'entity_linking': None},
 {'token': 'Agreed',
  'speaker': 'leonard_hofstadter',
  'forced_alignment': {'start_time': 13.0,
   'end_time': 13.34000015258789,
   'confidence': 0.9900000095367432},
  'addressee': 'sheldon_cooper',
  'named_entity': None,
  'entity_linking': None},
 {'token': ',',
  'speaker': 'leonard_hofstadter',
  'forced_alignment': {'start_time': 13.34000015258789,
   'end_time': 13.34000015258789,
   'confidence': 0.10000000149011612},
  'addressee': 'sheldon_cooper',
  'named_entity': None,
  'entity_linking': None},
 ...
 {'token': 'point',
  'speaker': 'leonard_hofstadter',
  'forced_alignment': {'start_time': 14.390000343322754,
   'end_time': 14.710000038146973,
   'confidence': 0.9900000095367432},
  'addressee': 'sheldon_cooper',
  'named_entity': None,
  'entity_linking': None},
 {'token': '?',
  'speaker': 'leonard_hofstadter',
  'forced_alignment': {'start_time': 14.710000038146973,
   'end_time': 14.710000038146973,
   'confidence': 0.949999988079071},
  'addressee': 'sheldon_cooper',
  'named_entity': None,
  'entity_linking': None},
 ...]

How would we convert that into an Annotation object, let's call it bazinga_annotation, such that the output of bazinga_annotation.for_json()["content"] would look something like this? (A rough sketch of the conversion I have in mind follows the example.)

[{'segment': {'start': 1.4900000095367432,
   'end': 12.609999656677246},
  'track': 0,
  'label': 'sheldon_cooper'},
 {'segment': {'start': 13.0,
   'end': 14.710000038146973},
  'track': 1,
  'label': 'leonard_hofstadter'},
 ...]
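
Here is the kind of conversion I am picturing, as a rough, untested sketch: merge consecutive words from the same speaker into one segment spanning from the first word's start_time to the last word's end_time. The helper name transcript_to_annotation is mine, and it assumes every token carries a forced_alignment with start_time and end_time:

from pyannote.core import Annotation, Segment

def transcript_to_annotation(transcript, uri=None):
    # Collapse a word-level bazinga transcript into a speaker-level
    # Annotation: consecutive words from the same speaker become one
    # segment from the first word's start_time to the last word's end_time.
    annotation = Annotation(uri=uri)
    speaker, start, end, track = None, None, None, 0
    for word in transcript:
        alignment = word['forced_alignment']
        if word['speaker'] == speaker:
            # Same speaker keeps talking: extend the current segment.
            end = alignment['end_time']
        else:
            # Speaker changed: flush the previous turn, start a new one.
            if speaker is not None:
                annotation[Segment(start, end), track] = speaker
                track += 1
            speaker = word['speaker']
            start = alignment['start_time']
            end = alignment['end_time']
    # Flush the final turn.
    if speaker is not None:
        annotation[Segment(start, end), track] = speaker
    return annotation

# `transcript` is the list of word dictionaries shown above.
bazinga_annotation = transcript_to_annotation(transcript)
print(bazinga_annotation.for_json()['content'])

Is something along these lines the intended approach, or is there a built-in way to do this conversion?
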
stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.