w3c / webrtc-charter

Draft for an updated WebRTC Working Group charter
https://w3c.github.io/webrtc-charter/webrtc-charter.html

Representing Meetings' Transcripts and Minutes #84

Closed AdamSobieski closed 4 months ago

AdamSobieski commented 4 months ago

Introduction

Hello. I would like to propose that exploring representations of meetings' transcripts and minutes be in-scope for the WebRTC Working Group. This work item would involve designing a representation suitable for transcripts and minutes for both in-person and virtual (WebRTC-based) meetings. Meetings' transcripts and minutes could be produced by human or AI agents.

Interestingly, in addition to timed speech transcription content, meetings' transcripts and minutes could include attachments, timed hyperlinks, slideshow presentation events, and other metadata.

Below are some preliminary, rough-draft sketches of these features. They use a format inspired by WebVTT metadata, with an extensible, in-progress JSON schema.

Speech Transcription

As envisioned, meetings' transcripts would consist mostly of timed, transcribed speech content.

Participants' speech needn't be transcribed to plain text; alternatives include SSML, HTML, and LaTeX.

00:00.000 --> 00:05.000
{
  "@type": "speech",
  "agent": {
    "@type": "person",
    "fullName": "Alice Smith",
    "position": ["Mathematics Instructor"]
  },
  "data": [{
    "@type": "data",
    "mimeType": "text/latex",
    "content": "See, here, that the value is $x^{2}$."
  }]
}
00:00.000 --> 00:05.000
{
  "@type": "speech",
  "agent": {
    "@type": "person",
    "fullName": "Alice Smith",
    "position": ["Senator", "Co-chair", "Attendee"]
  },
  "data": [{
    "@type": "data",
    "mimeType": "text/plain",
    "content": "Without objection, the presenter's slides are entered into the minutes."
  }]
}
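
As a rough illustration of how a consumer might read cues like the ones above, here is a minimal Python sketch. It assumes each cue is a WebVTT-style timing line followed by a single JSON object, as in the sketches; the names `parse_cues`, `to_seconds`, and `SAMPLE` are hypothetical and not part of any proposal or library.

```python
import json
import re

# Timing line as used in the sketches: "MM:SS.mmm --> MM:SS.mmm" (hours optional).
STAMP = r"(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}"
TIMING = re.compile("^(" + STAMP + ") --> (" + STAMP + ")$")


def to_seconds(stamp: str) -> float:
    """Convert 'MM:SS.mmm' or 'HH:MM:SS.mmm' to seconds."""
    seconds = 0.0
    for part in stamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_cues(text: str) -> list[dict]:
    """Split text into cues: a timing line followed by a JSON payload."""
    cues = []
    for line in text.splitlines():
        match = TIMING.match(line.strip())
        if match:
            # A timing line starts a new cue.
            cues.append({
                "start": to_seconds(match.group(1)),
                "end": to_seconds(match.group(2)),
                "lines": [],
            })
        elif cues:
            # Every other line belongs to the current cue's JSON payload.
            cues[-1]["lines"].append(line)
    return [
        {"start": c["start"], "end": c["end"],
         "payload": json.loads("\n".join(c["lines"]))}
        for c in cues
    ]


# Hypothetical sample data in the style of the sketches.
SAMPLE = """\
00:00.000 --> 00:05.000
{
  "@type": "speech",
  "agent": {"@type": "person", "fullName": "Alice Smith"},
  "data": [{"@type": "data", "mimeType": "text/plain", "content": "Without objection."}]
}

00:05.000 --> 00:35.000
{
  "@type": "hyperlink",
  "agent": {"@type": "person", "fullName": "Charles Brown"},
  "data": [{"@type": "link", "mimeType": "application/pdf", "href": "files/panelist-presentation-1.pdf"}]
}
"""

for cue in parse_cues(SAMPLE):
    print(cue["start"], cue["end"], cue["payload"]["@type"])
```

Because the payloads are parsed with a strict JSON parser, trailing commas or unquoted keys would raise an error, which is one reason to keep the examples valid JSON.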

Attachments

Files could be attached to meetings' minutes and transcripts.

00:05.000 --> 00:05.000
{
  "@type": "attachment",
  "agent": {
    "@type": "person",
    "fullName": "Charles Brown",
    "position": ["Secretary", "Attendee"]
  },
  "data": [{
    "@type": "link",
    "mimeType": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    "href": "files/panelist-presentation-1.pptx",
    "metadata": [{
      "@type": "metadata",
      "property": "author",
      "value": [{
        "@type": "person",
        "fullName": "David Jackson"
      }]
    }]
  },
  {
    "@type": "link",
    "mimeType": "application/pdf",
    "href": "files/panelist-presentation-1.pdf",
    "metadata": [{
      "@type": "metadata",
      "property": "author",
      "value": [{
        "@type": "person",
        "fullName": "David Jackson"
      }]
    }]
  }]
}

Timed Hyperlinks

Timed hyperlinks could be entered into minutes and viewed and navigated by meetings' audiences.

00:05.000 --> 00:35.000
{
  "@type": "hyperlink",
  "agent": {
    "@type": "person",
    "fullName": "Charles Brown",
    "position": ["Secretary", "Attendee"]
  },
  "data": [{
    "@type": "link",
    "mimeType": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    "href": "files/panelist-presentation-1.pptx",
    "metadata": [{
      "@type": "metadata",
      "property": "author",
      "value": [{
        "@type": "person",
        "fullName": "David Jackson"
      }]
    }]
  },
  {
    "@type": "link",
    "mimeType": "application/pdf",
    "href": "files/panelist-presentation-1.pdf",
    "metadata": [{
      "@type": "metadata",
      "property": "author",
      "value": [{
        "@type": "person",
        "fullName": "David Jackson"
      }]
    }]
  }]
}

Slideshow Presentation Events

As presenters advance or change slides during meetings, events could be generated and entered into meetings' transcripts and minutes. These events could be consumed by both audiences and multimodal AI systems (see below).

WebVTT-based timed thumbnails could be useful for providing images of slideshow presentations' slides. These images could also be accompanied by hyperlinks to individual slides (e.g., "files/slideshow.pptx#3").
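
No sketch of a slide event is given above, so the following Python snippet is only a guess at what generating one might look like. It assumes a slide-change cue reuses the same shape as the other cues; the `"slideChange"` type name and the helper names `format_timestamp` and `slide_change_cue` are hypothetical. The per-slide link follows the "files/slideshow.pptx#3" pattern mentioned above.

```python
import json

# MIME type for .pptx files, as used in the attachment sketch.
PPTX = "application/vnd.openxmlformats-officedocument.presentationml.presentation"


def format_timestamp(seconds: float) -> str:
    """Format seconds as 'MM:SS.mmm', or 'HH:MM:SS.mmm' past one hour."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    stamp = f"{minutes:02d}:{secs:02d}.{ms:03d}"
    return f"{hours:02d}:{stamp}" if hours else stamp


def slide_change_cue(start: float, end: float, presenter: str,
                     href: str, slide_number: int) -> str:
    """Build one timed cue for the interval during which a slide is shown."""
    payload = {
        "@type": "slideChange",  # hypothetical type name, not from the proposal
        "agent": {"@type": "person", "fullName": presenter},
        "data": [{
            "@type": "link",
            "mimeType": PPTX,
            "href": f"{href}#{slide_number}",  # link to the individual slide
        }],
    }
    timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
    return timing + "\n" + json.dumps(payload, indent=2)


print(slide_change_cue(35.0, 70.0, "David Jackson", "files/slideshow.pptx", 3))
```

Presentation software could emit one such cue each time the presenter moves to a different slide, closing the previous cue's interval at the same moment.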

Other Metadata

Metadata about meetings (e.g., their secretaries, scribes, or transcribing agents, their lists of attendees, their venues, their enabling software tools, hints for these software tools such as terminological domains, pronunciation lexicons, or planned discussion topics) could also be placed into meetings' transcripts and minutes.

00:00.000 --> 00:00.000
{
  "@type": "metadata",
  "agent": {
    "@type": "person",
    "fullName": "Charles Brown",
    "position": ["Secretary", "Attendee"]
  },
  "property": "transcribingAgent",
  "value": [{
    "@type": "person",
    "fullName": "Charles Brown"
  }]
}
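
One possible way for a tool to use such metadata cues is to gather them into a lookup table keyed by property name. The Python sketch below assumes the payloads have already been parsed into dictionaries; `collect_metadata` and the sample payloads are hypothetical.

```python
def collect_metadata(payloads: list[dict]) -> dict[str, list]:
    """Group the values of all metadata payloads by their property name."""
    index: dict[str, list] = {}
    for payload in payloads:
        if payload.get("@type") != "metadata":
            continue  # skip speech, attachment, and hyperlink payloads
        index.setdefault(payload["property"], []).extend(payload.get("value", []))
    return index


# Hypothetical parsed payloads in the style of the sketches.
payloads = [
    {
        "@type": "metadata",
        "property": "transcribingAgent",
        "value": [{"@type": "person", "fullName": "Charles Brown"}],
    },
    {
        "@type": "speech",
        "agent": {"@type": "person", "fullName": "Alice Smith"},
        "data": [{"@type": "data", "mimeType": "text/plain", "content": "Hello."}],
    },
]

metadata = collect_metadata(payloads)
print(metadata["transcribingAgent"][0]["fullName"])
```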

Use Case: Artificial Intelligence

A new and important use case for meetings' transcripts and minutes is their consumption by artificial-intelligence systems, e.g., multimodal large language models (MLLMs). AI systems will be able to answer questions about meetings and engage in dialogues about them (see, for example, [1]).

[1] Golany, Lotem, Filippo Galgani, Maya Mamo, Nimrod Parasol, Omer Vandsburger, Nadav Bar, and Ido Dagan. "Efficient data generation for source-grounded information-seeking dialogs: A use case for meeting transcripts." arXiv preprint, 2024.
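
As one hedged illustration of this use case, the Python sketch below flattens speech cues into speaker-attributed lines that could be placed in a language model's prompt. It assumes cues have already been parsed into dictionaries with `start`, `end`, and `payload` keys; that structure and the name `render_for_prompt` are hypothetical, and no particular AI system or API is implied.

```python
def render_for_prompt(cues: list[dict]) -> str:
    """Render speech cues as one '[start-end] speaker: text' line each."""
    lines = []
    for cue in cues:
        payload = cue["payload"]
        if payload.get("@type") != "speech":
            continue  # attachments, hyperlinks, and metadata are skipped here
        speaker = payload.get("agent", {}).get("fullName", "Unknown speaker")
        text = " ".join(
            item["content"] for item in payload.get("data", []) if "content" in item
        )
        lines.append(f"[{cue['start']:.0f}s-{cue['end']:.0f}s] {speaker}: {text}")
    return "\n".join(lines)


# Hypothetical parsed cues in the style of the sketches.
cues = [
    {
        "start": 0.0,
        "end": 5.0,
        "payload": {
            "@type": "speech",
            "agent": {"@type": "person", "fullName": "Alice Smith"},
            "data": [{
                "@type": "data",
                "mimeType": "text/plain",
                "content": "Without objection, the presenter's slides are entered into the minutes.",
            }],
        },
    },
    {
        "start": 5.0,
        "end": 5.0,
        "payload": {"@type": "attachment", "data": []},
    },
]

print(render_for_prompt(cues))
```

A fuller version might also describe attachments and slide changes in the rendered text so that a multimodal system could relate speech to the material being shown.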

Conclusion

Thank you for considering this proposal for a work item for the WebRTC Working Group charter.

dontcallmedom commented 4 months ago

Hi @AdamSobieski - thanks for writing this up!

To set expectations, I don't think the WebRTC Working Group is likely the right target for this standardization, and before any W3C Working Group would consider taking this up, I expect we would want to see a proposal going through enough incubation to justify the effort.

Now, separately, I am personally aware that there have been discussions around a similar effort under the "vCon" moniker, see https://github.com/vcon-dev/vcon - you may want to check it out and see if the ideas there align with your own goals, and possibly evaluate if and when such a proposal would be ready for standardization (and if so, if W3C would be an appropriate place for it).

I'm going to close this issue since I don't believe there is a short-term path to have this included in the current WebRTC Working Group charter.