Meeting Summarization Use Case

manisnesan commented 7 months ago

          [From rasbt post](https://x.com/rasbt/status/1754516687896887449?s=46&t=aOEVGBVv9ICQLUYL4fQHlQ) - Flan T5 is a great go to model for text classification.

Tiny titans - Can smaller LLM models punch above their weight for meeting summarization

Originally posted by @manisnesan in https://github.com/manisnesan/fastchai/issues/47#issuecomment-1928762586

Questions

What are the datasets available
What are the key constituents involved in an effective meeting summary
What are the challenges involved in creating effective meeting summarization
What are the most recent advancements in meeting summarization tech
How is this different from other diverse summarization involved in news, science, technology, medical
How meeting summarization is related customer service call summarization?
- meeting (multi party interactions - more than two speakers) where as customer service call or medical appoints l is a biparty interaction ie only two speakers are involved.

manisnesan commented 7 months ago

Meeting Summarization

Meeting summarization is the process of creating a concise overview of the key points, decisions, and action items discussed during a meeting[1]. It serves to keep stakeholders informed, facilitate decision-making, encourage accountability, and enhance communication[1].

There are several proven ways to summarize a meeting effectively:

Take concise notes during the meeting, focusing on the most important information[1].
Use a clear and organized format in the summary, such as including the date, time, location, attendees, agenda items, discussion points, decisions, action items, and next steps[1].
Follow and fill out the meeting agenda when creating the summary notes[1].
Summarize the meeting over email to all participants after the fact[1].
Use AI tools to automatically generate meeting summaries from transcripts[1][2].

Challenges in meeting summarization include the difficulty of collecting confidential meeting data, the labor-intensive process of annotating summaries, and the need to capture key issues while excluding irrelevant discussions[4][5]. Recent research has focused on creating benchmark datasets[3][4][5] and developing advanced summarization models[2][3].

In summary, meeting summarization is a crucial skill for keeping teams aligned and productive, with various manual and automated techniques available to create high-quality summaries efficiently.

Citations: [1] https://fireflies.ai/blog/summarize-a-meeting [2] https://github.com/topics/meeting-summarization [3] https://paperswithcode.com/task/meeting-summarization [4] https://arxiv.org/abs/2305.17529 [5] https://aclanthology.org/2023.acl-long.906.pdf

manisnesan commented 7 months ago

Diverse Summarization Dataset

From Pegasus - Paper

news_email_bills_science_tech

manisnesan commented 7 months ago

From Abstractive Meeting Summarization

A system that could reliably identify and sum up the most important points of a conversation would be valuable in a wide variety of real-world contexts, from business meetings to medical consultations to customer service calls.

Customer Service Calls could be multi-party conversation but only two party speak in a given time span. Also the format of the meeting in customer service is problem solving in nature.

Eg: Customer Rep - Agent 1 ---> Customer Rep - Agent 2 ----> Customer Rep -- Agent 3

Related: Abstractive Dialogue summarization, Abstractive Text Summarization, Meeting Summariziation, text Generation

Stages in abstractive

Selection of important points that are worthy enough. This is same as extractive summarization.
Synthesis
language generation
Figure 1 shows excerpts of the human-made extractive (left column) and abstractive (right col- umn) summaries of meeting ES2011c.
The col- ored lines relate each abstractive sentence to the set of extractive sentences—the abstractive com- munity—that annotators judged as supporting it.

manisnesan commented 7 months ago

Differences from traditional summarization

linguistic interactions involved in the meetings
multiparty conversations

manisnesan commented 7 months ago

From Call Summarization: why it is important and what it is possible today and in a near future

It aims to automatically generate concise, fluent summaries capturing the key points of a conversation, which can help improve customer experience and reduce agent workload

"AUTOMATIC SUMMARIZATION OF CALL-CENTER CONVERSATION" by E. Stepanov, B. Favre, F. Alam, S. Chowdhury, K. Singla, J. Trione, F. Be ́chet, G. Riccardi. offers a hybrid approach using both extractive/abstractive.

See

"State of the Art Summarisation Techniques", Information Systems Seminar 19/20 by Anna Franziska Bothe, Alex Truesdale, LukasKolbe
https://humboldtwi.github.io/blog/research/information_systems_1920/nlp_text_summarization_techniques/

manisnesan commented 7 months ago

From Generating Abstractive Summaries from Meeting Transcripts

manisnesan commented 7 months ago

Challenges involved

Nature of meeting-style speech :

leads to low information density & high noise
significantly longer eg: AMI transcript tokens 4, 757 & its summary 322
constrasting to two speaker conversations - multiparty conversations has challenges to speaker & addressee identification

Preference for abstractive summarization

LEAD-3 baseline - extractive methods - first 3 sentences of a doc
selection of important material

Heterogeneous meeting formats

sharing info or brainstorming ideas
depending on meeting formats require a variety of automatic systems

Subjectivity

reformulating the same content in different words & style
what is counted as summary-worth

manisnesan commented 6 months ago

See the example case study from Orca paper on Meeting Transcript processing

Example from the paper

System

You are a teacher. Given a task, you explain in simple steps what the task is asking, any guidelines it provides, and how to use those guidelines it provides to find the answer.

User

You will read a meeting transcript, then extract the relevant segments to answer the following question

Question: How does Steven feel about selling?

$Meeting_Transcript

Please answer the following question Question: How does Steven feel about selling?

Extract from transcript the most relevant segments for the answer, then answer the question.

manisnesan commented 6 months ago

https://www.reddit.com/r/LocalLLaMA/s/xeSFTXwa5q

manisnesan commented 6 months ago

https://community.openai.com/t/how-to-summarize-large-research-articles/142730

manisnesan commented 6 months ago

Five levels of summarizing Youtube

langchain map reduce is an interesting idea showcased
Topic modelling using language models is also another interesting approach here

Usecase

YouTube Videos - Auto Chapter Generation Podcasts - Extract structured information Meeting Notes - Send topic summaries to participants Town Hall Meetings - Structured information Earnings Report Calls - Sell structured data to investment groups Legal Documents - Quickly summarize by topic Movie Scripts - Quick bullet points for production recaps Books - Auto generate table of contents

manisnesan commented 4 days ago

PYDATA - NYC 2024 The Art of Compression: Crafting Insightful Summaries with LLMs

As Large Language Models continue to advance, their application in text summarization presents both powerful opportunities and specific challenges. This talk will focus on practical strategies to overcome the limitations posed by context windows—a critical factor when dealing with extensive texts. The talk will also demonstrate how fine-tuning can improve summarization tasks for domain specific private datasets and when to use what. Attendees will learn how to build an end-to-end summarization workflow, with a focus on effective data chunking, prompt optimization, and advanced evaluation methods to ensure accurate and meaningful summaries. The session will cover three key summarization techniques—stuff, refine, and map-reduce—explaining when and how to use each approach. In addition, we’ll explore the latest in evaluation metrics, demonstrating how to leverage more sophisticated models as judges to refine and assess the quality of summaries.

Outline:

Introduction to the summarization problem
Understanding LLM context windows
Chunking techniques and their applications
Overview of summarization techniques (stuff, refine, map-reduce)
Fine-tuning on a custom dataset
Introduction to Langchain and its role in summarization
Evaluation metrics
Using LLMs as judges
LLM production deployment

Background Knowledge Required:

Basic understanding of LLMs
Familiarity with Python

Presentation - https://github.com/aartij22/Pydata-NYC-2024

manisnesan / fastchai

Meeting Summarization Use Case #76

Questions

Meeting Summarization

Diverse Summarization Dataset

Challenges involved

System

User