zchen0420 / nn_papers

To record my paper reading in my native language, mimicking ooooohira-san.

Document-level text summarization #9

Open zchen0420 opened 1 month ago

zchen0420 commented 1 month ago

SARI & D-SARI
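
SARI and D-SARI are only named here, so as a quick reminder of what SARI rewards, below is a simplified, unigram-only Python sketch (the actual metric averages over n-gram orders 1-4, and D-SARI further adds document-level adjustments that are not reproduced here):

```python
# Simplified, unigram-only sketch of SARI: average of add-F1, keep-F1,
# and delete-precision, each judged against the reference.

def f1(p, r):
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)

def sari_unigram(source, output, reference):
    src, out, ref = set(source), set(output), set(reference)

    added, ref_added = out - src, ref - src          # words the system/reference add
    kept, ref_kept = out & src, ref & src            # words kept from the source
    deleted, ref_deleted = src - out, src - ref      # words dropped from the source

    p_add = len(added & ref_added) / len(added) if added else 0.0
    r_add = len(added & ref_added) / len(ref_added) if ref_added else 0.0
    p_keep = len(kept & ref_kept) / len(kept) if kept else 0.0
    r_keep = len(kept & ref_kept) / len(ref_kept) if ref_kept else 0.0
    p_del = len(deleted & ref_deleted) / len(deleted) if deleted else 0.0

    return (f1(p_add, r_add) + f1(p_keep, r_keep) + p_del) / 3

src = "the quick brown fox jumped over the lazy dog".split()
ref = "the fox jumped over the dog quickly".split()
print(round(sari_unigram(src, ref, ref), 3))   # 1.0: output equals the reference
print(round(sari_unigram(src, "the quick fox jumped over the dog".split(), ref), 3))
# lower: missed the added word and kept a word the reference deleted
```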

JADOS

zchen0420 commented 1 month ago

In-context Learning of Large Language Models for Controlled Dialogue Summarization: A Holistic Benchmark and Empirical Analysis

2023 New Frontiers in Summarization Workshop, ACL | Yuting Tang et al. | NTU CNRS@CREATE IIR

Abstractive dialogue summarization (vs. document summarization). Controlled: imposing additional constraints on the outputs. Assessed: entity control, length control, and person-focused planning, as well as uncontrolled settings. SAMSum: a human-annotated dataset for abstractive multi-turn dialogue summarization. Makes heavy use of GPT-3 as an evaluator, and also notes that randomly selecting demonstrations affects the spread (variance) of the scores.
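
The paper's exact prompt templates are not reproduced in this note; the following is a minimal sketch of what a one-shot, length-controlled prompt for SAMSum-style dialogue summarization could look like (the wording, the demo pair, and the target-length phrasing are all assumptions):

```python
# Minimal sketch of a length-controlled in-context prompt for dialogue
# summarization. The template and the single demo are illustrative only.

DEMO_DIALOGUE = (
    "Amanda: Are you coming to the party?\n"
    "Ben: Yes, but I'll be late.\n"
    "Amanda: No problem, see you there!"
)
DEMO_SUMMARY = "Ben will come to Amanda's party, but late."  # ~8 words

def build_prompt(dialogue: str, target_words: int) -> str:
    """Compose a one-shot prompt that asks for a summary of a given length."""
    return (
        f"Summarize the dialogue in about {target_words} words.\n\n"
        f"Dialogue:\n{DEMO_DIALOGUE}\nSummary: {DEMO_SUMMARY}\n\n"
        f"Dialogue:\n{dialogue}\nSummary:"
    )

test_dialogue = (
    "Chris: Did you send the report?\n"
    "Dana: Not yet, I'll finish it tonight."
)
print(build_prompt(test_dialogue, target_words=10))
```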

Evaluating Large Language Models on Controlled Generation Tasks

| Jiao Sun et al. | USC, UC

Evaluates the then-current LLMs aspect by aspect. Numerical planning: control over the word count is not precise enough, and completions tend to run short. Content controlling: the gap is significantly reduced when ICL is used; GPT's zero-shot is far ahead. Story generation: undesired repetitions, unnatural topic drifts; GPT is again far ahead, although the scores do not reflect the overall impression. Rationale: acc(I+R→O) - acc(I→O), probing leakage vs. background knowledge.
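
The Rationale score above is just a difference of two accuracies; a toy computation, with made-up predictions, looks like this:

```python
# Toy computation of the rationale gain acc(I+R -> O) - acc(I -> O).
# The gold labels and both prediction lists are invented for illustration.

def accuracy(predictions, gold):
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)

gold                = ["A", "B", "C", "A", "B"]
pred_input_only     = ["A", "C", "C", "B", "B"]   # model sees the input I only
pred_with_rationale = ["A", "B", "C", "A", "B"]   # model sees I plus the rationale R

delta = accuracy(pred_with_rationale, gold) - accuracy(pred_input_only, gold)
print(f"rationale gain: {delta:.2f}")  # a positive gain suggests R is actually used;
                                       # zero or negative hints at leakage or that
                                       # background knowledge already suffices
```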

zchen0420 commented 1 month ago

Controlling Output Length in Neural Encoder-Decoders

2016 EMNLP | Yuta Kikuchi, Graham Neubig, et al. | TIT, CMU

Seq2seq (LSTM). Decoding strategies: 1) a beam that forbids EOS and stops exactly at the target length; 2) pruning beams whose length does not satisfy the constraint. Learning strategies on the decoder side: 1) hint the remaining length to the decoder at every step; 2) tell the decoder the target length in its initial state. The model does have the ability to control length, as observed from the outside, but the internal RNN state mechanism remains unknown.
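
A minimal sketch of the two decoding-side controls, using greedy search instead of beam search and a toy scoring function standing in for the LSTM decoder (not the paper's code):

```python
# Ban EOS until a minimum length is reached, then stop at the maximum length.
# next_token_scores() is a deterministic toy stand-in for a decoder step.

import random

VOCAB = ["the", "cat", "sat", "down", "<eos>"]
EOS = "<eos>"

def next_token_scores(prefix):
    """Stand-in for an LSTM/Transformer decoder step: seeded random scores."""
    random.seed(len(prefix))
    return {tok: random.random() for tok in VOCAB}

def decode_with_length_control(min_len, max_len):
    prefix = []
    while True:
        scores = next_token_scores(prefix)
        if len(prefix) < min_len:
            scores[EOS] = float("-inf")        # 1) forbid EOS too early
        if len(prefix) >= max_len:
            return prefix                      # 2) hard stop at the length limit
        token = max(scores, key=scores.get)    # greedy pick (beam search in the paper)
        if token == EOS:
            return prefix
        prefix.append(token)

print(decode_with_length_control(min_len=3, max_len=6))
```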

Length Control in Abstractive Summarization by Pretraining Information Selection

2022 ACL | SJTU

Seq2seq (Transformer). Takes into account not only the remaining length but also uses it to dynamically adjust the cross attention (down-weighting source tokens that have already been attended to). Also resamples the original dataset to balance the length distribution, forming an LBD (length-balanced dataset) used to balance/tune the model's length-control ability.
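
A rough illustration of the attention-adjustment idea: accumulate how much each source token has been attended to and penalize already-covered tokens, coverage-style. The penalty weight and the random logits below are assumptions, not the paper's exact formulation:

```python
# Coverage-style down-weighting of cross attention over a few decoding steps.

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

src_len = 5
coverage = np.zeros(src_len)                 # accumulated attention per source token
rng = np.random.default_rng(0)

for step in range(3):                        # a few decoding steps
    raw_scores = rng.normal(size=src_len)    # stand-in attention logits
    adjusted = raw_scores - 1.0 * coverage   # penalise already-covered tokens
    attn = softmax(adjusted)
    coverage += attn                         # update the coverage vector
    print(f"step {step}: attn={np.round(attn, 2)}")
```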

zchen0420 commented 1 month ago

Entity-Based Evaluation of Political Bias in Automatic Summarization

2023 EMNLP

After replacing the main entity of the reported content (entity replacement), the summaries become noticeably stranger; the entity and the text are highly coupled. Summarization models are not neutral with respect to political entities. Entity-centric news articles, those that heavily feature the original entity, lead to more dissimilar summaries upon replacement. PreSumm: BERT-based, extractive. PEGASUS, BART, ProphetNet: abstractive.
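
The probe itself is easy to sketch: replace the central entity, summarize both versions, and measure how much the summaries diverge. The toy summarizer and the Jaccard similarity below are placeholders for the real models (PreSumm, PEGASUS, BART, ProphetNet) and metrics:

```python
# Entity-replacement probe: compare summaries of the original article and
# of a copy in which the central entity has been swapped.

def summarize(article: str) -> str:
    """Toy 'summarizer': first sentence only (placeholder for a real model)."""
    return article.split(". ")[0] + "."

def jaccard(a: str, b: str) -> float:
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

article = ("Obama announced a new climate plan on Tuesday. "
           "Critics said the plan was too ambitious.")
swapped = article.replace("Obama", "Trump")          # entity replacement

s_orig, s_swap = summarize(article), summarize(swapped)
s_swap_norm = s_swap.replace("Trump", "Obama")       # normalise the name back
print(round(jaccard(s_orig, s_swap_norm), 2))        # low similarity would signal
                                                     # entity-dependent summaries
```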

zchen0420 commented 1 month ago

The 4th New Frontiers in Summarization Workshop

Is ChatGPT a Good NLG Evaluator? A Preliminary Study

Zero-Shot Cross-Lingual Summarization via Large Language Models

SimCSum: Joint Learning of Simplification and Cross-lingual Summarization for Cross-lingual Science Journalism

Extract, Select and Rewrite: A Modular Sentence Summarization Method

Extract relation tuples (knowledge, structure, a graph), select among them, and rewrite them into natural language. More faithful than BART.
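
A toy end-to-end sketch of such a modular extract-select-rewrite pipeline, with rule-based stand-ins for the learned components (the extractor, selector, and realiser here are illustrative, not the paper's):

```python
# Modular pipeline: extract relation tuples, select a subset, rewrite to text.

from typing import List, Tuple

Triple = Tuple[str, str, str]

def extract(sentence: str) -> List[Triple]:
    """Toy extractor: naive (subject, verb, object) split on a few known verbs."""
    for verb in ("acquired", "founded", "announced"):
        if f" {verb} " in sentence:
            subj, obj = sentence.split(f" {verb} ", 1)
            return [(subj.strip(), verb, obj.strip(" ."))]
    return []

def select(triples: List[Triple], k: int = 1) -> List[Triple]:
    """Toy selector: keep the k shortest triples (a real model learns this)."""
    return sorted(triples, key=lambda t: len(" ".join(t)))[:k]

def rewrite(triples: List[Triple]) -> str:
    """Toy surface realiser: one template per triple (a real model generates text)."""
    return " ".join(f"{s} {v} {o}." for s, v, o in triples)

doc = ["TechCorp acquired SmallStart for an undisclosed sum.",
       "The CEO announced the deal at a press conference."]
triples = [t for sent in doc for t in extract(sent)]
print(rewrite(select(triples, k=1)))
```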
