zchen0420 commented 4 months ago

ChatGPT is fun, but it is not funny! Humor is still challenging Large Language Models

2023 | Workshop on Computational Approaches to Subjectivity, Sentiment

An “oxymoron”: despite being fun to interact with, ChatGPT cannot be humorous or comedic the way humans are.

The complete ChatGPT joke collection, old and new (25 jokes)

Jokes are not hard-coded but mostly also not newly generated by the model. (They are all pre-existing jokes.)
Over 90% of 1008 generated jokes were the same 25 jokes.
- The jokes are limited in number, and the types do not occur evenly.
The system accurately explains valid jokes but also comes up with fictional explanations for invalid jokes.
Joke-typical characteristics can mislead ChatGPT in the classification of jokes. (It only understands the humor it produces itself; its explanations of new jokes are far-fetched.)
ChatGPT has not solved computational humor yet but it can be a big leap toward “funny” machines.
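
The "over 90% of 1008 jokes were the same 25" observation is a top-k coverage statistic. A minimal sketch of how such a number could be computed, assuming the generated jokes have already been normalized into comparable strings; the function name and the toy data are hypothetical and are not the paper's:

```python
from collections import Counter


def top_k_coverage(samples, k):
    """Fraction of all samples accounted for by the k most frequent distinct items."""
    if not samples:
        return 0.0
    counts = Counter(samples)
    covered = sum(n for _, n in counts.most_common(k))
    return covered / len(samples)


# Hypothetical toy data: 10 generations that collapse onto 4 distinct jokes.
generated = ["scarecrow"] * 5 + ["atoms"] * 3 + ["math book", "bicycle"]
print(top_k_coverage(generated, k=2))  # 0.8
```

With real outputs, the hard part is the normalization step (deciding when two differently worded jokes count as the same joke), which this sketch leaves out.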

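Generation, explanation, and then the viewpoints from which jokes are studied (deconstruction)

Structure
Topic: personification
Wording: wordplay, pun. These are used to examine ChatGPT's concept of humor and how it delivers a punch line.

(Not finished reading; I only got as far as the manual clustering of jokes.)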

Let’s Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation

2024 | Sea AI Lab, Sun Yat-sen University, Harvard U

Oogiri game （大喜利）

Creative Leap-of-Thought (CLoT) paradigm not only excels in humor generation in the Oogiri game but also boosts creative abilities in various tasks.
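CoT (w/o tuning) vs. LoT (w/ tuning). Whether a person's thinking makes leaps also depends on mood and state; the method still uses templated prompts to stimulate model outputs and then ranks them.
Explorative self-refinement with weakly-associated concepts helps generate diverse high-quality data for further refinement. Qwen.
Eliciting a model's creativity requires data, and a creative model can generate that data. (A circular question.)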

zchen0420 commented 4 months ago

The Troubling Emergence of Hallucination in Large Language Models – An Extensive Definition, Quantification, and Prescriptive Remediations

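EMNLP 2023 | U of South Carolina, universities in Bengal (India), Stanford, and Amazon

Hallucination is a by-product of emergence. The paper introduces HallucInation eLiciTation (HILT), a human-annotated dataset with a fine-grained categorization of hallucination: a public collection of 75,000 text snippets generated by 15 contemporary LLMs, with human annotations. It also introduces the Hallucination Vulnerability Index (HVI) to quantify and compare how prone LLMs are to hallucinate.

{Factual mirage, Silver lining} x {intrinsic, extrinsic}, each containing different categories.
Evaluated on 15 contemporary LLMs.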

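Main orientations:

Factual Mirage (FM): erroneous or distorted information the language model produces when handling a factually correct prompt.
Silver Lining (SL): hallucination the model produces when handling a prompt that may be misleading but contains potentially true information.

Distinction by source:

Intrinsic hallucination: a problem in the training data that has been absorbed into the model.
Extrinsic hallucination: a problem that arises during interaction, sensitive to the prompt and similar factors.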

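Six categories:

Acronym Ambiguity:
- Description: occurs when the language model misinterprets or generates text based on an acronym.
- Example: the model might wrongly expand "USA" as "United Sports Association" instead of "United States of America".
Numeric Nuisance:
- Description: hallucinations involving numbers, especially when the model generates dates, times, or quantities that contradict the facts.
- Example: if the model claims "World War II began in 1932", this is clearly wrong, because World War II actually began in 1939.
Generated Golem:
- Description: the model produces text output based entirely on fictional information.
- Example: the model might make up a story about a non-existent tech product, such as "In 2025, Apple released a phone capable of telepathy".
Virtual Voice:
- Description: occurs when the model imitates a non-existent person or misattributes a voice.
- Example: the model might quote Martin Luther King as saying something that was in fact fabricated.
Geographic Erratum:
- Description: errors the model makes when handling geographic information.
- Example: if the model claims "Paris is the capital of Belgium", this is clearly wrong, because Paris is the capital of France.
Time Wrap:
- Description: errors the model makes when describing historical events or timelines.
- Example: the model might claim "the mobile phone was invented in the early 20th century", whereas modern mobile phones were actually developed in the late 20th century.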

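NYTimes and Politifact are used as prompts; outputs of every category are evenly covered by human judgment and manual mitigation (word replacement, rewriting) to avoid hallucination. There is, however, no universal fix.

For smaller LLMs such as T5 and Dolly, hallucinations in the Generated Golem, Virtual Voice, and Geographic Erratum categories are rarely observed.
Larger LLMs without RLHF tend to produce both types of hallucination. (by-product)
By definition, Numeric Nuisance and Acronym Ambiguity are mild hallucination categories and show a decreasing Silver Lining (SL) tendency as LLM size grows; conversely, complex categories such as Time Wrap and Geographic Erratum become more prevalent. Notably, Virtual Voice increases significantly from GPT-3.5 to GPT-4.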

When Do Pre-Training Biases Propagate to Downstream Tasks? A Case Study in Text Summarization

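EACL 2023 | Authors include Dan Jurafsky, Kathleen McKeown, and Tatsunori Hashimoto | Columbia U, Stanford U

Biases a model learns in pre-training cause hallucination in downstream tasks. Adaptation and fine-tuning on new data can mitigate this, but they do not change the kinds of bias the model leans toward. The more abstractive the model (as opposed to extractive), the more readily it propagates its bias.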

intrinsic evaluations (task-agnostic, internal to the language model, e.g. PPL) vs. extrinsic evaluations (task-specific, e.g. BLEU for MT)
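
As a concrete instance of an intrinsic metric, perplexity is the exponential of the negative mean token log-probability. A minimal sketch, assuming natural-log probabilities are already available from some model; the function name and the toy numbers are illustrative only:

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp(-(1/N) * sum of natural-log token probabilities)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


# Toy check: a model that assigns probability 0.25 to every token
# is exactly as uncertain as a uniform choice among 4 options.
uniform = [math.log(0.25)] * 8
print(perplexity(uniform))  # approximately 4.0
```

An extrinsic metric such as BLEU, by contrast, needs task references and cannot be computed from the model's probabilities alone.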

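Contribution: an out-of-distribution evaluation dataset. (It feels like st is also about building this kind of extrinsic-eval dataset.)

Adapter fine-tuning, which tunes a smaller number of parameters, generates fewer hallucinations than fine-tuning the entire model.

Here, hallucination refers to the conflict, during summarization, between relying on the input condition and relying on the parameters.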

LLM Lies: Hallucinations are not Bugs, but Features as Adversarial Examples

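ICLR 2024 | This Peking University paper finds that hallucination is an inherent phenomenon of LLMs, and that it can be induced from OoD regions of the input space that humans do not normally use. (On the other hand, I suspect it is caused by forgetting and normalization during training. It would be nice if general capability and external knowledge could be separated, but the boundary may not be easy to draw, and even if it could be drawn, doing so might hurt capability. The early training stage of an LLM is entirely passive, with no induction or organization of its own.)

Method (reverse-engineering via gradients to find magic prompts, i.e. adversarial examples of varying degree):

Fix some hallucinated sentences/outputs in advance, then use gradients and token replacement in the input to find two kinds of input that yield each output: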
1. Weak semantic attack
2. Out-of-Distribution (OoD) attack
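
To make "gradients plus token replacement" more concrete, here is a generic first-order substitution step, not the paper's exact procedure. It assumes we already have the gradient of the loss (negative log-likelihood of the fixed target output) with respect to each input token embedding, and estimates the effect of a swap by a linear approximation. The embeddings, gradients, and function names are all hypothetical toy values:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def best_first_order_swap(tokens, embeddings, grads):
    """Pick the single token swap with the largest estimated loss decrease.

    grads[i] is dLoss/d(embedding at position i). To first order, replacing
    tokens[i] with candidate v changes the loss by
    (E[v] - E[tokens[i]]) . grads[i].
    Returns (estimated_change, position, new_token).
    """
    best = None
    for i, old in enumerate(tokens):
        for cand, vec in embeddings.items():
            if cand == old:
                continue
            delta = [c - o for c, o in zip(vec, embeddings[old])]
            change = dot(delta, grads[i])
            if best is None or change < best[0]:
                best = (change, i, cand)
    return best


# Hypothetical 2-dimensional embeddings and gradients, for illustration only.
embeddings = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [-1.0, 0.0]}
tokens = ["a", "b"]
grads = [[1.0, 0.0], [0.0, -0.5]]
print(best_first_order_swap(tokens, embeddings, grads))  # (-2.0, 0, 'c')
```

In a real attack this step would be repeated: apply the swap, recompute the gradients with the model, and stop once the model emits the target output. Constraining which positions may change would roughly correspond to the weak semantic setting, while starting from random tokens would correspond to the OoD setting; that mapping is my reading, not something confirmed here.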

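Observations

Judging from the final results, hallucination was triggered through OoD inputs.
Much like Auto/RLPrompt, this shows that an LLM's vector space differs from a human's. At the very least, the adversarial examples show that it is not smooth.

[Need to study the gradient-based method further]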

Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting

zchen0420 commented 4 months ago

A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation

We’re Afraid Language Models Aren’t Modeling Ambiguity

Detecting and Mitigating Hallucinations in Machine Translation: Model Internal Workings Alone Do Well, Sentence Similarity Even Better

Unveiling the Implicit Toxicity in Large Language Models

The Curious Case of Hallucinatory (Un)answerability: Finding Truths in the Hidden States of Over-Confident Large Language Models

FACTSCORE: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation

Reformulating Domain Adaptation of Large Language Model as Adapt-Retrieve-Revise

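2024 | ACL Findings | Alibaba DAMO & Kyoto U (Wan)

A mash-up of several systems: brute force works wonders, with GPT-4 doing the revision. Beyond that, the paper is mostly an introduction to in-domain knowledge.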

zchen0420 commented 4 months ago

Leveraging Code to Improve In-context Learning for Semantic Parsing

zchen0420 commented 4 months ago

Understanding the Effect of Model Compression on Social Bias in Large Language Models

EMNLP 2023 | CMU, NOVA LINCS, Allen Institute for AI | Gustavo Gonçalves and Emma Strubell

Social biases in LLMs are an ongoing problem that is propagated from pretraining to finetuning. Existing work has shown that pruning disproportionately impacts classification accuracy on low-frequency categories in computer vision models (Hooker et al., 2021), but that pruning transformer models can have a beneficial effect with respect to bias when modeling multilingual text (Hooker et al., 2020; Ogueji et al., 2022). Further, Xu and Hu (2022) have shown that compressing pretrained models improves model fairness by working as a regularizer against toxicity.

Testbed: Bias Bench (Meade et al., 2022), a compilation of three social bias datasets.

CrowS-Pairs (Nangia et al., 2020), StereoSet (SS) (Nadeem et al., 2021), and SEAT (Kaneko and Bollegala, 2021).
A minimally distant sentence is defined as a small number of token swaps in a sentence, that carry different social bias interpretations; an optimal score is a ratio of 50%.

MLM: RoBERTa pretraining was done over 161 GB of text, which contained the 16GB used to train BERT, approximately a ten-fold increase. RoBERTa also trained for longer, with larger batch sizes which have shown to decrease the perplexity of the LLM (Liu et al., 2019). Autoregressive: The set of checkpoints released for the Pythia model family allows us to assess an even wider variety of model sizes, so that we can observe how bias varies throughout pretraining.

Knowledge distillation (Hinton et al., 2015). During training the student model minimizes the loss according to the predictions of the teacher model (soft-targets) and the true labels (hard-targets) to better generalize to unseen data.
Post-Training Quantization (PTQ). Full-precision floating point model weights are statically mapped to lower precisions after training, with activations dynamically mapped from high to low precision during inference.
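
The soft-target/hard-target description of knowledge distillation can be written out as a single loss. A minimal sketch for one example, assuming the common formulation of a temperature-softened KL term plus a cross-entropy term; the weighting `alpha`, the `temperature ** 2` scaling, and the default values are assumptions for illustration and are not taken from this paper:

```python
import math


def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """alpha * CE(student, true label) + (1 - alpha) * T^2 * KL(teacher || student)."""
    # Hard target: cross-entropy against the true label at temperature 1.
    hard = -math.log(softmax(student_logits)[label])
    # Soft target: KL divergence between temperature-softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    return alpha * hard + (1 - alpha) * temperature ** 2 * soft


# A student that matches the teacher pays only the hard-label term.
print(distillation_loss([2.0, 0.0], [2.0, 0.0], label=0))
# A student that disagrees with both the teacher and the label pays more.
print(distillation_loss([0.0, 2.0], [2.0, 0.0], label=0))
```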

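Distillation and quantization (regularizer effect) both have some positive effect on social bias, while LM capability also drops. The larger the model and the longer it has been trained, the harder it is to reverse its social bias this way.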

The Kendall Tau C & one-sided t-test: a strong correlation between LM score and social bias.
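
For reference, Kendall's Tau-c (Stuart's variant) counts concordant and discordant pairs and normalizes by the table size rather than by the number of pairs. A minimal O(n²) sketch; the example scores are made up, and the paper's one-sided t-test is not reproduced here:

```python
def kendall_tau_c(x, y):
    """Stuart's tau-c: 2 * (C - D) / (n^2 * (m - 1) / m),
    where m is the smaller number of distinct values in x or y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
            # Pairs tied in x or in y count toward neither total.
    m = min(len(set(x)), len(set(y)))
    if m < 2:
        raise ValueError("need at least two distinct values in each variable")
    return 2 * (concordant - discordant) / (n ** 2 * (m - 1) / m)


# Hypothetical LM scores and bias scores for four checkpoints.
lm_score = [1, 2, 3, 4]
bias_score = [1, 2, 3, 4]
print(kendall_tau_c(lm_score, bias_score))  # 1.0
```

In practice `scipy.stats.kendalltau(x, y, variant="c")` computes the same statistic and also returns a p-value.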

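Thoughts

Bias is inherently embedded in the training data, so it is only natural that these indirect methods raise or lower both (LM capability and bias) together. Besides distillation and quantization, there are also pruning, mixture of experts, and adaptive computation.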

zchen0420 commented 4 months ago

Primacy Effect of ChatGPT

Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs

zchen0420 commented 4 months ago

Syntax

Learning Syntax without Planting Trees: Understanding When and Why Transformers Generalize Hierarchically

zchen0420 commented 4 months ago

Self-Rewarding Language Models

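The bottleneck holding RLHF back is the small offline reward model. Once an LLM can follow instructions, it should be left to compete against itself: LLMs can already output human-level, high-quality data and can also identify the poor outputs in that data, so they should be set free to improve themselves.

Instruction fine-tuning (IFT)
Evaluation fine-tuning (EFT): uses CoT to obtain stronger judging ability, for filtering

Process: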

$M_0$ + IFT + EFT $\rightarrow$ $M_1$
$M_1$ + AIFT($M_1$) + DPO $\rightarrow$ $M_2$
$M_2$ + AIFT($M_2$) + DPO $\rightarrow$ $M_3$
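
The AIFT($M_t$) step in the process above can be sketched as turning self-scored candidates into preference pairs for DPO. This sketch assumes that the highest- and lowest-scored responses form the pair and that prompts whose candidates all tie are dropped; that is my understanding of the setup, so treat it as an assumption. The data layout and names are hypothetical:

```python
def build_preference_pairs(scored):
    """scored maps prompt -> list of (response, self_assigned_score).

    Returns DPO-style records using the best response as "chosen" and the
    worst as "rejected". Prompts with no score spread are skipped.
    """
    pairs = []
    for prompt, candidates in scored.items():
        if len(candidates) < 2:
            continue
        best = max(candidates, key=lambda c: c[1])
        worst = min(candidates, key=lambda c: c[1])
        if best[1] == worst[1]:
            continue  # no preference signal when all scores tie
        pairs.append({"prompt": prompt, "chosen": best[0], "rejected": worst[0]})
    return pairs


# Hypothetical scores that the model, acting as its own judge, gave its samples.
scored = {
    "p1": [("a", 5), ("b", 2), ("c", 4)],
    "p2": [("x", 3), ("y", 3)],
}
print(build_preference_pairs(scored))
# [{'prompt': 'p1', 'chosen': 'a', 'rejected': 'b'}]
```

Each iteration would generate new prompts and candidates with $M_t$, score them with $M_t$ itself, build pairs as above, and train $M_{t+1}$ with DPO.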

(Reread the tasks used in the experiments.)

KTO: Model Alignment as Prospect Theoretic Optimization

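Kahneman & Tversky’s prospect theory: humans are famously loss-averse. Human-aware loss functions (HALOs) such as DPO rely on preferences, i.e. contrastive pairs.

DPO widens the gap between the probability of the preferred output and that of the dispreferred output.
KTO instead uses a binary feedback signal (an output is either desirable or undesirable) and adjusts the model's response to these signals in a way that reflects loss aversion.

Recap of RLHF; its steps are:

SFT: supervised learning on a variety of downstream tasks, such as dialogue, instruction following, and summarization.
Preference sampling: the model outputs pairs and humans judge which one is better.
Reward learning: train a guiding (reward) model. (From this step on, RLHF and DPO diverge.)
Reinforcement-learning optimization: use the reward model for feedback and PPO to adjust the model. (RL is not very stable and is complex. The first P is "proximal": do not stray far from the original model. PO is policy optimization.)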
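
A per-example sketch of a KTO-style loss on binary feedback, written from my recollection of the paper's formulation, so the exact form should be checked against the original. The implied reward is the policy-to-reference log-ratio, and `z_ref` stands for the reference point (a KL estimate in the paper, passed in here as a plain number). The default `beta`, `lambda_d`, and `lambda_u` are placeholder values:

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def kto_example_loss(policy_logp, ref_logp, desirable, z_ref,
                     beta=0.1, lambda_d=1.0, lambda_u=1.0):
    """Loss for a single (prompt, output, desirable?) example."""
    reward = policy_logp - ref_logp  # implied reward: log(pi_theta / pi_ref)
    if desirable:
        value = lambda_d * sigmoid(beta * (reward - z_ref))
        return lambda_d - value
    value = lambda_u * sigmoid(beta * (z_ref - reward))
    return lambda_u - value


# Raising the log-probability of a desirable output lowers its loss ...
print(kto_example_loss(-3.0, -5.0, desirable=True, z_ref=0.0))
# ... while the same increase on an undesirable output raises the loss.
print(kto_example_loss(-3.0, -5.0, desirable=False, z_ref=0.0))
```

Unlike DPO, no paired (win, lose) outputs for the same prompt are needed; each example carries only a single binary label.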

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

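DPO goes back to maximum likelihood and sidesteps two problems: explicit reward estimation, and the various interactions and roles in RL. NeurIPS 2023 outstanding paper runner-up, endorsed by Ng. It moves from a loss over the reward function to a loss over optimal policies, which equally represents human preference. It also takes the "proximal" P into account, i.e. staying close to the reference policy.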
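
The DPO objective for one preference pair is a logistic loss on the difference of policy-to-reference log-ratios. A minimal sketch that takes sequence log-probabilities as plain numbers; the argument names and the default `beta` are illustrative:

```python
import math


def neg_log_sigmoid(z):
    """Numerically stable -log(sigmoid(z)), i.e. softplus(-z)."""
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_loss(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """-log sigmoid(beta * [(log pi(y_w) - log ref(y_w)) - (log pi(y_l) - log ref(y_l))])."""
    margin = beta * ((policy_logp_w - ref_logp_w) - (policy_logp_l - ref_logp_l))
    return neg_log_sigmoid(margin)


# A policy identical to the reference sits at log(2), about 0.693.
print(dpo_loss(-5.0, -7.0, -5.0, -7.0))
# Shifting probability mass toward the preferred output lowers the loss.
print(dpo_loss(-4.0, -8.0, -5.0, -7.0))
```

The reference log-probabilities play the role of the "proximal" constraint: the loss only rewards changes relative to the reference model, scaled by `beta`.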

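(Does the DPO paper bring up r in order to contrast with PPO?)

The relationship among RLHF/PPO, DPO, and KTO after SFT and preference sampling

These later peers also optimize toward some human-provided objective (HALO), each with its own objective function.

online vs. offline: whether updates happen in real time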

ORPO: Monolithic Preference Optimization without Reference Model

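KAIST

"Monolithic" means "one single block": SFT and the subsequent alignment step are unified. The OR in Odds Ratio Preference Optimization (ORPO) is a way to speed up alignment convergence during training. Most PO procedures have two branches, $(y_{win}|x)$ and $(y_{lose}|x)$, and aim to adjust the probability of y (the whole generated sequence). At the same time the model must not lose its language and instruction-following abilities, because the softmax has quite a broad reach (SFT alone does little to suppress disliked sentences, so the preference data must include negative examples).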

PR leads to more extreme discrimination of the disfavored responses than OR.
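
A per-pair sketch of the ORPO objective as I understand it: the usual SFT negative log-likelihood on the preferred output plus a weighted log-odds-ratio term, with no reference model. It assumes the sequence probability is the exponential of the average token log-probability (so it is strictly below 1); the weight `lam` and the argument names are illustrative:

```python
import math


def neg_log_sigmoid(z):
    """Numerically stable -log(sigmoid(z))."""
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


def log_odds(avg_logp):
    """log(p / (1 - p)) with p = exp(avg_logp); requires avg_logp < 0."""
    p = math.exp(avg_logp)
    return avg_logp - math.log1p(-p)


def orpo_loss(avg_logp_w, avg_logp_l, lam=0.1):
    """L_SFT + lam * L_OR for one (preferred, dispreferred) pair."""
    sft = -avg_logp_w                                  # NLL of the preferred output
    ratio = log_odds(avg_logp_w) - log_odds(avg_logp_l)  # log odds ratio
    return sft + lam * neg_log_sigmoid(ratio)


# Equal likelihoods: the odds-ratio term contributes lam * log(2).
print(orpo_loss(math.log(0.5), math.log(0.5)))
# Preferring y_win over y_lose shrinks both terms.
print(orpo_loss(math.log(0.8), math.log(0.2)))
```

Swapping `log_odds(a) - log_odds(b)` for the plain log probability ratio `a - b` gives the PR variant that the quoted sentence compares against.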

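Thoughts: data is amazing. We knead the model with formulas so that its output reaches the distribution we want. (Only the local distribution can be changed and the whole cannot be known: each set of same-length sentences has its own distribution space.) [Questions: what is a reward when there is no RL? Why do these PO methods actually outperform exploratory RL? Is it still because of the limits of small PLMs?]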

SimPO: Simple Preference Optimization with a Reference-Free Reward

zchen0420 commented 3 months ago

Paper introduced by 田中

INSIDE: LLMs' Internal States Retain the Power of Hallucination Detection

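2024 ICLR | Alibaba Cloud, ZJU

The motivation is intuitive: LLMs preserve the highly-concentrated semantic information of the entire sentence within their internal states (Azaria & Mitchell, 2023). 1) EigenScores of the internal states are used for detection; 2) abnormal activations of the internal states are rectified by clipping extreme features, which tends to prevent overconfident generations. The paper's mention of temperature and top-k/p is helpful for revisiting those techniques. Length-Normalized Entropy (perplexity-based) is computed over sampled generations to estimate uncertainty across multiple sentences. (As the Analects put it: to know what you know and to know what you do not know, that is true knowledge.)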
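
The Length-Normalized Entropy baseline mentioned above can be sketched as the negative average, over sampled generations, of each generation's mean token log-probability. This covers only that baseline, not EigenScore itself, and the input layout and toy numbers are assumptions:

```python
import math


def length_normalized_entropy(samples_token_logprobs):
    """-(1/K) * sum over K samples of (1/L_k) * sum of token log-probabilities."""
    if not samples_token_logprobs:
        raise ValueError("need at least one sampled generation")
    per_sample = [sum(lps) / len(lps) for lps in samples_token_logprobs]
    return -sum(per_sample) / len(per_sample)


# Two hypothetical sampled answers to the same question, with different lengths.
samples = [
    [math.log(0.5), math.log(0.5)],  # 2 tokens, each with p = 0.5
    [math.log(0.25)],                # 1 token with p = 0.25
]
print(length_normalized_entropy(samples))  # 1.5 * ln(2), roughly 1.04
```

Higher values mean the model was less confident across its samples, which is used as a signal of possible hallucination.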

zchen0420 / nn_papers

Humanlike behaviors #6


Can Large Language Models Faithfully Express Their Intrinsic Uncertainty in Words?

Kernel Language Entropy: Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities

Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement

In-Context Symbolic Regression: Leveraging Language Models for Function Discovery