zchen0420 / nn_papers

To record my paper reading in my native language, mimicking ooooohira-san.

Humanlike behaviors #6

Open zchen0420 opened 4 months ago

zchen0420 commented 4 months ago

ChatGPT is fun, but it is not funny! Humor is still challenging Large Language Models

2023 | Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis (WASSA). The title plays on an "oxymoron": despite being fun to interact with, ChatGPT cannot be humorous or comedic the way humans are.

A collection of ChatGPT jokes, old and new (25 in total).

Generation, explanation, and then studying the jokes from the viewpoint of deconstruction.

(Did not finish; only read up to the manual clustering of the jokes.)

Let’s Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation

2024 | Sea AI Lab, Sun Yat-sen University, Harvard U

Oogiri game (大喜利)

zchen0420 commented 4 months ago

The Troubling Emergence of Hallucination in Large Language Models – An Extensive Definition, Quantification, and Prescriptive Remediations

EMNLP 2023 | U. of South Carolina, universities in India and Bangladesh, Stanford, and Amazon. Hallucination is a by-product of emergence. The paper builds HallucInation eLiciTation (HILT), a human-annotated dataset for fine-grained hallucination categories: 75,000 publicly released text passages generated by 15 contemporary LLMs, with human annotations. It also introduces the Hallucination Vulnerability Index (HVI) to quantify and compare LLMs' propensity to hallucinate.

Two main settings:

  1. Factual Mirage (FM): incorrect or distorted information produced when the model is given a factually correct prompt.
  2. Silver Lining (SL): hallucinations produced when the prompt is potentially misleading yet contains some underlying truth.

By source:

  1. Intrinsic: problems in the training data that were injected into the model.
  2. Extrinsic: problems arising at interaction time, e.g. sensitivity to the prompt.

Six types:

  1. Acronym Ambiguity:

    • Description: occurs when the model misinterprets or generates text based on the wrong expansion of an acronym.
    • Example: the model might expand "USA" as "United Sports Association" rather than "United States of America".
  2. Numeric Nuisance:

    • Description: hallucinations involving numbers, especially when the model generates dates, times, or quantities that contradict the facts.
    • Example: claiming that "World War II began in 1932" is plainly wrong, since it actually began in 1939.
  3. Generated Golem:

    • Description: the model produces text built entirely on fabricated information.
    • Example: it might invent a story about a nonexistent tech product, such as "In 2025, Apple released a phone capable of telepathy."
  4. Virtual Voice:

    • Description: occurs when the model impersonates a nonexistent figure or misattributes a statement to someone.
    • Example: it might quote Martin Luther King Jr. saying something that was in fact made up.
  5. Geographic Erratum:

    • Description: errors the model makes when handling geographic information.
    • Example: claiming that "Paris is the capital of Belgium" is obviously wrong; Paris is the capital of France.
  6. Time Wrap:

    • Description: errors in describing historical events or timelines.
    • Example: the model might claim that "the mobile phone was invented in the early 20th century", when modern mobile phones were in fact developed at the end of the 20th century.

Prompts are drawn from NYTimes and Politifact; outputs of each type are sampled evenly for human judgment and manual mitigation (word substitution, rewriting) to remove the hallucinations. There is, however, no silver-bullet solution.

When Do Pre-Training Biases Propagate to Downstream Tasks? A Case Study in Text Summarization

EACL 2023 | Authors include Dan Jurafsky, Kathleen McKeown, and Tatsunori Hashimoto | Columbia U, Stanford U

Biases learned during pre-training carry over as hallucinations in downstream tasks. Adaptation and fine-tuning on new data can mitigate this, but they do not change which kinds of bias the model leans toward. The more abstractive a model is (as opposed to extractive), the more readily it propagates its biases.

intrinsic evaluations (task-agnostic, intrinsic to the language model, e.g. PPL) vs. extrinsic evaluations (task-specific, e.g. BLEU for MT)

Adapter fine-tuning, which updates a smaller number of parameters, generates fewer hallucinations than fine-tuning the entire model.

(screenshot)

Here, hallucination refers to the summarizer's conflicting choice between conditioning on the input and relying on its parametric knowledge.
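As a concrete picture of "fine-tuning a smaller number of parameters", here is a minimal sketch of bottleneck-adapter tuning in PyTorch. It assumes a generic model that exposes a `layers` list; the `Adapter` module, sizes, and attribute names are illustrative, not the paper's actual setup.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: the only trainable piece added to each frozen layer."""
    def __init__(self, d_model: int, d_bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, d_bottleneck)
        self.up = nn.Linear(d_bottleneck, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual form: the frozen layer's output stays dominant.
        return x + self.up(torch.relu(self.down(x)))

def add_adapters_and_freeze(model: nn.Module, d_model: int) -> list:
    """Freeze all pretrained weights and return only the new adapter parameters.

    Wiring each `layer.adapter` into the layer's forward pass is model-specific
    and omitted here.
    """
    for p in model.parameters():
        p.requires_grad = False
    trainable = []
    for layer in model.layers:              # assumption: the model exposes `layers`
        layer.adapter = Adapter(d_model)
        trainable += list(layer.adapter.parameters())
    return trainable

# Usage sketch: optimize only the adapters, leaving the pretrained weights intact.
# optimizer = torch.optim.AdamW(add_adapters_and_freeze(summarizer, d_model=768), lr=1e-4)
```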

LLM Lies: Hallucinations are not Bugs, but Features as Adversarial Examples

ICLR 2024 | Peking University. The paper finds that hallucination is an intrinsic property of LLMs: it can be elicited from out-of-distribution prompts, i.e. regions of language space that humans rarely use. (On the other hand, I suspect it is also caused by forgetting and normalization during training. It would be nice to separate general ability from external knowledge, but the boundary may not be easy to draw, and even if it could be drawn, drawing it might hurt ability. The early stage of LLM training is entirely passive, with no self-directed organization of what is learned.)

Method (reverse-engineering via gradients to find "magic" prompts, i.e. adversarial examples of varying strength):

Observations

[I still need to work through the gradient-based method; a rough sketch of my current understanding follows.]
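A toy sketch of how gradient-guided prompt search generally works (HotFlip/GCG-style token substitution). This is my reconstruction, not the paper's exact algorithm: `toy_lm` components are stand-ins, and the objective here simply pushes the next token toward an arbitrary "hallucinated" target id.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
V, D = 50, 16                # toy vocabulary and embedding size
emb = nn.Embedding(V, D)
lm_head = nn.Linear(D, V)    # stand-in LM: mean-pooled prompt embedding -> next-token logits

def hotflip_step(prompt_ids: torch.Tensor, target_id: int) -> torch.Tensor:
    """One gradient-guided token substitution that makes the target token more likely."""
    # Represent the prompt as one-hot rows so we can take gradients w.r.t. token choices.
    onehot = torch.zeros(len(prompt_ids), V)
    onehot.scatter_(1, prompt_ids.unsqueeze(1), 1.0)
    onehot.requires_grad_(True)

    hidden = (onehot @ emb.weight).mean(dim=0)
    loss = -torch.log_softmax(lm_head(hidden), dim=-1)[target_id]
    loss.backward()

    grad = onehot.grad                                   # (len, V): d loss / d one-hot
    # Linearized effect of swapping position i's token to v is grad[i, v] - grad[i, old],
    # so scores[i, v] below is the predicted loss *decrease*; forbid the no-op swap.
    scores = grad.gather(1, prompt_ids.unsqueeze(1)) - grad
    scores.scatter_(1, prompt_ids.unsqueeze(1), float("-inf"))

    pos, tok = divmod(int(scores.argmax()), V)
    new_ids = prompt_ids.clone()
    new_ids[pos] = tok
    return new_ids

# Usage: iterate hotflip_step to push a random prompt toward eliciting token id 3.
prompt = torch.randint(0, V, (6,))
adv_prompt = hotflip_step(prompt, target_id=3)
```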

Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting

zchen0420 commented 4 months ago

A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation

We’re Afraid Language Models Aren’t Modeling Ambiguity

Detecting and Mitigating Hallucinations in Machine Translation: Model Internal Workings Alone Do Well, Sentence Similarity Even Better

Unveiling the Implicit Toxicity in Large Language Models

The Curious Case of Hallucinatory (Un)answerability: Finding Truths in the Hidden States of Over-Confident Large Language Models

FACTSCORE: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation

Reformulating Domain Adaptation of Large Language Model as Adapt-Retrieve-Revise

2024 | Findings of ACL | Alibaba DAMO & Kyoto U (Wan)

A patchwork of systems where brute force works wonders, with GPT-4 doing the final revision. Beyond that, the paper is mostly an introduction to knowledge within the target domain.

zchen0420 commented 4 months ago

Leveraging Code to Improve In-context Learning for Semantic Parsing

zchen0420 commented 4 months ago

Understanding the Effect of Model Compression on Social Bias in Large Language Models

EMNLP 2023 | CMU, NOVA LINCS, Allen Institute for AI | Gustavo Gonçalves and Emma Strubell

Social biases in LLMs are an ongoing problem that is propagated from pretraining to finetuning. Existing work has shown that pruning disproportionately impacts classification accuracy on low-frequency categories in computer vision models (Hooker et al., 2021), but that pruning transformer models can have a beneficial effect with respect to bias when modeling multilingual text (Hooker et al., 2020; Ogueji et al., 2022). Further, Xu and Hu (2022) have shown that compressing pretrained models improves model fairness by working as a regularizer against toxicity.

Testbed: Bias Bench (Meade et al., 2022), a compilation of three social bias datasets.

MLM: RoBERTa pretraining was done over 161 GB of text, which contained the 16GB used to train BERT, approximately a ten-fold increase. RoBERTa also trained for longer, with larger batch sizes, which have been shown to decrease the perplexity of the LLM (Liu et al., 2019). Autoregressive: The set of checkpoints released for the Pythia model family allows us to assess an even wider variety of model sizes, so that we can observe how bias varies throughout pretraining.

Distillation and quantization (a regularizer effect) both have some positive effect on social bias, but LM capability drops as well. The larger the model and the longer it was trained, the harder it is to reverse its social bias this way.

Thoughts

Bias is embedded in the training data itself, so it is only natural that these indirect methods raise or lower bias and capability together. Besides distillation and quantization, there are also pruning, mixture of experts, and adaptive computation.

zchen0420 commented 4 months ago

Primacy Effect of ChatGPT

Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs

zchen0420 commented 4 months ago

Syntax

Learning Syntax without Planting Trees: Understanding When and Why Transformers Generalize Hierarchically

zchen0420 commented 4 months ago

Self-Rewarding Language Models

The bottleneck of RLHF is the small, offline reward model. Once LLMs can follow instructions, they should be made to compete against themselves: they can already produce human-level, high-quality data and can also tell good outputs from bad ones, so they should be set loose to improve themselves.

Process (see the sketch below the list):

  1. $M_0$ + IFT + EFT $\rightarrow$ $M_1$
  2. $M_1$ + AIFT($M_1$) + DPO $\rightarrow$ $M_2$
  3. $M_2$ + AIFT($M_2$) + DPO $\rightarrow$ $M_3$

(Re-read the tasks used in the experiments.)
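A minimal runnable sketch of one iteration of the loop above. `generate`, `judge` (the LLM-as-a-Judge scoring on a 0-5 scale), and `dpo_update` are placeholders for the real model calls, not the authors' code.

```python
import random

def generate(model, prompt: str) -> str:
    """Placeholder for sampling a candidate response from model M_t."""
    return f"response-{random.randint(0, 999)} to {prompt}"

def judge(model, prompt: str, response: str) -> float:
    """Placeholder for the LLM-as-a-Judge prompt (the same model scores 0-5)."""
    return random.uniform(0, 5)

def dpo_update(model, preference_pairs):
    """Placeholder for one round of DPO training on (prompt, chosen, rejected) triples."""
    return model  # the real step would return the newly trained checkpoint M_{t+1}

def self_reward_round(model, prompts, n_samples: int = 4):
    """One iteration: M_t builds AIFT(M_t) preference data about itself, then DPO -> M_{t+1}."""
    pairs = []
    for x in prompts:
        cands = [generate(model, x) for _ in range(n_samples)]
        ranked = sorted(cands, key=lambda y: judge(model, x, y))
        pairs.append((x, ranked[-1], ranked[0]))   # best vs. worst of its own samples
    return dpo_update(model, pairs)

# M1 = SFT on IFT + EFT (not shown); then:
# M2 = self_reward_round(M1, prompts); M3 = self_reward_round(M2, prompts)
```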

KTO: Model Alignment as Prospect Theoretic Optimization

Kahneman & Tversky’s prospect theory: humans are famously loss-averse. Human-aware loss functions (HALOs), such as DPO, rely on preferences expressed as contrastive pairs.

Reviewing RLHF, its steps are: SFT, fitting a reward model on preference data, and then RL (e.g. PPO) against that reward model.

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

DPO goes back to maximum likelihood and sidesteps two problems: explicit reward estimation, and the various interactions and roles involved in RL. NeurIPS outstanding paper, endorsed by Ng. It converts a loss over the reward function into a loss over optimal policies, still representing human preference, and it likewise keeps a proximal-style consideration (the P of PPO) via the reference policy.
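The resulting objective, as a minimal PyTorch sketch. The arguments are sequence-level log-probabilities; β and the tensor shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def dpo_loss(pi_logp_w, pi_logp_l, ref_logp_w, ref_logp_l, beta: float = 0.1):
    """DPO: maximize the margin of policy-vs-reference log-ratios on (chosen, rejected) pairs.

    Each argument is a tensor of sequence log-probabilities log p(y|x), one entry per pair.
    The implicit reward is r(x, y) = beta * (log pi(y|x) - log pi_ref(y|x)).
    """
    chosen_reward = beta * (pi_logp_w - ref_logp_w)
    rejected_reward = beta * (pi_logp_l - ref_logp_l)
    # -log sigma(margin); the proximal/KL-style anchoring to pi_ref lives in the ratios.
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()

# e.g. dpo_loss(torch.tensor([-12.3]), torch.tensor([-15.0]),
#               torch.tensor([-13.0]), torch.tensor([-14.2]))
```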

(screenshot)

(Does the DPO paper introduce r mainly for the comparison with PPO?)

(screenshot)

The relationship among RLHF/PPO, DPO, and KTO after SFT and preference sampling

KTO

These later peers likewise optimize toward some human-provided target (HALOs); each has its own objective function.

online vs. offline: whether the model is updated with freshly sampled data in real time
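For contrast with DPO's paired loss, a rough sketch of a KTO-style objective on unpaired (prompt, answer, good/bad) examples. This reflects my reading of the prospect-theoretic value function; the reference-point estimate `z_ref` and the λ weights are simplified assumptions, not the paper's exact formulation.

```python
import torch

def kto_loss(pi_logp, ref_logp, desirable, beta: float = 0.1, lam_d: float = 1.0, lam_u: float = 1.0):
    """KTO-style loss on unpaired examples (simplified sketch).

    pi_logp, ref_logp: sequence log-probs under the policy and the frozen reference.
    desirable: bool tensor -- was y labeled good (True) or bad (False) for x?
    """
    r = pi_logp - ref_logp                        # implicit reward (log-ratio)
    # Reference point: a crude batch estimate of KL(pi || pi_ref), clamped at 0, held constant.
    z_ref = torch.clamp(r.mean().detach(), min=0)
    value = torch.where(
        desirable,
        lam_d * torch.sigmoid(beta * (r - z_ref)),   # gains saturate (risk-averse)
        lam_u * torch.sigmoid(beta * (z_ref - r)),   # losses weighted separately (loss-averse)
    )
    return (1.0 - value).mean()                   # stands in for lambda_y - v with lambda_y = 1
```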

ORPO: Monolithic Preference Optimization without Reference Model

KAIST. "Monolithic" means "made of one piece": SFT and the subsequent alignment step are unified. The OR in Odds Ratio Preference Optimization (ORPO) is a way to speed up alignment convergence during training. Most PO procedures have two branches, $(y_{win}|x)$ and $(y_{lose}|x)$, and the goal is to adjust the probability of generating the whole sequence y. At the same time the model must not lose its language and instruction-following ability, since softmax has a fairly wide sphere of influence (SFT alone, however, is of little use for suppressing disliked sentences, which is why the preference data must include negatives).

Thoughts... data really is magical: formulas are used to mold the model so that its outputs reach the desired distribution. (Only the local distribution can be changed; the whole is out of reach: the set of sentences of each given length has its own distribution space.) [Questions: what does "reward" mean in a setting without RL? Why do these PO methods beat exploratory RL? Or is that just a limitation of small PLMs?]
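A minimal sketch of the ORPO objective as I read it: the usual NLL on the chosen answer plus a reference-free odds-ratio term. The length-averaged log-probabilities and λ follow the general recipe; exact details may differ.

```python
import torch
import torch.nn.functional as F

def orpo_loss(avg_logp_w, avg_logp_l, lam: float = 0.1):
    """ORPO: SFT loss on the chosen answer + an odds-ratio penalty, with no reference model.

    avg_logp_w / avg_logp_l: length-averaged log p_theta(y|x) of the chosen / rejected answer.
    """
    def log_odds(avg_logp):
        # odds(y|x) = P / (1 - P) with P = exp(mean token log-prob)
        return avg_logp - torch.log1p(-torch.exp(avg_logp))

    l_sft = -avg_logp_w.mean()                                      # keep language / IF ability
    l_or = -F.logsigmoid(log_odds(avg_logp_w) - log_odds(avg_logp_l)).mean()
    return l_sft + lam * l_or
```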

SimPO: Simple Preference Optimization with a Reference-Free Reward

(screenshot)
zchen0420 commented 3 months ago

Papers recommended by Tanaka-san

INSIDE: LLMs' Internal States Retain the Power of Hallucination Detection

2024 ICLR | Alibaba Cloud, ZJU. The motivation is intuitive: LLMs preserve the highly concentrated semantic information of the entire sentence within their internal states (Azaria & Mitchell, 2023). 1) EigenScores over those internal states for detection; 2) rectify abnormal activations of the internal states by clipping extreme features, which tends to prevent overconfident generations. The paper notes that temperature and top-k/p matter for the sample-and-review procedure, and Length-Normalized Entropy (perplexity-based) over multiple sampled sentences is used to estimate uncertainty. ("To know what you know and to know what you do not know: that is knowledge.")
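A numpy sketch of the detection side as I understand it, assuming the hidden-state sentence embeddings (e.g. middle layer, last token) have already been extracted for K sampled answers; the regularizer α and the scaling are illustrative.

```python
import numpy as np

def eigenscore(embeddings: np.ndarray, alpha: float = 1e-3) -> float:
    """Consistency score from the internal-state embeddings of K sampled answers.

    embeddings: (K, d) hidden-state sentence embeddings of K answers to the same question.
    Higher score = more semantic spread across samples = more likely hallucination.
    """
    K, _ = embeddings.shape
    z = embeddings - embeddings.mean(axis=0, keepdims=True)   # center across samples
    cov = z @ z.T                                             # (K, K) Gram matrix of centered embeddings
    eigvals = np.linalg.eigvalsh(cov + alpha * np.eye(K))
    return float(np.mean(np.log(eigvals)))                    # (1/K) * logdet of regularized covariance

def length_normalized_entropy(token_logprobs: list) -> float:
    """Perplexity-style baseline: mean per-token negative log-prob across the K samples."""
    return float(np.mean([-np.asarray(lp).mean() for lp in token_logprobs]))
```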

Can Large Language Models Faithfully Express Their Intrinsic Uncertainty in Words?

Kernel Language Entropy: Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities

Papers recommended by Bian-san

Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement

2024 ICLR | MIT, Allen Institute for AI, UofW, UofSC. Verbal induction over observed phenomena, with iterative convergence. Models tend to add exceptions to patch up their theory (i.e. the initial observation). Is a model ever "satisfied" with its theory? Would it ever make a major revision?
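The propose-test-refine loop, as a toy sketch: `propose_rule` stands in for the LM call, and its "add an exception" behavior mirrors the tendency noted above. The task (even numbers with one exception) is invented for illustration.

```python
def propose_rule(observations, feedback=None):
    """Placeholder for the LM proposing a rule in natural language plus an executable form.
    Here it 'refines' by adding exceptions, mirroring the behavior noted above."""
    exceptions = set(feedback or [])
    return lambda x: (x % 2 == 0) and x not in exceptions

def refine(observations, labels, max_rounds: int = 5):
    """Iterative hypothesis refinement: propose, test on the examples, feed failures back."""
    feedback = None
    rule = propose_rule(observations)
    for _ in range(max_rounds):
        rule = propose_rule(observations, feedback)
        errors = [x for x, y in zip(observations, labels) if rule(x) != y]
        if not errors:              # the hypothesis now covers all observations
            return rule
        feedback = errors           # failures become the next round's feedback
    return rule

# refine([2, 4, 6, 8], [True, True, False, True]) converges by treating 6 as an "exception".
```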

In-Context Symbolic Regression: Leveraging Language Models for Function Discovery

A VLM takes part in symbolic regression (SR): it suggests initializations and then iterates (optimization guided by MSE). The language model outperforms Genetic Programming (GP). The OPRO mentioned here comes from "Large Language Models as Optimizers" (applicable to linear regression and the Traveling Salesman Problem). The paper itself is not written in a very engaging way.
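A rough sketch of that loop: the model proposes a function skeleton, the constants are fitted numerically, and the MSE is fed back as the next round's prompt. `propose_skeleton` is a stand-in for the (V)LM call; scipy's `curve_fit` does the constant fitting.

```python
import numpy as np
from scipy.optimize import curve_fit

def propose_skeleton(round_idx: int, feedback: str = ""):
    """Placeholder for the (V)LM proposing a symbolic form given the data plot / previous MSEs."""
    if round_idx == 0:
        return (lambda x, a, b: a * x + b), 2              # first guess: linear
    return (lambda x, a, b, c: a * x + b * np.sin(c * x)), 3  # refined guess: add a sine term

def icsr(x, y, rounds: int = 2):
    """Iterative loop: propose skeleton -> fit constants by least squares -> report MSE back."""
    best, best_mse, feedback = None, np.inf, ""
    for t in range(rounds):
        f, n_params = propose_skeleton(t, feedback)
        params, _ = curve_fit(f, x, y, p0=np.ones(n_params), maxfev=10000)
        mse = float(np.mean((f(x, *params) - y) ** 2))
        feedback = f"round {t}: mse={mse:.4f}"             # would be sent back to the LM
        if mse < best_mse:
            best, best_mse = (f, params), mse
    return best, best_mse

# x = np.linspace(0, 6, 100); y = 2 * x + np.sin(3 * x)
# (f, params), mse = icsr(x, y)
```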