zchen0420 / nn_papers

To record my paper reading in my native language, mimicking ooooohira-san.

A Few Neurons: High-level Concentration #11

Open zchen0420 opened 5 months ago

zchen0420 commented 5 months ago

This is what inspired OpenAI's single sentiment neuron in GPT; as in the later issues, such neurons not only have a concrete location but can also be edited to manipulate the model. GPT-4 plus data was then used to find & explain GPT-2's neurons.
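As a note to self, the explain-and-score loop behind that GPT-4-explains-GPT-2 work goes roughly like the sketch below; the function names and prompt wording are my own hypothetical stand-ins, not OpenAI's actual pipeline or API.

```python
# Hypothetical sketch of the explain -> simulate -> score loop used to
# auto-interpret GPT-2 neurons with GPT-4. All names are stand-ins.
import numpy as np

def explain_neuron(llm, examples):
    """Ask a strong LLM for a one-line explanation of a neuron,
    given (token, activation) records from text that excites it."""
    prompt = "Summarize what this neuron fires on:\n" + "\n".join(
        f"{tok!r}: {act:.2f}" for tok, act in examples)
    return llm(prompt)  # e.g. "fires on years inside citations"

def simulate_activations(llm, explanation, tokens):
    """Ask the LLM to predict, per token, how strongly a neuron
    matching `explanation` would activate (0-10 scale)."""
    return np.array([float(llm(f"Given: {explanation}\nRate {tok!r} 0-10:"))
                     for tok in tokens])

def explanation_score(simulated, real):
    """Score an explanation by how well the simulated activations
    correlate with the neuron's real activations."""
    return float(np.corrcoef(simulated, real)[0, 1])
```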

zchen0420 commented 5 months ago

Locating and Editing Factual Associations in GPT

NeurIPS 2022 | Kevin Meng et al. | MIT CSAIL, NEU (introduced by Bian-san on May 13)

Studies how GPT (unidirectional = decoder-only) stores facts in a small bunch of neurons | causal intervention. Example: (A) The Space Needle is in the downtown of (Seattle). (B) [subject corrupted] is in the downtown of ( ). By injecting single hidden states from GPT's clean run (A) into the corrupted run (B) and watching how the probability at ( ) changes, they find the fact is mediated mainly by mid-layer MLP modules at the subject's last token.
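A toy rendering of that causal-tracing recipe (my own sketch; `Model` and its `forward`/`patch` interface are hypothetical, not the authors' code):

```python
# Corrupt the subject embedding, then restore one clean hidden state
# (layer, position) at a time and measure how much P(answer) recovers.
def causal_trace(model, prompt, answer):
    clean = model.forward(prompt, noise_on_subject=False)   # cache all states
    corrupt = model.forward(prompt, noise_on_subject=True)  # baseline: fact lost
    base = corrupt.prob(answer)
    effects = {}
    for layer in range(model.num_layers):
        for pos in range(len(clean.tokens)):
            patched = model.forward(
                prompt, noise_on_subject=True,
                patch={(layer, pos): clean.hidden[layer][pos]})  # restore one state
            effects[(layer, pos)] = patched.prob(answer) - base  # indirect effect
    return effects  # peaks at mid-layer MLPs over the subject's last token
```

Wherever restoring a single state revives the fact, that state is where the association is stored; that is the localization argument.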

Where does knowledge live? A grandmother neuron? Distributed yet aggregated; heterogeneity within a homogeneous architecture; where do knowledge and facts sit? (Motivation) LLMs are the distilled essence of humanity, and I believe there must be some marvelous structure inside.

zchen0420 commented 5 months ago

Mass-Editing Memory in a Transformer

ICLR 2023 | Kevin Meng et al.

Prior work:

Inference-Time Intervention: Eliciting Truthful Answers from a Language Model

The Internal State of an LLM Knows When It’s Lying

Large Language Models as Analogical Reasoners

Do Moral Judgment and Reasoning Capability of LLMs Change with Language? A Study using the Multilingual Defining Issues Test

EACL 2024 | introduced by Bian-san. Morality across Chinese, Hindi, Russian, Spanish, and Swahili: Hindi and Swahili do poorly; for the others the difference is not obvious. Uses the Defining Issues Test (DIT) (Rest, 1986), built on Cognitive Moral Development (CMD) (Kohlberg, 1958). GPT-4 performs like a graduate student, while the other large models are about on par with average adults. The same model also reacts differently to moral dilemmas depending on the language it is prompted in, drifting toward utilitarian choices.

zchen0420 commented 5 months ago

On the Multilingual Ability of Decoder-based Pre-trained Language Models: Finding and Controlling Language-Specific Neurons

(introduced by Jin on May 15)

Uses statistical methods to find language-specific neurons (LSNs); a sketch below.
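My guess at the statistical selection in code form, assuming the method scores each neuron by how well its activation separates the target language from the rest; `average_precision_score` is one reasonable choice of statistic, not necessarily the paper's exact one.

```python
import numpy as np
from sklearn.metrics import average_precision_score

def find_language_specific_neurons(acts, langs, target_lang, top_k=1000):
    """acts: (num_sentences, num_neurons) mean activation per sentence.
    langs: language label per sentence. Returns indices of neurons whose
    activation best separates `target_lang` text from everything else."""
    is_target = (np.asarray(langs) == target_lang).astype(int)
    ap = np.array([average_precision_score(is_target, acts[:, j])
                   for j in range(acts.shape[1])])
    return np.argsort(-ap)[:top_k]  # top-k most language-predictive neurons
```

Forcibly setting the found LSNs' activations during generation is then what steers the output language, which I take to be the "controlling" half of the title.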

zchen0420 commented 5 months ago

DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models

ICLR 2024 | MIT, Microsoft | Yung-Sung Chuang et al.

My feeling is that this is close in spirit to ROME's view: facts live at different layers and do not always flow up to the top-layer output. It dynamically finds the premature layer whose early-exit softmax has the largest JSD from the final layer's, and contrasts the two to replace the original output distribution (information that does not evolve through the layers is dead hallucination). Rationale and evidence: early layers already settle easy, syntactic tokens, while predictions at tokens that need factual knowledge keep changing in later layers, so the contrast amplifies exactly the factual signal.
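A condensed sketch of that contrast step (assuming per-layer early-exit logits are already computed through the shared output head; the 0.1 plausibility threshold mirrors the paper's adaptive constraint, the rest is my paraphrase, not the released code):

```python
import torch
import torch.nn.functional as F

def dola_logits(layer_logits, candidate_layers):
    """layer_logits: list of (vocab,) logits from early exits, where
    layer_logits[-1] is the final (mature) layer. Returns contrasted logits."""
    final_logp = F.log_softmax(layer_logits[-1], dim=-1)

    def jsd(lp, lq):
        # Jensen-Shannon divergence between two log-prob vectors
        m = 0.5 * (lp.exp() + lq.exp())
        return 0.5 * (F.kl_div(m.log(), lp, log_target=True, reduction="sum")
                      + F.kl_div(m.log(), lq, log_target=True, reduction="sum"))

    # dynamically pick the premature layer most different from the final one
    prem = max(candidate_layers,
               key=lambda l: jsd(F.log_softmax(layer_logits[l], dim=-1), final_logp))
    prem_logp = F.log_softmax(layer_logits[prem], dim=-1)

    # adaptive plausibility: only contrast tokens the final layer finds likely
    mask = final_logp >= final_logp.max() + torch.log(torch.tensor(0.1))
    contrast = final_logp - prem_logp
    return torch.where(mask, contrast, torch.tensor(float("-inf")))
```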

zchen0420 commented 4 months ago

Probing Structured Pruning on Multilingual Pre-trained Models: Settings, Algorithms, and Efficiency

Pruning based on a single language alone does not gain much; simple methods work well; fast ≠ small. Dynamic Sparsification: can prune down to a specified target size.
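A minimal illustration of "prune down to a specified size" (my own sketch, not the paper's algorithm): rank structured units such as attention heads or MLP columns by an importance score and keep the best ones within a parameter budget.

```python
import numpy as np

def prune_to_budget(unit_scores, unit_sizes, target_params):
    """unit_scores: importance per structured unit (e.g. head / MLP column);
    unit_sizes: parameter count per unit. Greedily keeps the most important
    units until the target parameter budget is filled."""
    order = np.argsort(-np.asarray(unit_scores))  # most important first
    kept, used = [], 0
    for u in order:
        if used + unit_sizes[u] <= target_params:
            kept.append(int(u))
            used += unit_sizes[u]
    return sorted(kept)  # indices of units to retain
```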