zchen0420 commented 2 months ago

What We Know About The Voynich Manuscript

zchen0420 commented 2 months ago

First Tragedy, then Parse: History Repeats Itself in the New Era of Large Language Models

zchen0420 commented 2 months ago

Case-Based Reasoning for Natural Language Queries over Knowledge Bases

EMNLP 2021

Solve complex problems by drawing on existing knowledge: case-based reasoning (CBR). CBR-KBQA: a nonparametric memory that stores cases (questions and their logical forms) + a parametric model.
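A minimal sketch of the nonparametric half of this idea: a memory of (question, logical form) cases that is searched for the nearest neighbours of a new question. Everything here is an assumption made for illustration: the cases are invented, the names `CASE_MEMORY`, `bow`, `cosine`, and `retrieve` are hypothetical, and bag-of-words cosine similarity is swapped in for whatever retriever the paper trains. The parametric model that would turn the retrieved cases into a new logical form is not sketched.

```python
import math
from collections import Counter

# Hypothetical case memory: (question, logical form) pairs invented for illustration.
CASE_MEMORY = [
    ("who directed Inception", "SELECT ?x WHERE { Inception directed_by ?x }"),
    ("who wrote Hamlet", "SELECT ?x WHERE { Hamlet written_by ?x }"),
    ("where was Einstein born", "SELECT ?x WHERE { Einstein place_of_birth ?x }"),
]


def bow(text):
    """Lower-cased bag-of-words vector for a question."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, memory, k=2):
    """Return the k stored cases whose questions are most similar to the query."""
    q = bow(query)
    ranked = sorted(memory, key=lambda case: cosine(q, bow(case[0])), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    for question, logical_form in retrieve("who directed Titanic", CASE_MEMORY, k=1):
        print(question, "->", logical_form)
```

Because the memory is just a list, new cases can be appended without retraining anything, which is the appeal of keeping this part nonparametric.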

zchen0420 commented 2 months ago

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Mathematical fitting / state machine

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

U-Mamba: Enhancing Long-range Dependency For Biomedical Image Segmentation

Resembles U-Net

KAN: Kolmogorov–Arnold Networks

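2024.4 | MIT, CIT, NEU, and NSF Institute | From a video: decoupling complex functions (approximating a multivariate function as a composition of univariate ones: exact and analytic solutions; wavelet transform, spline interpolation)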

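Recursive NLP → iteration → mathematical methods
Low order (underfitting) → balance point → high order (overfitting)
Solves PDEs and avoids catastrophic forgetting (though I suspect anything trained with backpropagation ends up forgetting)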

Feature comparison:

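MLP: fixed activations + learnable weights
KAN: learnable activations (mathematical fitting, though as a neural-network fit it may end up about the same)
- Mediocre results on MNIST (no analytic solution, and the relations between variables are not sparse; it keeps swinging between underfitting $\Leftrightarrow$ overfitting)
- MIDI files versus MP3 files: which do people prefer?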
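To make the MLP/KAN contrast concrete, here is a rough sketch of one layer of each. It is an illustration under stated assumptions, not the paper's implementation: the KAN paper parameterizes its edge functions with B-splines, while piecewise-linear interpolation on a fixed grid is swapped in here to keep the code short. The names `mlp_layer`, `piecewise_linear`, and `kan_layer` are hypothetical.

```python
import math


def mlp_layer(x, weights, biases):
    """MLP layer: learnable affine map followed by a fixed activation (tanh)."""
    return [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


def piecewise_linear(x, grid, values):
    """Evaluate a 1-D function given by its values on a sorted grid (clamped outside)."""
    if x <= grid[0]:
        return values[0]
    if x >= grid[-1]:
        return values[-1]
    for i in range(len(grid) - 1):
        if grid[i] <= x <= grid[i + 1]:
            t = (x - grid[i]) / (grid[i + 1] - grid[i])
            return (1 - t) * values[i] + t * values[i + 1]
    return values[-1]  # not reached for a sorted grid


def kan_layer(x, grid, edge_values):
    """KAN-style layer: each edge (j, i) has its own learnable 1-D function.

    edge_values[j][i] holds the grid values of the function on edge i -> j;
    output j is the plain sum of those functions applied to the inputs.
    """
    return [
        sum(piecewise_linear(xi, grid, vals) for xi, vals in zip(x, row))
        for row in edge_values
    ]


if __name__ == "__main__":
    grid = [-1.0, 0.0, 1.0]
    # One output unit: first edge roughly x**2, second edge the identity.
    edges = [[[1.0, 0.0, 1.0], [-1.0, 0.0, 1.0]]]
    print(kan_layer([0.5, -0.5], grid, edges))
    print(mlp_layer([0.5, -0.5], [[1.0, 1.0]], [0.0]))
```

In the MLP the trainable numbers are `weights` and `biases` and the nonlinearity never changes; in the KAN-style layer the trainable numbers are the grid values, so the shape of each nonlinearity is itself what gets learned.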

zchen0420 commented 2 months ago

Distilling the Knowledge in a Neural Network

2015 | Geoffrey Hinton, Oriol Vinyals, and Jeff Dean | Google

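Distillation: 🐛→🪲 (reason: the larva gives the adult strong adaptability) 🧑‍🏫→🧑‍🎓 (a teacher passes on not only textbook knowledge but also their own internalized understanding and feel for it. The student does not touch the first-hand 0/1 world directly; it sees soft targets that the teacher has already processed and that carry more information.)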

Using a temperature to adjust the softmax is an interesting idea: very simple and very easy to understand.
Even though the student model is fairly small, its knowledge is still richer and more layered than that of the mixed model.
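It also works for transfer learning.
The paper also mentions the difference between many specialists and mixture of experts: in MoE many experts work together, whereas the generalist activates only the relevant subset of the specialists for computation.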
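The temperature trick noted above can be sketched in a few lines. This is a minimal illustration, not the paper's training code: the function names and the mixing weight `alpha` are hypothetical. It follows the paper's description of a softmax softened by a temperature T, a cross-entropy against the teacher's soft targets, and a T² factor on the soft term to keep its gradient scale comparable to the hard-label term.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits / T; a larger T gives a softer (flatter) distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_target_loss(student_logits, teacher_logits, temperature):
    """Cross-entropy between teacher and student distributions at the same T."""
    p = softmax_with_temperature(teacher_logits, temperature)
    q = softmax_with_temperature(student_logits, temperature)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))


def distillation_loss(student_logits, teacher_logits, label, temperature=2.0, alpha=0.5):
    """Weighted sum of the soft-target term (scaled by T**2) and the hard-label term.

    alpha is a hypothetical mixing weight chosen for this sketch.
    """
    soft = soft_target_loss(student_logits, teacher_logits, temperature)
    hard = -math.log(softmax_with_temperature(student_logits, 1.0)[label])
    return alpha * temperature ** 2 * soft + (1 - alpha) * hard


if __name__ == "__main__":
    teacher = [6.0, 2.0, 1.0]
    print(softmax_with_temperature(teacher, 1.0))  # sharply peaked
    print(softmax_with_temperature(teacher, 5.0))  # softer: small classes become visible
    print(distillation_loss([3.0, 1.0, 0.5], teacher, label=0))
```

At T = 1 the teacher's small probabilities are nearly invisible; raising T exposes how the teacher ranks the wrong classes, which is the extra information the student learns from.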

Back in 2018 I kept thinking about modifying the softmax, whereas Hinton used the existing softmax to build a two-level knowledge structure. Clever.

[Read it once more to check the formulas and experimental details]

zchen0420 commented 1 month ago

Digital Humanities

2023

The Stylometry of Maoism: Quantifying the Language of Mao Zedong. New vocabulary: stylometry, prose, overarching, macrocosm, parlance. Similar work: The Trumpiest Trump? Identifying a Subject’s Most Characteristic Tweets (fairly early text classification; it tried many classical methods, and BERT came out strongest).

2022

Sentiment is all you need to win US Presidential elections. New vocabulary: lofty promise, mud-slinging, mandate, heuristic-driven → data-driven, populism, double cross, aspiration. Methods: GoEmotions, training + fine-tuning BERT, questionnaire survey. Simplification (black/white + emotion + party => win or lose).

zchen0420 commented 1 month ago

semantic change / embedding space

Enriching Word Usage Graphs with Cluster Definitions

Select 200 word usages (sentences), annotate how close or far apart they are, and train a model to generalize this distance.

Norm of Word Embedding Encodes Information Gain

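The divergence between the conditional distribution and the marginal distribution: $\mathrm{KL}(w) := \mathrm{KL}(p(\cdot \mid w) \,\|\, p(\cdot))$. LIME for BERT.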
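A toy sketch of this quantity, computed from raw co-occurrence counts. The assumptions are mine, not the paper's: p(·|w) is taken to be the empirical distribution of context words seen with w, p(·) is the marginal over all context counts, and the counts and the name `kl_of_word` are invented for illustration.

```python
import math
from collections import Counter


def kl_of_word(word, cooccurrence):
    """KL(p(.|w) || p(.)) from a dict mapping each word to a Counter of context counts."""
    contexts = cooccurrence[word]
    n_word = sum(contexts.values())

    # Marginal context distribution p(.), pooled over every word's contexts.
    marginal = Counter()
    for counts in cooccurrence.values():
        marginal.update(counts)
    n_total = sum(marginal.values())

    kl = 0.0
    for context, count in contexts.items():
        if count == 0:
            continue  # 0 * log(0 / q) is taken as 0
        p_cond = count / n_word
        p_marg = marginal[context] / n_total
        kl += p_cond * math.log(p_cond / p_marg)
    return kl


if __name__ == "__main__":
    # Invented counts: "the" appears with every context, "rare" with only one.
    counts = {
        "the": Counter({"a": 10, "b": 10}),
        "rare": Counter({"a": 2}),
    }
    print(kl_of_word("the", counts))
    print(kl_of_word("rare", counts))
```

A word whose contexts look like everyone else's gets a value near zero, while a word with a distinctive context distribution gets a large value, which is the sense in which the quantity measures information gain.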

A Semantic Distance Metric Learning approach for Lexical Semantic Change Detection

zchen0420 commented 1 month ago

Zhuoyuan Mao

LEALLA: Learning Lightweight Language-agnostic Sentence Embeddings with Knowledge Distillation. Sentence alignment: similarity between sentences; distills LaBSE into the smaller LEALLA.
When do Contrastive Word Alignments Improve Many-to-many Neural Machine Translation? Adds (contrastive) word alignment to the MT loss as a joint task.
Exploring the Impact of Layer Normalization for Zero-shot Neural Machine Translation. Effect of pre-LN versus post-LN: pre-LN generalizes poorly, so its zero-shot performance is weak.

zchen0420 / nn_papers

miscellaneous #7
