zchen0420 commented 1 month ago

Multilingual Amazon Reviews Corpus (MARC) {En, Jp, De, Fr, Es, Zh} [2015-2019] text classification. Each record contains the review text, the review title, the star rating, an anonymized reviewer ID, an anonymized product ID, and the coarse-grained product category (e.g., 'books', 'appliances'). The corpus is balanced across the 5 possible star ratings in each language. Data split per language (train / dev / test): 200,000 / 5,000 / 5,000 reviews.
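A minimal sketch of the record layout described above, plus a check that a sample is balanced across the five star ratings. The field names (`review_body`, `stars`, and so on) and the `is_balanced` helper are assumptions made for illustration; they are not taken from the corpus release.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Review:
    """One review record; field names are hypothetical."""
    review_body: str
    review_title: str
    stars: int              # star rating, 1 to 5
    reviewer_id: str        # anonymized
    product_id: str         # anonymized
    product_category: str   # coarse-grained, e.g. "books"
    language: str           # e.g. "en", "de"


def is_balanced(reviews: Iterable[Review]) -> bool:
    """True if all five ratings occur, each the same number of times."""
    counts = Counter(r.stars for r in reviews)
    return set(counts) == {1, 2, 3, 4, 5} and len(set(counts.values())) == 1


# Toy data: one review per star rating.
sample = [
    Review(f"body {s}", f"title {s}", s, f"reviewer_{s}", f"product_{s}", "books", "en")
    for s in range(1, 6)
]
```

Calling `is_balanced(sample)` on the toy list checks the balance property for one language; a real check would group records by `language` first.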

Building Educational Applications (BEA)

Multilingual Lexical Simplification Pipeline (MLSP) shared task. Lexical Complexity Prediction (LCP): complexity is rated on a 5-point Likert scale (1-5).

TYDI QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages

GitHub

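A typologically diverse dataset in which both questions and answers are natural: the questioner genuinely wants to know the answer, and the answer is written in the native language. People are prompted to ask questions in a natural, open-ended way, and the answers are then located in Wikipedia.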

Wino-X: Multilingual Winograd Schemas for Commonsense Reasoning and Coreference Resolution

Tests coreference resolution (CoR) and commonsense reasoning (CSR) in NMT models and MLLMs.

SUPER-NATURALINSTRUCTIONS: Generalization via Declarative Instructions on 1600+ NLP Tasks

Tests generalization to new tasks and instruction-following ability: 1,616 tasks with expert-written instructions. Model: Tk-Instruct.

zchen0420 commented 1 month ago

Meta AI

LLaMA: Open and Efficient Foundation Language Models

XGLM: Few-shot Learning with Multilingual Generative Language Models

zchen0420 commented 1 month ago

Language Is Not All You Need: Aligning Perception with Language Models

KOSMOS-1

Trained on web-scale multimodal corpora: arbitrarily interleaved text and images, image-caption pairs, and text data.
Evaluated tasks:
- language understanding, generation, and even OCR-free NLP (directly fed with document images)
- perception-language tasks, including multimodal dialogue, image captioning, visual question answering
- vision tasks, such as image recognition with descriptions (specifying classification via text instructions)

zchen0420 commented 4 weeks ago

Math models

Breaking Language Barriers in Multilingual Mathematical Reasoning: Insights and Observations

2023 | HKUofS&T MS

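Math reasoning data (MGSM8KInstruct) + SFT => MathOctopus (outperforms ChatGPT in few-shot settings). Findings: 1) extending the rejection sampling strategy to multiple languages helps to some extent; 2) improving multilingual performance also improves monolingual performance.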
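For readers unfamiliar with rejection sampling as a way to build SFT data, here is a rough sketch of the general idea: sample several reasoning paths per question and keep only the distinct ones whose final answer matches the gold answer. This is not the paper's procedure, which these notes do not describe. The `Answer:` marker, the `sampler` callable, and all function names are assumptions for illustration.

```python
import itertools
from typing import Callable, List


def extract_answer(path: str) -> str:
    """Return the text after the last 'Answer:' marker (an assumed format)."""
    return path.rsplit("Answer:", 1)[-1].strip()


def rejection_sample(
    question: str,
    gold: str,
    sampler: Callable[[str], str],
    k: int = 4,
) -> List[str]:
    """Draw k reasoning paths; keep distinct ones whose answer equals gold."""
    kept: List[str] = []
    for _ in range(k):
        path = sampler(question)
        if extract_answer(path) == gold and path not in kept:
            kept.append(path)
    return kept


def make_toy_sampler(candidates: List[str]) -> Callable[[str], str]:
    """Stand-in for a model: cycles through fixed candidate paths."""
    it = itertools.cycle(candidates)
    return lambda question: next(it)


candidates = [
    "2 + 3 = 5. Answer: 5",
    "2 + 3 = 6. Answer: 6",   # wrong answer, rejected
    "3 + 2 = 5. Answer: 5",
]
accepted = rejection_sample("What is 2 + 3?", "5", make_toy_sampler(candidates), k=6)
```

A multilingual variant would run the same filter once per language, with a sampler that answers in that language, and pool the accepted paths as extra training data.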

zchen0420 commented 3 weeks ago

zchen0420 / nn_papers

(M)LLMs and Datasets #4

LSTM

xLSTM