scene-the-ella opened this issue 1 year ago
How should AI systems behave, and who should decide? by OpenAI (16 Feb 2023)
MarioGPT
Noam Chomsky also commented on ChatGPT
The Bitterest of Lessons: The Role of Data and Optimization in Emergence (by Sergey Levine)
Benchmarking Large Language Models for News Summarization (https://arxiv.org/abs/2301.13848, 31 Jan 2023, not read in detail)
Measures the summarization performance of recently popular LLMs with human evaluation. (Couldn't read it closely.)
Instruction tuning matters more than model size.
Existing evaluations are limited by low-quality reference summaries, so the authors recruited freelance writers to produce high-quality summaries for comparison.
Conclusion: Instruct GPT-3 summaries are on par with the freelance writers' summaries! (Though the model copies heavily from the source text rather than rewriting it.)
A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity (https://arxiv.org/abs/2302.04023, 8 Feb 2023, not read in detail)
A systematic evaluation of ChatGPT's capabilities. (Couldn't read it closely.)
21 datasets covering 8 different common NLP application tasks
In the zero-shot setting, ChatGPT outperforms other LLMs on most tasks, and on some tasks it even beats fine-tuned models.
It can also produce multimodal content from text prompts (by leveraging code generation).
64.33% average across 10 different reasoning tasks: not great. (Better at deduction than induction. Why?)
It still suffers from hallucination.
Its multi-turn interactive feature yields performance gains.
Since it can do code generation, it can pull off things like this too?
Is ChatGPT a General-Purpose Natural Language Processing Task Solver? (https://arxiv.org/abs/2302.06476, 15 Feb 2023, not read in detail)
The Capacity for Moral Self-Correction in Large Language Models (https://arxiv.org/abs/2302.07459, 15 Feb 2023)
Languages are Rewards: Chain of Hindsight Finetuning using Human Feedback by Pieter Abbeel (https://arxiv.org/abs/2302.02676, 13 Feb 2023)
The Wisdom of Hindsight Makes Language Models Better Instruction Followers by Pieter Abbeel (https://arxiv.org/abs/2302.05206, 10 Feb 2023)
RLHF is impressive, but RL training is too complex.
Treat the instruction alignment problem as a goal-reaching problem in decision making.
Relabel the instruction in hindsight from the model's output and fine-tune on the resulting output (a reward-free approach).
Two-stage Reinforcement Learning
Instruction Relabeling Strategy
If the generated answer is correct, relabel the instruction as "Generate a correct answer to this problem";
if the generated answer is wrong, relabel it as "Generate a wrong answer to this problem". (A minimal sketch of this relabeling appears right after this list.)
Contrastive Instruction Following
Entropy Regularization
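A minimal sketch of the hindsight relabeling idea above, assuming a simple correctness check outside the model; the prompt format and training details here are illustrative, not the paper's exact setup.

```python
CORRECT_INSTRUCTION = "Generate a correct answer to this problem"
WRONG_INSTRUCTION = "Generate a wrong answer to this problem"

def relabel(problem: str, model_output: str, is_correct: bool) -> dict:
    """Relabel the instruction in hindsight based on the output's correctness."""
    instruction = CORRECT_INSTRUCTION if is_correct else WRONG_INSTRUCTION
    # The (relabeled instruction, problem, output) triple becomes an ordinary
    # supervised fine-tuning example; no reward model or RL update is needed.
    return {"prompt": f"{instruction}\n{problem}", "completion": model_output}

# Even an output judged wrong becomes usable training data,
# just under the "wrong answer" instruction.
sample = relabel("What is 2 + 3?", "6", is_correct=False)
print(sample)
```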
REPLUG: Retrieval-Augmented Black-Box Language Models (https://arxiv.org/abs/2301.12652, 1 Feb 2023)
Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning (https://arxiv.org/abs/2302.02662, 6 Feb 2023, not read in detail)
45 Ways to Find the Purpose of Life (삶의 목적을 찾는 45가지 방법), Yes24: http://www.yes24.com/Product/Goods/117506954
The first book written with ChatGPT and Papago has been published. The illustrations were also generated with Shutterstock AI.
Symbolic Discovery of Optimization Algorithms ArXiv: https://arxiv.org/abs/2302.06675 GitHub: https://github.com/google/automl/tree/master/lion
Google proposed a new optimizer algorithm (Lion) discovered through AutoML.
A fun paper; you may have seen it already, but sharing it anyway.
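The linked repo has the reference implementation; below is just a rough NumPy sketch of the Lion update rule as I understand it (sign of an interpolated momentum plus decoupled weight decay). The hyperparameter values are illustrative defaults, not the paper's recommended settings.

```python
import numpy as np

def lion_step(param, grad, momentum, lr=1e-4, beta1=0.9, beta2=0.99, weight_decay=0.0):
    # Update direction is only the sign of an interpolation between momentum and gradient.
    update = np.sign(beta1 * momentum + (1 - beta1) * grad)
    param = param - lr * (update + weight_decay * param)
    # Momentum is updated with a separate interpolation coefficient.
    momentum = beta2 * momentum + (1 - beta2) * grad
    return param, momentum

# Toy usage on a single parameter vector.
p, m = np.ones(3), np.zeros(3)
p, m = lion_step(p, np.array([0.5, -0.2, 0.0]), m)
```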
@channel This Thursday, which is today: looking for people to join a study group on ChatGPT and smaller models beyond it. The time will be settled among whoever joins; 2 PM, 5 PM, and 11 PM are the candidates for now. I have other commitments at 4 PM and 6:30 PM.
InstructGPT: Training language models to follow instructions with human feedback. Paper link: https://cdn.openai.com/papers/Training_language_models_to_follow_instructions_with_human_feedback.pdf
Blog post: https://towardsdatascience.com/the-new-version-of-gpt-3-is-much-much-better-53ac95f21cfb
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Yuntao Bai, Andy Jones, Kamal Ndousse, Amanda Askell, Anna Chen, Nova DasSarma, Dawn Drain, Stanislav Fort, Deep Ganguli, Tom Henighan, Nicholas Joseph, Saurav Kadavath, Jackson Kernion, Tom Conerly, Sheer El-Showk, Nelson Elhage, Zac Hatfield-Dodds, Danny Hernandez, Tristan Hume, Scott Johnston, Shauna Kravec, Liane Lovitt, Neel Nanda, Catherine Olsson, Dario Amodei, Tom Brown, Jack Clark, Sam McCandlish, Chris Olah, Ben Mann, Jared Kaplan
We apply preference modeling and reinforcement learning from human feedback (RLHF) to finetune language models to act as helpful and harmless assistants. We find this alignment training improves performance on almost all NLP evaluations, and is fully compatible with training for specialized skills such as python coding and summarization. We explore an iterated online mode of training, where preference models and RL policies are updated on a weekly cadence with fresh human feedback data, efficiently improving our datasets and models. Finally, we investigate the robustness of RLHF training, and identify a roughly linear relation between the RL reward and the square root of the KL divergence between the policy and its initialization. Alongside our main results, we perform peripheral analyses on calibration, competing objectives, and the use of OOD detection, compare our models with human writers, and provide samples from our models using prompts appearing in recent related work. https://arxiv.org/abs/2204.05862
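To make the last finding concrete: the abstract reports that the RL reward grows roughly linearly in the square root of the KL divergence from the initial policy. A tiny illustrative helper; the slope and intercept are hypothetical placeholders, not numbers from the paper.

```python
import numpy as np

def approx_rl_reward(kl_divergence, slope=1.0, intercept=0.0):
    # Reward ~ slope * sqrt(KL(policy || init)) + intercept, per the reported trend.
    return slope * np.sqrt(kl_divergence) + intercept

# Quadrupling the KL budget roughly doubles the reward gain over the initialization.
print(approx_rl_reward(np.array([1.0, 4.0, 16.0])))
```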
Multimodal Chain-of-Thought Reasoning in Language Models
Zhuosheng Zhang, Aston Zhang, Mu Li, Hai Zhao, George Karypis, Alex Smola
Large language models (LLMs) have shown impressive performance on complex reasoning by leveraging chain-of-thought (CoT) prompting to generate intermediate reasoning chains as the rationale to infer the answer. However, existing CoT studies have focused on the language modality. We propose Multimodal-CoT that incorporates language (text) and vision (images) modalities into a two-stage framework that separates rationale generation and answer inference. In this way, answer inference can leverage better generated rationales that are based on multimodal information. With Multimodal-CoT, our model under 1 billion parameters outperforms the previous state-of-the-art LLM (GPT-3.5) by 16 percentage points (75.17% -> 91.68% accuracy) on the ScienceQA benchmark and even surpasses human performance. Code is publicly available at this https URL. https://arxiv.org/abs/2302.00923
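A minimal sketch of the two-stage framework described in the abstract, with placeholder callables standing in for the fine-tuned rationale and answer models; their names and signatures are assumptions, not the paper's API.

```python
def multimodal_cot(question, image_features, generate_rationale, generate_answer):
    # Stage 1: generate a rationale grounded in both text and vision features.
    rationale = generate_rationale(question, image_features)
    # Stage 2: infer the answer conditioned on the generated rationale.
    return generate_answer(f"{question}\nRationale: {rationale}", image_features)

# Toy usage with stand-in callables instead of real models.
answer = multimodal_cot(
    "Which object is attracted to a magnet?",
    image_features=None,
    generate_rationale=lambda q, v: "Iron objects are attracted to magnets.",
    generate_answer=lambda q, v: "the iron nail",
)
print(answer)
```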
BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining, by Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon and Tie-Yan Liu.
News!
• BioGPT-Large model with 1.5B parameters is coming, currently available on PubMedQA task with SOTA performance of 81% accuracy. See Question Answering on PubMedQA for evaluation
We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a 2 trillion token database, our Retrieval-Enhanced Transformer (Retro) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25× fewer parameters. After fine-tuning, Retro performance translates to downstream knowledge-intensive tasks such as question answering. Retro combines a frozen Bert retriever, a differentiable encoder and a chunked cross-attention mechanism to predict tokens based on an order of magnitude more data than what is typically consumed during training. We typically train Retro from scratch, yet can also rapidly Retrofit pre-trained transformers with retrieval and still achieve good performance. Our work opens up new avenues for improving language models through explicit memory at unprecedented scale.
Study: https://lnkd.in/g2Xn4462
Code: https://lnkd.in/gJ7BxVJJ
Datasets: https://lnkd.in/gdp6NF9k
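For intuition, here is a rough sketch of the retrieval step described in the abstract above: each input chunk queries a chunk database by embedding similarity, and the returned neighbours are what the decoder's chunked cross-attention would attend to. The toy `embed` function stands in for the frozen Bert retriever and is purely illustrative.

```python
import numpy as np

def retrieve_neighbours(input_chunks, database_chunks, embed, k=2):
    # Pre-compute embeddings for every chunk in the database.
    db_vecs = np.stack([embed(c) for c in database_chunks])
    neighbours = []
    for chunk in input_chunks:
        q = embed(chunk)
        # Cosine similarity between the query chunk and all database chunks.
        sims = db_vecs @ q / (np.linalg.norm(db_vecs, axis=1) * np.linalg.norm(q) + 1e-9)
        top = np.argsort(-sims)[:k]
        # These neighbours would feed the chunked cross-attention layers (omitted here).
        neighbours.append([database_chunks[i] for i in top])
    return neighbours

# Toy bag-of-characters embedding, just to make the sketch runnable.
embed = lambda text: np.bincount([ord(c) % 64 for c in text.lower()], minlength=64).astype(float)
print(retrieve_neighbours(["the cat sat"], ["a cat sat on a mat", "stock prices fell", "dogs and cats"], embed))
```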