jungwoo-ha / WeeklyArxivTalk

[Zoom & Facebook Live] Weekly AI Arxiv Season 2

[20230219] Weekly AI ArXiv Casual Chat, Season 2 - Episode 6 #72

Open scene-the-ella opened 1 year ago

scene-the-ella commented 1 year ago

Conferences within 7 days

dhlee347 commented 1 year ago

Non-arxiv

Arxiv

veritas9872 commented 1 year ago

삶의 목적을 찾는 45가지 방법 (45 Ways to Find the Purpose of Life) Yes24: http://www.yes24.com/Product/Goods/117506954

This is the first published book written by ChatGPT and Papago. The illustrations were also generated with Shutterstock AI.


Symbolic Discovery of Optimization Algorithms ArXiv: https://arxiv.org/abs/2302.06675 GitHub: https://github.com/google/automl/tree/master/lion

Google has proposed a new optimizer algorithm, discovered through AutoML-style symbolic program search.

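The discovered optimizer, named Lion in the linked repo, keeps only a momentum buffer and takes sign-based updates with decoupled weight decay. Below is a minimal NumPy sketch of the update rule as described in the paper; the function name and default hyperparameters are illustrative, not the reference implementation.

```python
import numpy as np

def lion_step(param, grad, m, lr=1e-4, beta1=0.9, beta2=0.99, weight_decay=0.01):
    """One Lion update: sign of an interpolated momentum, plus decoupled weight decay."""
    # Update direction: interpolate the current gradient with the momentum buffer, then take the sign.
    update = np.sign(beta1 * m + (1.0 - beta1) * grad)
    # Decoupled weight decay, applied as in AdamW.
    param = param - lr * (update + weight_decay * param)
    # The momentum buffer uses a separate interpolation factor (beta2).
    m = beta2 * m + (1.0 - beta2) * grad
    return param, m
```

Because the update is just a sign vector plus weight decay, Lion stores one state tensor per parameter (versus two for Adam), which is part of its memory appeal.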

gyunggyung commented 1 year ago

Some interesting papers; you may have seen them already, but sharing them anyway.

@channel This Thursday, which is today: looking for people to join a study group on ChatGPT and the smaller models that go beyond it. The time will be decided among participants; 2 PM, 5 PM, and 11 PM are the candidates for now. I have prior commitments at 4 PM and 6:30 PM.

InstructGPT: Training language models to follow instructions with human feedback. Paper link: https://cdn.openai.com/papers/Training_language_models_to_follow_instructions_with_human_feedback.pdf

Blog post: https://towardsdatascience.com/the-new-version-of-gpt-3-is-much-much-better-53ac95f21cfb

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

Yuntao Bai, Andy Jones, Kamal Ndousse, Amanda Askell, Anna Chen, Nova DasSarma, Dawn Drain, Stanislav Fort, Deep Ganguli, Tom Henighan, Nicholas Joseph, Saurav Kadavath, Jackson Kernion, Tom Conerly, Sheer El-Showk, Nelson Elhage, Zac Hatfield-Dodds, Danny Hernandez, Tristan Hume, Scott Johnston, Shauna Kravec, Liane Lovitt, Neel Nanda, Catherine Olsson, Dario Amodei, Tom Brown, Jack Clark, Sam McCandlish, Chris Olah, Ben Mann, Jared Kaplan


We apply preference modeling and reinforcement learning from human feedback (RLHF) to finetune language models to act as helpful and harmless assistants. We find this alignment training improves performance on almost all NLP evaluations, and is fully compatible with training for specialized skills such as python coding and summarization. We explore an iterated online mode of training, where preference models and RL policies are updated on a weekly cadence with fresh human feedback data, efficiently improving our datasets and models. Finally, we investigate the robustness of RLHF training, and identify a roughly linear relation between the RL reward and the square root of the KL divergence between the policy and its initialization. Alongside our main results, we perform peripheral analyses on calibration, competing objectives, and the use of OOD detection, compare our models with human writers, and provide samples from our models using prompts appearing in recent related work. https://arxiv.org/abs/2204.05862
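As a rough illustration of the preference-modeling step mentioned in the abstract (not the authors' code), a pairwise Bradley-Terry-style loss can be sketched as follows; `reward_model`, the token-ID inputs, and the output shape are placeholder assumptions.

```python
import torch.nn.functional as F

def preference_loss(reward_model, chosen_ids, rejected_ids):
    """Pairwise preference loss: the chosen response should score higher than the rejected one."""
    r_chosen = reward_model(chosen_ids)      # scalar reward per example, shape (batch,)
    r_rejected = reward_model(rejected_ids)  # shape (batch,)
    # -log sigmoid(r_chosen - r_rejected), averaged over the batch.
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```

The trained reward model then supplies the RL signal, with a KL penalty against the initial policy keeping updates close to the starting model.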

Multimodal Chain-of-Thought Reasoning in Language Models

Zhuosheng Zhang, Aston Zhang, Mu Li, Hai Zhao, George Karypis, Alex Smola


Large language models (LLMs) have shown impressive performance on complex reasoning by leveraging chain-of-thought (CoT) prompting to generate intermediate reasoning chains as the rationale to infer the answer. However, existing CoT studies have focused on the language modality. We propose Multimodal-CoT that incorporates language (text) and vision (images) modalities into a two-stage framework that separates rationale generation and answer inference. In this way, answer inference can leverage better generated rationales that are based on multimodal information. With Multimodal-CoT, our model under 1 billion parameters outperforms the previous state-of-the-art LLM (GPT-3.5) by 16 percentage points (75.17% → 91.68% accuracy) on the ScienceQA benchmark and even surpasses human performance. Code is publicly available at this https URL. https://arxiv.org/abs/2302.00923
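A minimal sketch of the two-stage decomposition described in the abstract, separating rationale generation from answer inference; `rationale_model` and `answer_model` are hypothetical callables standing in for the paper's fused text-vision models, and the prompt format is made up for illustration.

```python
def multimodal_cot(question, image_features, rationale_model, answer_model):
    """Stage 1: generate a rationale from text + vision. Stage 2: infer the answer conditioned on it."""
    rationale = rationale_model(text=question, vision=image_features)
    answer = answer_model(text=question + " Rationale: " + rationale, vision=image_features)
    return rationale, answer
```

The point of the split is that the answer model never has to reason from scratch: it conditions on a rationale that was itself grounded in the image features.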

BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining, by Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon and Tie-Yan Liu.

News!

• BioGPT-Large, a model with 1.5B parameters, is coming; it is currently available for the PubMedQA task, where it reaches SOTA performance of 81% accuracy. See Question Answering on PubMedQA for the evaluation.

https://academic.oup.com/bib/advance-article/doi/10.1093/bib/bbac409/6713511?guestAccessKey=a66d9b5d-4f83-4017-bb52-405815c907b9

Improving Language Models by Retrieving from Trillions of Tokens (RETRO)

We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a 2 trillion token database, our Retrieval-Enhanced Transformer (Retro) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25× fewer parameters. After fine-tuning, Retro performance translates to downstream knowledge-intensive tasks such as question answering. Retro combines a frozen Bert retriever, a differentiable encoder and a chunked cross-attention mechanism to predict tokens based on an order of magnitude more data than what is typically consumed during training. We typically train Retro from scratch, yet can also rapidly Retrofit pre-trained transformers with retrieval and still achieve good performance. Our work opens up new avenues for improving language models through explicit memory at unprecedented scale. Study: https://lnkd.in/g2Xn4462
Code: https://lnkd.in/gJ7BxVJJ Datasets: https://lnkd.in/gdp6NF9k
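A toy sketch of the retrieval idea described above: each input chunk is embedded and matched against a precomputed chunk database by nearest-neighbor similarity, and the retrieved neighbors would then feed the chunked cross-attention layers. The embedding inputs and database arrays here are placeholder assumptions, not the paper's frozen BERT retriever or its approximate-nearest-neighbor index.

```python
import numpy as np

def retrieve_neighbors(chunk_embedding, db_embeddings, db_chunks, k=2):
    """Return the k database chunks closest to the query chunk by cosine similarity."""
    q = chunk_embedding / np.linalg.norm(chunk_embedding)
    db = db_embeddings / np.linalg.norm(db_embeddings, axis=1, keepdims=True)
    scores = db @ q                      # cosine similarity against every stored chunk
    top = np.argsort(-scores)[:k]        # indices of the k most similar chunks
    return [db_chunks[i] for i in top]
```

At RETRO's scale (a 2-trillion-token database) the exact search above would be replaced by an approximate nearest-neighbor index, but the interface is the same: a chunk goes in, its nearest stored neighbors come out.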