Open junhwi opened 7 months ago
BlackMamba: Mixture of Experts for State-Space Models https://arxiv.org/pdf/2402.01771.pdf
OLMo: Accelerating the Science of Language Models https://arxiv.org/abs/2402.00838
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research https://arxiv.org/abs/2402.00159
Can Large Language Models Understand Context? https://arxiv.org/abs/2402.00858
Our service update is in its final stretch, so I'm not sure I'll be able to participate... Even if I do join, I'll just be listening in this week.
DeepSeek
LLaVA-1.6
https://llava-vl.github.io/blog/2024-01-30-llava-1-6/
Coding on Copilot: 2023 Data Suggests Downward Pressure on Code Quality
https://gitclear-public.s3.us-west-2.amazonaws.com/Coding-on-Copilot-2024-Developer-Research.pdf
SSM
Repeat After Me: Transformers are Better than State Space Models at Copying
https://arxiv.org/abs/2402.01032 https://huggingface.co/papers/2402.01032#65c12b0b5bf72d1811466dc0
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
https://arxiv.org/abs/2402.04248
Agent
PokéLLMon: A Human-Parity Agent for Pokémon Battles with Large Language Models
https://github.com/git-disl/PokeLLMon https://arxiv.org/pdf/2402.01118.pdf