Open junhwi opened 7 months ago
BlackMamba: Mixture of Experts for State-Space Models https://arxiv.org/pdf/2402.01771.pdf
OLMo: Accelerating the Science of Language Models https://arxiv.org/abs/2402.00838
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research https://arxiv.org/abs/2402.00159
Can Large Language Models Understand Context? https://arxiv.org/abs/2402.00858
Our service update is in its final stretch, so I'm not sure I'll be able to participate... Even if I do join, I'll just be listening in this week.
DeepSeek
LLaVA-1.6
https://llava-vl.github.io/blog/2024-01-30-llava-1-6/
Coding on Copilot: 2023 Data Suggests Downward Pressure on Code Quality
https://gitclear-public.s3.us-west-2.amazonaws.com/Coding-on-Copilot-2024-Developer-Research.pdf
SSM
Repeat After Me: Transformers are Better than State Space Models at Copying
https://arxiv.org/abs/2402.01032 https://huggingface.co/papers/2402.01032#65c12b0b5bf72d1811466dc0
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
https://arxiv.org/abs/2402.04248
Agent
PokéLLMon: A Human-Parity Agent for Pokémon Battles with Large Language Models
https://github.com/git-disl/PokeLLMon https://arxiv.org/pdf/2402.01118.pdf