Open junhwi opened 6 months ago
Street Fighter III with LLMs https://github.com/OpenGenerativeAI/llm-colosseum?fbclid=IwAR3rQzwKfyLjSo06N_XDCflHZ21eckx4OxrXGP1cB22NDaYXSM8L8AV1STI_aem_Ac9nBs21hzTq0RZsWZWspLoc0cPmya3hEJMCQIHORDe1IbCIv4nfFSqRClOcIHM-kQI
Supertone Shift https://product.supertone.ai/shift
NaturalSpeech3 https://arxiv.org/abs/2403.03100
Databricks releases DBRX, the new SOTA open LLM (they insist) https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm https://github.com/databricks/dbrx https://huggingface.co/databricks/dbrx-base https://twitter.com/danielhanchen/status/1772981050530316467
Is Fine-Tuning Still Valuable? https://hamel.dev/blog/posts/fine_tuning_valuable.html
LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning https://arxiv.org/abs/2403.17919
The Unreasonable Ineffectiveness of the Deeper Layers https://arxiv.org/abs/2403.17887
Claude 3 surpasses GPT-4 on Chatbot Arena https://arstechnica.com/information-technology/2024/03/the-king-is-dead-claude-3-surpasses-gpt-4-on-chatbot-arena-for-the-first-time/
DoRA: a new, better, and faster LoRA? https://x.com/_philschmid/status/1773991507306963269?s=20 https://arxiv.org/abs/2402.09353
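AI21 announces Jamba https://www.ai21.com/blog/announcing-jamba
Qwen MoE http://qwenlm.github.io/blog/qwen-moe/
Grok-1.5 https://x.ai/blog/grok-1.5
OpenAI: Navigating the Challenges and Opportunities of Synthetic Voices https://openai.com/blog/navigating-the-challenges-and-opportunities-of-synthetic-voices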
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression https://arxiv.org/abs/2403.12968
AgentLite: A Lightweight Library for Building and Advancing Task-Oriented LLM Agent System https://arxiv.org/abs/2402.15538