junhwi opened this issue 9 months ago
Large Language Models: A Survey https://arxiv.org/pdf/2402.06196.pdf
minbpe (minimal byte pair encoder) https://github.com/karpathy/minbpe
mistral-next https://twitter.com/aidan_mclau/status/1758336996576031214
Gemini 1.5
https://twitter.com/JeffDean/status/1758150158813213176
Gemini 1.5 Pro is a sparse mixture-of-expert (MoE) Transformer-based model that builds on Gemini 1.0’s (Gemini-Team et al., 2023) research advances and multimodal capabilities. Gemini 1.5 Pro also builds on a much longer history of MoE research at Google (Clark et al., 2022; Du et al., 2022; Fedus et al., 2021; Lepikhin et al., 2020; Riquelme et al., 2021; Shazeer et al., 2017; Zoph et al., 2022) and language model research in the broader literature (Anil et al., 2023; Anthropic, 2023; Brown et al., 2020; Chowdhery et al., 2023; Hoffmann et al., 2022; Jiang et al., 2024; Kim et al., 2021; OpenAI, 2023; Rae et al., 2021; Raffel et al., 2020; Roller et al., 2021; Thoppilan et al., 2022; Touvron et al., 2023a,b; Vaswani et al., 2017). MoE models use a learned routing function to direct inputs to a subset of the model’s parameters for processing. This form of conditional computation (Bengio et al., 2013; Davis and Arel, 2014; Jacobs et al., 1991) allows models to grow their total parameter count while keeping the number of parameters that are activated for any given input constant.
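The routing idea in the quoted paragraph can be illustrated with a toy example. This is a minimal sketch, not Gemini's actual architecture: the report excerpt does not describe the router, so top-k softmax gating (in the style of Shazeer et al., 2017) is an assumption here, and `ToyMoE`, `forward`, and `param_counts` are hypothetical names. The weights are random rather than learned.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


class ToyMoE:
    """Toy sparse MoE layer: a router picks top_k of num_experts per input."""

    def __init__(self, dim, num_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.top_k = top_k
        # Router: one weight row per expert (learned in a real model, random here).
        self.router = [[rng.gauss(0, 1) for _ in range(dim)]
                       for _ in range(num_experts)]
        # Each expert is a dim x dim linear map (assumption: real experts are MLPs).
        self.experts = [[[rng.gauss(0, 1) for _ in range(dim)]
                         for _ in range(dim)]
                        for _ in range(num_experts)]

    def forward(self, x):
        # Score every expert, but only run the top_k highest-scoring ones.
        logits = matvec(self.router, x)
        chosen = sorted(range(len(logits)),
                        key=lambda i: logits[i], reverse=True)[:self.top_k]
        gates = softmax([logits[i] for i in chosen])
        out = [0.0] * self.dim
        for gate, idx in zip(gates, chosen):
            y = matvec(self.experts[idx], x)
            out = [o + gate * yi for o, yi in zip(out, y)]
        return out, chosen

    def param_counts(self):
        """Return (total expert parameters, parameters active per input)."""
        per_expert = self.dim * self.dim
        return len(self.experts) * per_expert, self.top_k * per_expert


moe = ToyMoE(dim=4, num_experts=8, top_k=2)
output, chosen_experts = moe.forward([1.0, 0.5, -0.5, 2.0])
total_params, active_params = moe.param_counts()
```

With 8 experts and `top_k=2`, the layer holds 128 expert parameters but touches only 32 for any single input. Adding experts raises the first number and leaves the second unchanged, which is the property the paragraph describes.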
https://stability.ai/news/introducing-stable-cascade
Würstchen: An Efficient Architecture for Large-Scale Text-to-Image Diffusion Models, ICLR 2024 Oral https://openreview.net/forum?id=gU58d5QeGv
Open Source AI Software Maker LangChain Launches First Paid Product — With A Massive Waitlist
https://arxiv.org/pdf/2402.09668.pdf (low confidence)
Direct Language Model Alignment from Online AI Feedback https://arxiv.org/abs/2402.04792
https://www.youtube.com/watch?v=AayZuuDDKP0
https://www.ycombinator.com/rfs
https://openai.com/sora
https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/
https://hamel.dev/blog/posts/prompt/
UFO: A UI-Focused Agent for Windows OS Interaction
https://arxiv.org/abs/2402.07939
DoRA: Weight-Decomposed Low-Rank Adaptation
https://arxiv.org/abs/2402.09353
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models
https://arxiv.org/abs/2402.07033