24/02/18

junhwi commented 9 months ago

Direct Language Model Alignment from Online AI Feedback https://arxiv.org/abs/2402.04792

https://www.youtube.com/watch?v=AayZuuDDKP0

https://www.ycombinator.com/rfs

https://openai.com/sora

https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/

https://hamel.dev/blog/posts/prompt/

UFO: A UI-Focused Agent for Windows OS Interaction

https://arxiv.org/abs/2402.07939

DoRA: Weight-Decomposed Low-Rank Adaptation

https://arxiv.org/abs/2402.09353

Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models

https://arxiv.org/abs/2402.07033

seyong92 commented 8 months ago

Stable Audio

https://arxiv.org/abs/2402.04825

https://stability-ai.github.io/stable-audio-demo/

shylee2021 commented 8 months ago

Large Language Models: A Survey https://arxiv.org/pdf/2402.06196.pdf

minbpe (minimal byte pair encoder) https://github.com/karpathy/minbpe

mistral-next https://twitter.com/aidan_mclau/status/1758336996576031214

hippothewild commented 8 months ago

Gemini 1.5

https://twitter.com/JeffDean/status/1758150158813213176

Gemini 1.5 Pro is a sparse mixture-of-expert (MoE) Transformer-based model that builds on Gemini 1.0’s (Gemini-Team et al., 2023) research advances and multimodal capabilities. Gemini 1.5 Pro also builds on a much longer history of MoE research at Google (Clark et al., 2022; Du et al., 2022; Fedus et al., 2021; Lepikhin et al., 2020; Riquelme et al., 2021; Shazeer et al., 2017; Zoph et al., 2022) and language model research in the broader literature (Anil et al., 2023; Anthropic, 2023; Brown et al., 2020; Chowdhery et al., 2023; Hoffmann et al., 2022; Jiang et al., 2024; Kim et al., 2021; OpenAI, 2023; Rae et al., 2021; Raffel et al., 2020; Roller et al., 2021; Thoppilan et al., 2022; Touvron et al., 2023a,b; Vaswani et al., 2017). MoE models use a learned routing function to direct inputs to a subset of the model’s parameters for processing. This form of conditional computation (Bengio et al., 2013; Davis and Arel, 2014; Jacobs et al., 1991) allows models to grow their total parameter count while keeping the number of parameters that are activated for any given input constant.
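
To make the routing idea in the quote concrete, here is a minimal sketch of a sparse MoE layer in plain Python. It is not Gemini 1.5's architecture, and the quoted text does not describe that. The top-k softmax gating, the class name `ToyMoE`, the random weights, and all sizes are assumptions chosen for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


class ToyMoE:
    """Hypothetical toy layer: a linear router plus one linear map per expert."""

    def __init__(self, dim, num_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        # Router: one score per expert. Random here; learned in a real model.
        self.router = [[rng.gauss(0, 1) for _ in range(dim)]
                       for _ in range(num_experts)]
        # Each expert is a dim x dim matrix (assumed shape, for simplicity).
        self.experts = [[[rng.gauss(0, 1) for _ in range(dim)]
                         for _ in range(dim)]
                        for _ in range(num_experts)]

    def forward(self, x):
        logits = matvec(self.router, x)
        # Keep only the top_k highest-scoring experts for this input.
        chosen = sorted(range(len(logits)),
                        key=lambda i: logits[i], reverse=True)[:self.top_k]
        # Renormalise the gate weights over the chosen experts only.
        gates = softmax([logits[i] for i in chosen])
        out = [0.0] * len(x)
        for g, i in zip(gates, chosen):
            y = matvec(self.experts[i], x)  # only chosen experts are evaluated
            out = [o + g * yi for o, yi in zip(out, y)]
        return out, chosen


x = [1.0, 0.5, -0.5, 2.0]
small = ToyMoE(dim=4, num_experts=8, top_k=2)
large = ToyMoE(dim=4, num_experts=64, top_k=2)
out_small, used_small = small.forward(x)
out_large, used_large = large.forward(x)
```

`large` holds eight times as many expert parameters as `small`, yet each call still evaluates only `top_k = 2` experts, which is the "grow total parameters, keep activated parameters constant" property the quote describes.
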
https://stability.ai/news/introducing-stable-cascade
- Uses the Würstchen architecture: "Würstchen: An Efficient Architecture for Large-Scale Text-to-Image Diffusion Models", ICLR 2024 Oral https://openreview.net/forum?id=gU58d5QeGv
https://cohere.com/research/aya
Open Source AI Software Maker LangChain Launches First Paid Product — With A Massive Waitlist
- https://www.langchain.com/langsmith
https://arxiv.org/pdf/2402.09668.pdf (low confidence)
- Using proxy LLMs for training data curation
