junhwi opened 1 month ago
Cohere Aya 23 https://cohere.com/blog/aya23
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention https://arxiv.org/abs/2405.12981
Scarlett Johansson told OpenAI not to use her voice — and it did anyway https://x.com/verge/status/1792689427585646859
NumPy 2.0 https://numpy.org/devdocs/release/2.0.0-notes.html
Images that Sound: Composing Images and Sounds on a Single Canvas https://arxiv.org/abs/2405.12221
Whisper large model with songs
Layer-Condensed KV Cache for Efficient Inference of Large Language Models https://arxiv.org/abs/2405.10637
Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning https://arxiv.org/abs/2405.12130
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data https://arxiv.org/abs/2405.14333