issues
pentium3/sys_reading
system paper reading notes
235 stars, 12 forks
Accelerating Retrieval-Augmented Language Model Serving with Speculation
#373
pentium3
opened
3 months ago
0
PipeRAG: Fast Retrieval-Augmented Generation via Algorithm-System Co-design
#372
pentium3
opened
5 months ago
0
Data-Juicer: A One-Stop Data Processing System for Large Language Models
#371
pentium3
opened
5 months ago
1
LinguaLinked: A Distributed Large Language Model Inference System for Mobile Devices
#370
pentium3
opened
7 months ago
0
UnFaaSener: Latency and Cost Aware Offloading of Functions from Serverless Platforms
#369
pentium3
opened
8 months ago
0
SEDA: An Architecture for Well-Conditioned, Scalable Internet Services
#368
pentium3
closed
1 month ago
1
Nightcore: Efficient and Scalable Serverless Computing for Latency-Sensitive, Interactive Microservices
#367
pentium3
opened
8 months ago
1
DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving
#366
pentium3
opened
8 months ago
0
Cocktail: A Multidimensional Optimization for Model Serving in Cloud
#365
pentium3
opened
8 months ago
0
Model Selection for Latency-Critical Inference Serving
#364
pentium3
opened
8 months ago
0
Pronghorn: Effective Checkpoint Orchestration for Serverless Hot-Starts
#363
pentium3
opened
8 months ago
0
Erlang: Application-Level Autoscaling for Cloud Microservices
#362
pentium3
opened
8 months ago
0
GMorph: Accelerating Multi-DNN Inference via Model Fusion
#361
pentium3
opened
8 months ago
0
SiDA: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models
#360
pentium3
opened
8 months ago
0
Punica: Multi-Tenant LoRA Serving
#359
pentium3
opened
8 months ago
0
HeteGen: Heterogeneous Parallel Inference for Large Language Models on Resource-Constrained Devices
#358
pentium3
opened
8 months ago
0
DiffusionPipe: Training Large Diffusion Models with Efficient Pipelines
#357
pentium3
opened
8 months ago
3
Lancet: Accelerating Mixture-of-Experts Training by Overlapping Weight Gradient Computation and All-to-All Communication
#356
pentium3
opened
8 months ago
0
Subgraph stationary hardware-software inference co-design
#355
pentium3
opened
8 months ago
0
Tutel: Adaptive Mixture-of-Experts at Scale
#354
pentium3
opened
8 months ago
0
Pathways: Asynchronous Distributed Dataflow for ML
#353
pentium3
opened
8 months ago
0
SpotServe: Serving Generative Large Language Models on Preemptible Instances
#352
pentium3
closed
8 months ago
3
Fast Distributed Inference Serving for Large Language Models
#351
pentium3
opened
8 months ago
1
Stateful Large Language Model Serving with Pensieve
#350
pentium3
opened
8 months ago
0
DeepSpeed-Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale
#349
pentium3
opened
8 months ago
0
Efficiently Scaling Transformer Inference
#348
pentium3
opened
8 months ago
2
AMP: Automatically Finding Model Parallel Strategies with Heterogeneity Awareness
#347
pentium3
opened
8 months ago
0
FFCV: Accelerating Training by Removing Data Bottlenecks
#346
pentium3
opened
8 months ago
0
INFaaS: Automated Model-less Inference Serving
#345
pentium3
opened
8 months ago
0
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
#344
pentium3
opened
8 months ago
1
template
#343
pentium3
closed
8 months ago
0
PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel
#342
pentium3
opened
8 months ago
0
Splitwise: Efficient Generative LLM Inference Using Phase Splitting
#341
pentium3
opened
8 months ago
0
Computing in the Era of Large Generative Models: From Cloud-Native to AI-Native
#340
pentium3
opened
8 months ago
0
tf.data service: A Case for Disaggregating ML Input Data Processing
#339
pentium3
opened
9 months ago
0
Gödel: Unified Large-Scale Resource Management and Scheduling at ByteDance
#338
pentium3
opened
9 months ago
0
The Gap Between Serverless Research and Real-world Systems
#337
pentium3
opened
9 months ago
0
Ekko: A Large-Scale Deep Learning Recommender System with Low-Latency Model Update
#336
pentium3
opened
9 months ago
0
Orca: A Distributed Serving System for Transformer-Based Generative Models
#335
pentium3
opened
9 months ago
1
Walle: An End-to-End, General-Purpose, and Large-Scale Production System for Device-Cloud Collaborative Machine Learning
#334
pentium3
opened
9 months ago
0
Tabi: An Efficient Multi-Level Inference System for Large Language Models
#333
pentium3
opened
9 months ago
0
QuickUpdate: a Real-Time Personalization System for Large-Scale Recommendation Models
#332
pentium3
opened
9 months ago
0
Characterization of Large Language Model Development in the Datacenter
#331
pentium3
opened
9 months ago
0
Scaling Large Language Model Training to More Than 10,000 GPUs
#330
pentium3
opened
9 months ago
0
DISTMM: Accelerating Distributed Multi-modal Model Training
#329
pentium3
opened
9 months ago
0
Approximate Caching for Efficiently Serving Diffusion Models
#328
pentium3
opened
9 months ago
0
Lifting the veil on Meta’s microservice architecture: Analyses of topology and request workflows
#326
pentium3
opened
9 months ago
0
Accelerating Distributed MoE Training and Inference with Lina
#325
pentium3
opened
9 months ago
0
Hydro: Surrogate-Based Hyperparameter Tuning Service in Datacenters
#324
pentium3
opened
9 months ago
0
Optimizing Dynamic Neural Networks with Brainstorm
#323
pentium3
opened
9 months ago
0