pentium3 sys_reading issues

system paper reading notes

issues

Accelerating Retrieval-Augmented Language Model Serving with Speculation

#373 pentium3 opened 3 months ago
0
PipeRAG: Fast Retrieval-Augmented Generation via Algorithm-System Co-design

#372 pentium3 opened 5 months ago
0
Data-Juicer: A One-Stop Data Processing System for Large Language Models

#371 pentium3 opened 5 months ago
1
LinguaLinked: A Distributed Large Language Model Inference System for Mobile Devices

#370 pentium3 opened 7 months ago
0
UnFaaSener: Latency and Cost Aware Offloading of Functions from Serverless Platforms

#369 pentium3 opened 8 months ago
0
SEDA: An Architecture for Well-Conditioned, Scalable Internet Services

#368 pentium3 closed 1 month ago
1
Nightcore: Efficient and Scalable Serverless Computing for Latency-Sensitive, Interactive Microservices

#367 pentium3 opened 8 months ago
1
DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving

#366 pentium3 opened 8 months ago
0
Cocktail: A Multidimensional Optimization for Model Serving in Cloud

#365 pentium3 opened 8 months ago
0
Model Selection for Latency-Critical Inference Serving

#364 pentium3 opened 8 months ago
0
Pronghorn: Effective Checkpoint Orchestration for Serverless Hot-Starts

#363 pentium3 opened 8 months ago
0
Erlang: Application-Aware Autoscaling for Cloud Microservices

#362 pentium3 opened 8 months ago
0
GMorph: Accelerating Multi-DNN Inference via Model Fusion

#361 pentium3 opened 8 months ago
0
SiDA: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models

#360 pentium3 opened 8 months ago
0
Punica: Multi-Tenant LoRA Serving

#359 pentium3 opened 8 months ago
0
HeteGen: Heterogeneous Parallel Inference for Large Language Models on Resource-Constrained Devices

#358 pentium3 opened 8 months ago
0
DiffusionPipe: Training Large Diffusion Models with Efficient Pipelines

#357 pentium3 opened 8 months ago
3
Lancet: Accelerating Mixture-of-Experts Training by Overlapping Weight Gradient Computation and All-to-All Communication

#356 pentium3 opened 8 months ago
0
Subgraph stationary hardware-software inference co-design

#355 pentium3 opened 8 months ago
0
Tutel: Adaptive Mixture-of-Experts at Scale

#354 pentium3 opened 8 months ago
0
Pathways: Asynchronous Distributed Dataflow for ML

#353 pentium3 opened 8 months ago
0
SpotServe: Serving Generative Large Language Models on Preemptible Instances

#352 pentium3 closed 8 months ago
3
Fast Distributed Inference Serving for Large Language Models

#351 pentium3 opened 8 months ago
1
Stateful Large Language Model Serving with Pensieve

#350 pentium3 opened 8 months ago
0
DeepSpeed-Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale

#349 pentium3 opened 8 months ago
0
Efficiently Scaling Transformer Inference

#348 pentium3 opened 8 months ago
2
AMP: Automatically Finding Model Parallel Strategies with Heterogeneity Awareness

#347 pentium3 opened 8 months ago
0
FFCV: Accelerating Training by Removing Data Bottlenecks

#346 pentium3 opened 8 months ago
0
INFaaS: Automated Model-less Inference Serving

#345 pentium3 opened 8 months ago
0
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU

#344 pentium3 opened 8 months ago
1
template

#343 pentium3 closed 8 months ago
0
PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel

#342 pentium3 opened 8 months ago
0
Splitwise: Efficient Generative LLM Inference Using Phase Splitting

#341 pentium3 opened 8 months ago
0
Computing in the Era of Large Generative Models: From Cloud-Native to AI-Native

#340 pentium3 opened 8 months ago
0
tf.data service: A Case for Disaggregating ML Input Data Processing

#339 pentium3 opened 9 months ago
0
Gödel: Unified Large-Scale Resource Management and Scheduling at ByteDance

#338 pentium3 opened 9 months ago
0
The Gap Between Serverless Research and Real-world Systems

#337 pentium3 opened 9 months ago
0
Ekko: A Large-Scale Deep Learning Recommender System with Low-Latency Model Update

#336 pentium3 opened 9 months ago
0
Orca: A Distributed Serving System for Transformer-Based Generative Models

#335 pentium3 opened 9 months ago
1
Walle: An End-to-End, General-Purpose, and Large-Scale Production System for Device-Cloud Collaborative Machine Learning

#334 pentium3 opened 9 months ago
0
Tabi: An Efficient Multi-Level Inference System for Large Language Models

#333 pentium3 opened 9 months ago
0
QuickUpdate: a Real-Time Personalization System for Large-Scale Recommendation Models

#332 pentium3 opened 9 months ago
0
Characterization of Large Language Model Development in the Datacenter

#331 pentium3 opened 9 months ago
0
Scaling Large Language Model Training to More Than 10,000 GPUs

#330 pentium3 opened 9 months ago
0
DISTMM: Accelerating Distributed Multi-modal Model Training

#329 pentium3 opened 9 months ago
0
Approximate Caching for Efficiently Serving Diffusion Models

#328 pentium3 opened 9 months ago
0
Lifting the veil on Meta’s microservice architecture: Analyses of topology and request workflows

#326 pentium3 opened 9 months ago
0
Accelerating Distributed MoE Training and Inference with Lina

#325 pentium3 opened 9 months ago
0
Hydro: Surrogate-Based Hyperparameter Tuning Service in Datacenters

#324 pentium3 opened 9 months ago
0
Optimizing Dynamic Neural Networks with Brainstorm

#323 pentium3 opened 9 months ago
0