Accelerating ML Inference at Scale with ONNX, Triton and Seldon

01:18 Kickoff and introduction
02:25 Agenda for the day
03:58 What is GPT-2?
05:46 How to do this?
06:40 Fetching the model from Hugging Face
11:10 How to scale it?
13:00 How to go from model artifact to deployed model
13:45 Optimise the model using the ONNX format
15:05 Productionising using Tempo
18:35 Defining the wrapper
19:40 Run with Triton in Docker
21:00 Custom Transformer logic
23:40 Run full pipeline in Docker
24:30 Run in K8s
26:00 Conclusion and Q&A

Accelerating ML Inference at Scale with ONNX, Triton and Seldon | PyData Global 2021 #137