lix19937 / tensorrt-insight

Deep insights into TensorRT, including but not limited to QAT, PTQ, plugins, Triton inference, and CUDA
asp nvidia ptq qat tensorrt

TensorRT is NVIDIA's semi-open-source, high-performance AI inference engine framework/library, and it works across NVIDIA GPU architectures. It provides C++ and Python interfaces as well as a mechanism for user-defined plugins, covering the main aspects of AI inference engine technology.

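| topic | description | notes |
| --- | --- | --- |
| overview | Overview | |
| layout | Memory layout | |
| compute_graph_optimize | Computation graph optimization | |
| dynamic_shape | Dynamic shapes | |
| plugin | Plugins | |
| calibration | Calibration | |
| asp | Sparsity | |
| qat | Quantization-aware training | |
| trtexec | OSS helper tool | |
| tool | Helper scripts | |
| runtime | Runtime | |
| inferflow | Model scheduling | |
| mps | MPS | |
| deploy | ONNX-based deployment workflow, TensorRT tool usage | |
| py-tensorrt | Python TensorRT wrapper | walkthrough of the tensorrt `__init__` |
| cookbook | Cookbook | |
| incubator | Incubator | |
| developer_guide | Developer guide | |
| triton-inference-server | Triton | |
| cuda | CUDA programming | |
| onnxruntime op | ONNX Runtime custom ops | aids graph optimization and layer-output alignment |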

References

https://docs.nvidia.com/deeplearning/tensorrt/archives/
https://developer.nvidia.com/search?page=1&sort=relevance&term=
https://github.com/HeKun-NVIDIA/TensorRT-Developer_Guide_in_Chinese/tree/main
https://docs.nvidia.com/deeplearning/tensorrt/migration-guide/index.html
https://developer.nvidia.com/zh-cn/blog/nvidia-gpu-fp8-training-inference/