tpoisonooo / llama.onnx

LLaMa/RWKV onnx models, quantization and testcase
GNU General Public License v3.0

GPU Inference #25

Open tpoisonooo opened 1 year ago

tpoisonooo commented 1 year ago

llama.onnx is primarily meant for understanding LLMs and for converting them to run on NPUs.
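For readers using the exported graphs that way, a minimal sketch of inspecting one with onnxruntime is shown below; the file name `decoder.onnx` is hypothetical, not a path shipped by this repo.

```python
# Minimal sketch: load an exported ONNX graph and inspect its I/O signature
# before porting it to an NPU toolchain. "decoder.onnx" is a placeholder name.
import onnx
import onnxruntime as ort

model = onnx.load("decoder.onnx")
onnx.checker.check_model(model)          # verify the graph is well formed

session = ort.InferenceSession("decoder.onnx", providers=["CPUExecutionProvider"])
for inp in session.get_inputs():
    # Print each expected input's name, shape, and dtype.
    print(inp.name, inp.shape, inp.type)
```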

If you are looking for inference on Nvidia GPUs, we have released lmdeploy at https://github.com/InternLM/lmdeploy.

It supports:

  • Tensor parallelism
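For reference, a minimal sketch of running GPU inference through lmdeploy follows, assuming a recent release that exposes the high-level `pipeline` API; the model id is only an example.

```python
# Minimal sketch of multi-GPU inference with lmdeploy's pipeline API.
from lmdeploy import pipeline, TurbomindEngineConfig

# tp=2 asks the TurboMind backend to shard the weights across two GPUs
# (tensor parallelism); use tp=1 for a single GPU.
pipe = pipeline(
    "internlm/internlm2-chat-7b",                 # example model id
    backend_config=TurbomindEngineConfig(tp=2),
)

responses = pipe(["Explain tensor parallelism in one sentence."])
print(responses[0].text)
```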

tpoisonooo commented 1 year ago

#19 #16 #15

tpoisonooo commented 1 year ago

#22 #15

yiliu30 commented 1 year ago
  • Tensor parallelism

Nice work! Can tensor parallelism be implemented using both Torch and ONNX models?
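To make the question concrete, here is a conceptual sketch of column-parallel tensor parallelism on a single linear layer. It is written against plain PyTorch on CPU so it runs anywhere; a real deployment would place each weight shard on its own GPU and replace the `torch.cat` with a collective such as all-gather.

```python
# Conceptual sketch: split a Linear(8 -> 16) across two "devices" by
# sharding its output dimension (column parallelism).
import torch

torch.manual_seed(0)
x = torch.randn(4, 8)            # batch of activations, hidden size 8
w = torch.randn(16, 8)           # full weight of a Linear(8 -> 16)

w0, w1 = w.chunk(2, dim=0)       # each shard owns half the output features

y0 = x @ w0.T                    # shard 0's partial result (4 x 8)
y1 = x @ w1.T                    # shard 1's partial result (4 x 8)
y = torch.cat([y0, y1], dim=1)   # gather: equivalent to the unsharded layer

assert torch.allclose(y, x @ w.T)
```

In principle the same split applies to an ONNX export as well, since each shard is just an ordinary matmul; the runtime-specific part is the cross-device gather, which is why multi-GPU ONNX deployments typically export one partitioned graph per rank rather than a single graph.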