tpoisonooo opened 1 year ago
llama.onnx is primarily intended for understanding LLMs and converting them for NPU deployment.
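(As a rough sketch of what the ONNX side of such a conversion looks like, here is a standard `torch.onnx.export` call on a hypothetical toy module; this is not llama.onnx's actual export pipeline.)

```python
import torch

# Hypothetical toy module standing in for a real decoder block.
class TinyDecoder(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = torch.nn.Linear(64, 64)

    def forward(self, x):
        return torch.relu(self.proj(x))

model = TinyDecoder().eval()
dummy = torch.randn(1, 8, 64)  # (batch, seq, hidden)

# Export to ONNX, marking the sequence dimension as dynamic so the
# graph accepts variable-length inputs.
torch.onnx.export(
    model, dummy, "tiny_decoder.onnx",
    input_names=["hidden"], output_names=["out"],
    dynamic_axes={"hidden": {1: "seq"}},
)
```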
If you are looking for inference on NVIDIA GPUs, we have released lmdeploy at https://github.com/InternLM/lmdeploy.
It supports:
- Tensor parallelism
Nice work! Can tensor parallelism be implemented using both Torch and ONNX models?
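For reference, a minimal sketch of the column-parallel idea in plain PyTorch, which is the basic building block of tensor parallelism (the `ColumnParallelLinear` class and the two-device setup are hypothetical illustrations, not lmdeploy's implementation):

```python
import torch
import torch.nn as nn

class ColumnParallelLinear(nn.Module):
    """Split the output dimension of a linear layer across devices."""
    def __init__(self, in_features, out_features, devices):
        super().__init__()
        assert out_features % len(devices) == 0
        self.devices = devices
        shard = out_features // len(devices)
        # Each shard holds a slice of the weight matrix on its own device.
        self.shards = nn.ModuleList(
            nn.Linear(in_features, shard).to(d) for d in devices
        )

    def forward(self, x):
        # Each device computes its slice of the output in parallel;
        # the partial results are gathered and concatenated.
        parts = [s(x.to(d)) for s, d in zip(self.shards, self.devices)]
        return torch.cat([p.to(self.devices[0]) for p in parts], dim=-1)

# Usage: split a 256-wide projection across two devices
# (CPU here so the example runs anywhere).
layer = ColumnParallelLinear(256, 256, devices=["cpu", "cpu"])
out = layer(torch.randn(1, 256))
print(out.shape)  # torch.Size([1, 256])
```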