If anyone has questions regarding this issue, please feel free to leave a message here. We would also appreciate it if new members could introduce themselves to the community.
Hi! To complete this issue, does it mean that I need to have the corresponding GPU resources to run large models for project debugging?
Yes, the student working on this OSPP project needs access to at least one consumer-grade GPU (e.g., a 2080 or 3090). However, since this project mainly focuses on LLM inference, it does not require that much computing power. For the edge LLM, if your available computing resources are limited, you can choose a small-scale model such as TinyLlama-1.1b or Qwen1.5-0.5B; these models can run inference even on a personal laptop. For the cloud LLM, if your computing resources cannot support deploying an LLM at the 10-billion or 100-billion parameter scale, you can use GPT-4, Claude 3, Kimi, GLM-4, or other commercial LLMs that provide an open API.
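For reference, here is a minimal sketch of running one of the small edge models mentioned above locally with Hugging Face transformers; the model id and prompt are only examples, not part of the Ianvs framework itself.

```python
# Minimal sketch: local inference with a small edge LLM (example model id).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # any ~1B-parameter model works
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)  # CPU is enough for a demo

inputs = tokenizer("What is KubeEdge-Ianvs?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```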
What would you like to be added/modified: This issue aims to build a cloud-edge collaborative inference framework for LLMs on KubeEdge-Ianvs. Specifically, it aims to help cloud-edge LLM developers achieve high inference accuracy together with strong privacy protection and fast inference. This issue includes:
Why is this needed: At present, LLMs at the 10-billion and 100-billion parameter scale, led by Llama2-70b and Qwen-72b, can only be deployed in the cloud, where sufficient computing power is available to provide inference services. For users at edge terminals, however, cloud LLM services suffer from slow inference and long response delays, and uploading private edge data to the cloud for processing risks privacy disclosure. At the same time, LLMs that can be deployed in edge environments (such as TinyLlama-1.1b) achieve much lower inference accuracy than cloud LLMs. Using a cloud LLM or an edge LLM alone therefore cannot simultaneously satisfy privacy protection, real-time inference, and high inference accuracy. We need a cloud-edge collaboration strategy that combines the high inference accuracy of cloud LLMs with the strong privacy and fast inference of edge LLMs, so as to better meet the needs of edge users.
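As a rough illustration (not the Ianvs implementation), one possible collaboration strategy in the spirit of the query-routing work linked below is to answer easy queries on the edge model and forward hard ones to a cloud LLM API. The callables `edge_generate`, `cloud_generate`, and `difficulty_score` here are hypothetical placeholders.

```python
# Sketch of a simple cloud-edge routing policy (placeholders, not real APIs).
def route_query(query: str,
                edge_generate,      # callable: local small-LLM inference
                cloud_generate,     # callable: remote large-LLM API call
                difficulty_score,   # callable: query -> float in [0, 1]
                threshold: float = 0.5) -> str:
    """Answer on the edge when the query looks easy, otherwise use the cloud."""
    if difficulty_score(query) < threshold:
        return edge_generate(query)   # fast, private, lower accuracy
    return cloud_generate(query)      # slower, higher accuracy
```

The interesting research questions for the project are how to learn the difficulty scorer and how to evaluate the accuracy/latency/privacy trade-off of such a policy within the Ianvs benchmarking framework.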
Recommended Skills: KubeEdge-Ianvs, Python, PyTorch, LLMs
Useful links:
- Introduction to Ianvs
- Unleashing the Power of Edge-Cloud Generative AI in Mobile Networks: A Survey of AIGC Services
- Hybrid LLM: Cost-Efficient and Quality-Aware Query Routing