qiankunli opened this issue 1 week ago
Hey there, thanks for submitting an issue!
Currently KubeAI supports loading models from non-volume locations (i.e. the Hugging Face Hub). We do support caching models to volumes (see the caching docs). With the current caching implementation, KubeAI manages the PVC itself and always downloads the model from elsewhere (i.e. the Hugging Face Hub).
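For reference, a minimal sketch of how that caching flow looks, based on my reading of the KubeAI docs; the field names (`url`, `cacheProfile`, etc.), the profile name, and the model used here are assumptions, so check the caching docs for the authoritative schema:

```yaml
# Hedged sketch: KubeAI always pulls the model from the Hub and caches it
# into a PVC that KubeAI itself manages. Field names follow my reading of
# the KubeAI Model CRD and may differ in your version.
apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: llama-3.1-8b-instruct
spec:
  features: [TextGeneration]
  url: hf://meta-llama/Meta-Llama-3.1-8B-Instruct  # always downloaded from the Hub
  engine: VLLM
  cacheProfile: default   # assumed profile name, defined in Helm values
  resourceProfile: nvidia-gpu-l4:1
```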
It sounds like you want to define a PVC / PV as the source of the model (not just a caching location)? If so, can you elaborate a little more on your system design?
The following diagram is my understanding of KubeAI (there may be errors).
China cannot directly access Hugging Face, so we usually download the model files via a proxy and then upload them to a bucket on OSS. OSS has a corresponding StorageClass that provisions a PV and PVC for the bucket, allowing the pod to access the model files through the PVC as if they were in a directory inside the pod.
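For context, here is a hedged sketch of the kind of OSS-backed volume binding described above, using a static PV with Alibaba's OSS CSI (FUSE) driver; the driver name, `volumeAttributes`, bucket, and endpoint are assumptions drawn from Alibaba's documentation, not from this issue:

```yaml
# Hypothetical static PV/PVC for an OSS bucket mounted via FUSE.
# Driver name and attributes are assumptions about the Alibaba OSS CSI plugin.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: models-oss-pv
spec:
  capacity:
    storage: 100Gi
  accessModes: [ReadOnlyMany]
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: ossplugin.csi.alibabacloud.com
    volumeHandle: models-oss-pv
    volumeAttributes:
      bucket: my-model-bucket            # assumed bucket name
      url: oss-cn-hangzhou.aliyuncs.com  # assumed region endpoint
      otherOpts: "-o ro"                 # mount read-only
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: models-oss-pvc
spec:
  accessModes: [ReadOnlyMany]
  resources:
    requests:
      storage: 100Gi
  volumeName: models-oss-pv
  storageClassName: ""
```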
Great diagrams! By "bucket on OSS" do you mean Alibaba Object Storage Service? If so, I am guessing you are accessing it via FUSE? If this is the case, I see two paths that might unblock you:
- We add support for pulling models via the S3 API (it looks like Alibaba OSS is S3-compatible). We are about to add this feature (hopefully this week). You could use this in conjunction with the KubeAI caching functionality so that the bucket-to-KubeAI pull would only need to happen once (see the sketch after this list).
OR
- We add support for pulling models from an in-cluster PV. Not sure we have a solid reason to support this yet.
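Since the S3 pull feature was only "about to be added" at the time of writing, the following is purely speculative: the `s3://` URL scheme and the credential wiring are assumptions about how such a feature might look combined with caching, not the shipped API:

```yaml
# Speculative sketch: a Model sourced from an S3-compatible OSS bucket,
# cached so the bucket pull happens only once. The s3:// scheme is an
# assumption, not the final API.
apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: qwen2.5-7b-instruct
spec:
  features: [TextGeneration]
  url: s3://my-model-bucket/models/Qwen2.5-7B-Instruct  # assumed scheme
  engine: VLLM
  cacheProfile: default  # bucket-to-KubeAI pull happens once, then served from cache
  resourceProfile: nvidia-gpu-l4:1
# Credentials and the endpoint would presumably come from standard
# AWS-style settings (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY,
# AWS_ENDPOINT_URL) pointed at the OSS S3-compatible endpoint.
```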
Method 1 works for me, thank you! @nstogner
Assuming all model files are stored in an OSS bucket, PVs are provisioned through a StorageClass (backed by OSS), and each model pod is associated with a PVC. Within the pod, vLLM can access the model files through a mounted directory (like /data/models). How should this mode be configured?
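To make the question concrete, here is a plain-Kubernetes sketch of the setup being asked about (outside KubeAI's Model CRD, which did not support this at the time); the image tag, model path, and claim name are illustrative assumptions:

```yaml
# Illustrative only: a vLLM pod reading model files from an OSS-backed PVC,
# which is the behavior the question asks KubeAI to support.
apiVersion: v1
kind: Pod
metadata:
  name: vllm-from-pvc
spec:
  containers:
    - name: vllm
      image: vllm/vllm-openai:latest  # assumed image
      args: ["--model", "/data/models/Qwen2.5-7B-Instruct"]  # assumed path
      volumeMounts:
        - name: models
          mountPath: /data/models
          readOnly: true
  volumes:
    - name: models
      persistentVolumeClaim:
        claimName: models-oss-pvc  # the OSS-backed claim from the earlier sketch
```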