substratusai / kubeai

Private Open AI on Kubernetes
https://www.kubeai.org
Apache License 2.0

How to access model files via a PVC? #303

Open qiankunli opened 1 week ago

qiankunli commented 1 week ago

Assume all model files are stored in an OSS bucket, PVs are provisioned through an OSS-backed StorageClass, and each model pod is associated with a PVC. Inside the pod, vLLM can then read the model files from a mounted directory (e.g. /data/models).

How should this setup be configured?
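A minimal sketch of the setup described above, written as Python that emits plain Kubernetes manifests as JSON (which `kubectl apply -f -` accepts). The StorageClass name `oss-fuse`, the claim and pod names, the model directory `my-model`, and the `vllm/vllm-openai` image are assumptions; the OSS CSI driver in your cluster determines the real StorageClass name and whether `ReadOnlyMany` is supported. This is ordinary Kubernetes wiring, not a KubeAI feature.

```python
import json

# Hypothetical names: adjust to your cluster.
STORAGE_CLASS = "oss-fuse"   # assumed name of an OSS-backed StorageClass
MOUNT_PATH = "/data/models"
MODEL_DIR = "my-model"       # hypothetical model directory inside the bucket


def model_pvc(name: str, storage_class: str, size: str = "100Gi") -> dict:
    """PersistentVolumeClaim provisioned through the OSS-backed StorageClass."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # Assumes the CSI driver allows many read-only consumers.
            "accessModes": ["ReadOnlyMany"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


def vllm_pod(name: str, claim_name: str) -> dict:
    """Pod that mounts the PVC and points vLLM at the mounted directory."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": "vllm",
                "image": "vllm/vllm-openai",
                # Load weights from the mounted path instead of Hugging Face.
                "args": ["--model", f"{MOUNT_PATH}/{MODEL_DIR}"],
                "volumeMounts": [{
                    "name": "models",
                    "mountPath": MOUNT_PATH,
                    "readOnly": True,
                }],
            }],
            "volumes": [{
                "name": "models",
                "persistentVolumeClaim": {"claimName": claim_name, "readOnly": True},
            }],
        },
    }


if __name__ == "__main__":
    pvc = model_pvc("model-files", STORAGE_CLASS)
    pod = vllm_pod("vllm-server", pvc["metadata"]["name"])
    # A v1 List lets both objects be applied with one `kubectl apply -f -`.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [pvc, pod]}, indent=2))
```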

nstogner commented 1 week ago

Hey there, thanks for submitting an issue!

Currently KubeAI supports loading models from non-volume locations (e.g. the Hugging Face Hub). We do support caching models to volumes (see the caching docs). With the current caching implementation, KubeAI manages the PVC itself and always downloads the model from an external source (e.g. the Hugging Face Hub).
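For comparison, a sketch of the current caching flow expressed as a KubeAI `Model` object, built as a Python dict. The field names follow the KubeAI caching docs as we understand them and may differ between versions; `standard-filestore` and `nvidia-gpu-l4:1` are placeholder profile names that depend on how KubeAI was installed, and `cached_model` is a hypothetical helper. The point is that `url` still names Hugging Face as the source, while `cacheProfile` only tells KubeAI where to keep its own managed copy.

```python
import json


def cached_model(name: str, hf_repo: str, cache_profile: str, resource_profile: str) -> dict:
    """KubeAI Model that pulls from Hugging Face into a KubeAI-managed cache PVC."""
    return {
        "apiVersion": "kubeai.org/v1",
        "kind": "Model",
        "metadata": {"name": name},
        "spec": {
            "features": ["TextGeneration"],
            "url": f"hf://{hf_repo}",   # the model source is still Hugging Face
            "engine": "VLLM",
            "resourceProfile": resource_profile,
            # KubeAI creates and manages the cache PVC for this profile itself.
            "cacheProfile": cache_profile,
        },
    }


if __name__ == "__main__":
    model = cached_model(
        "llama-3.1-8b-instruct",
        "meta-llama/Llama-3.1-8B-Instruct",
        "standard-filestore",  # placeholder cache profile
        "nvidia-gpu-l4:1",     # placeholder resource profile
    )
    print(json.dumps(model, indent=2))
```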

It sounds like you want to define a PVC / PV as the source of the model (not just a caching location)? If so, can you elaborate a little more on your system design?

qiankunli commented 1 week ago

The following diagram shows my understanding of KubeAI (it may contain errors).

[Diagram: my understanding of the KubeAI architecture]

Hugging Face is not directly accessible from mainland China, so we usually download model files through a proxy and then upload them to an OSS bucket. An OSS-backed StorageClass provisions a PV and PVC for the bucket, so the pod can access the model files through the PVC as if they were in a local directory inside the pod.

[Diagram: proposed setup with model files served from an OSS bucket through a StorageClass, PV, and PVC]

nstogner commented 1 week ago

Great diagrams! By "bucket on OSS" do you mean Alibaba Object Storage Service? If so, I am guessing you are accessing it via FUSE? If this is the case, I see two paths that might unblock you:

  1. We add support for pulling models via the S3 API (Alibaba OSS appears to be S3-compatible). We are about to add this feature (hopefully this week). You could use it together with KubeAI's caching functionality so that the bucket-to-KubeAI pull only needs to happen once.

OR

  2. We add support for pulling models from an in-cluster PV. We're not sure there is a solid reason to support this yet.

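The "pull only once" idea in option 1 can be illustrated with a generic fetch-once cache. This is not KubeAI's implementation: `ensure_cached` and the `fetch` callback are hypothetical names, and in practice `fetch` would copy files from OSS's S3-compatible endpoint. The sketch writes a completion marker and renames a staging directory into place, so an interrupted download is never mistaken for a finished one.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def ensure_cached(cache_dir: Path, model_id: str, fetch: Callable[[Path], None]) -> Path:
    """Return a local copy of model_id, calling fetch() only on a cache miss.

    fetch(dest) is a hypothetical hook that copies the model files into dest,
    e.g. from an S3-compatible bucket.
    """
    target = cache_dir / model_id
    marker = target / ".complete"  # written only after a full download
    if marker.exists():
        return target  # cache hit: the bucket is not read again

    # Clear any partial copy left by an interrupted earlier attempt.
    if target.exists():
        shutil.rmtree(target)
    target.parent.mkdir(parents=True, exist_ok=True)

    # Download into a staging directory, then rename it into place so a
    # crash never leaves a directory that looks complete.
    staging = Path(tempfile.mkdtemp(dir=str(target.parent)))
    try:
        fetch(staging)
        (staging / ".complete").touch()
        staging.rename(target)
    except BaseException:
        shutil.rmtree(staging, ignore_errors=True)
        raise
    return target
```

The sketch does no cross-pod locking, so two replicas starting at the same moment could both download; a shared cache would need a lock or a single downloader job.
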
qiankunli commented 1 week ago

Method 1 works for me, thank you! @nstogner