zilliztech / milvus-operator

The Kubernetes Operator of Milvus.
https://milvus.io
Apache License 2.0

Milvus Operator confusion #89

Open mxchinegod opened 4 months ago

mxchinegod commented 4 months ago

Hello. Some questions.

When using the Operator and installing a Milvus cluster with kubectl apply -f, like so:

apiVersion: milvus.io/v1beta1
kind: Milvus
metadata:
  name: wtf
  namespace: workspace
  labels:
    app: milvus
spec:
  config: {}

the cluster is created, but no service on port 19530 is created. Additionally, I just see minio and etcd.
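(For reference, I'm checking with something like the following, using the name and namespace from the manifest above:)

❯ kubectl get svc -n workspace
❯ kubectl get pods -n workspace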

Furthermore, deleting does not do anything to the custom resource (Milvus).

[Screenshot, 2024-03-06 8:58 PM]

What does the Operator do so I can be sure to use it correctly?

Each time, I have had to delete resources manually, and it seems like they will not deploy again using the same name...

The README says "ATTENTIONS: THE MAIN BRANCH MAY BE IN AN UNSTABLE OR EVEN BROKEN STATE DURING DEVELOPMENT." Which branch should we use?

mxchinegod commented 4 months ago

I am merely using the operator as a subchart:

❯ helm install milvus . 
NAME: milvus
LAST DEPLOYED: Wed Mar  6 21:08:19 2024
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None

❯ kubectl apply -f ./templates/*
Error from server (InternalError): error when creating "./templates/test-cluster.yaml": Internal error occurred: failed calling webhook "mmilvus.kb.io": failed to call webhook: Post "https://milvus-milvus-operator-webhook-service.default.svc:443/mutate-milvus-io-v1beta1-milvus?timeout=10s": no endpoints available for service "milvus-milvus-operator-webhook-service"
haorenfsa commented 4 months ago

Hi @mxchinegod,

  1. The formally released charts can be found at https://github.com/zilliztech/milvus-operator/releases

or through our chart repo: helm repo add milvus-operator https://zilliztech.github.io/milvus-operator/

For more info, check the docs in the repo: https://github.com/zilliztech/milvus-operator/blob/main/docs/installation/installation.md. A typical install sequence from the chart repo is sketched below.
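For example (the chart name milvus-operator/milvus-operator is assumed here; see the installation docs above for the authoritative commands, and adjust the namespace to taste):

helm repo add milvus-operator https://zilliztech.github.io/milvus-operator/
helm repo update
helm install milvus-operator milvus-operator/milvus-operator -n milvus-operator --create-namespace --wait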

  2. kubectl apply -f ./templates/*

    Error from server (InternalError): error when creating "./templates/test-cluster.yaml": Internal error occurred: failed calling webhook "mmilvus.kb.io": failed to call webhook: Post "https://milvus-milvus-operator-webhook-service.default.svc:443/mutate-milvus-io-v1beta1-milvus?timeout=10s": no endpoints available for service "milvus-milvus-operator-webhook-service"

This is because your milvus-operator is still being installed; try using --wait in your helm install command. A sketch of the retry is below.

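For example, since your release is already deployed (the release name and webhook service name are taken from your log above):

❯ helm upgrade --install milvus . --wait --timeout 10m
❯ kubectl -n default get endpoints milvus-milvus-operator-webhook-service
# once the endpoints are non-empty, the webhook is reachable and kubectl apply should succeed
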
  3. About dependency auto-deletion

By default, all dependencies and data are kept when you delete the CR, so that you can still recover your Milvus and its data by creating a CR with the same name.

If you want to delete one of the dependencies and its data as well, take Pulsar as an example:

spec:
  dependencies: 
    pulsar:
      inCluster: 
        deletionPolicy: Delete
        pvcDeletion: true

The same change should be applied to all of your dependencies; a quick verification sketch follows the example:

spec:
  dependencies: 
    pulsar:
      inCluster: 
        deletionPolicy: Delete
        pvcDeletion: true
    etcd:
      inCluster: 
        deletionPolicy: Delete
        pvcDeletion: true
    storage:
      inCluster: 
        deletionPolicy: Delete
        pvcDeletion: true
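
With those policies set, deleting the CR should remove the dependencies and their PVCs as well. A quick way to verify, using the name and namespace from your manifest above (assuming kubectl recognizes the milvus resource registered by the CRD):

kubectl delete milvus wtf -n workspace
kubectl get pvc -n workspace    # dependency PVCs should be gone once deletion finishes
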
mxchinegod commented 4 months ago

They stay when deleting the release too, and I'm not talking strictly about PVCs. What I'm getting at is that the operator doesn't seem to actually be managing Milvus through the k8s layer, just templating it?

[Screenshot, 2024-03-08 8:38 AM]

PVCs do not get deleted by association with a deleted pod, etc. (that is usually defined in the CRD, like you said), so there is no reason to put each component in a separate release (...-etcd, ...-minio, etc.) and then also parameterize the deletion policy as the declarative strategy. Why not just one release for the operator, and have the CRD catalog the dependencies of each cluster?

Debatably, deleting a release should delete the data anyway! We have Longhorn and EBS and proper tools for managing PVCs and what they represent. What is a release if not the entire declaration of your deployment?

These releases are not interchangeable in an operating context, and we now have permissions issues where people can see resources we don't want them to, because every aspect of a Milvus is also its own release.

Here is an example of what I mean, and they have a paradigm similar to yours.

https://github.com/pegasystems/pega-helm-charts

They too have multiple, potentially embedded dependencies like Cassandra, node roles, etc., but it's still all one release per deployment, with CRDs to manage the embedded pieces.

haorenfsa commented 3 months ago

@mxchinegod Thank you for your advice.

The Milvus components (the deployments named xxxx-milvus-{component name}) are all managed directly through the k8s layer; you can list them as shown below.
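
For example (the names follow the xxxx-milvus-{component name} pattern above; the exact components are assumptions depending on your mode):

kubectl get deployments -n workspace
# e.g. wtf-milvus-standalone in standalone mode, or wtf-milvus-proxy, wtf-milvus-datanode, ... in cluster mode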

We use Helm charts to install the dependencies because there's no complicated logic involved: all of these charts already exist and are ready to use, which saves us the time of coding reconcilers for them. Still, it's true that the dependencies' releases can be confusing. There wasn't much design behind the releases; we could refine that detail.

As for whether to delete the dependencies and PVCs: PVCs are by default left behind when using StatefulSets in Kubernetes. We're following the default behavior so that most users can easily adapt. You can make the operator delete them as I described above. We're considering simplifying the configuration, though.
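The Kubernetes default being followed here can be inspected directly; on recent Kubernetes versions, the StatefulSet field that controls this defaults to Retain:

kubectl explain statefulset.spec.persistentVolumeClaimRetentionPolicy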

If you just want one Helm release, for now you can try out our Helm chart at https://github.com/zilliztech/milvus-helm, which uses the same paradigm as https://github.com/pegasystems/pega-helm-charts. A minimal install sketch is below.
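A minimal sketch of that single-release install (the repo URL and chart name are assumed from the milvus-helm README; check it for the authoritative values):

helm repo add milvus https://zilliztech.github.io/milvus-helm/
helm repo update
helm install my-milvus milvus/milvus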

We really appreciate you sharing your opinions openly. We hope to hear more from you; thank you again.