Support Etcd per User Namespace for ModelMesh

lizzzcai commented 2 years ago

Is your feature request related to a problem? If so, please describe.

In our use case, we manage the tenancy per namespace, so I am trying to set up etcd per namespace so that if the etcd instance is having an issue, it will not affect other users/namespace (limit the blast radius) and ensure the data security.

However, when I tried to provide a model-serving-etcd in user namespace, it is not working and the data is overwritten to point to the global etcd in the controller namespace.

example:

❯ k apply -f model-serving-etcd.yaml -n mms-demo-1
secret/model-serving-etcd configured
❯ k get secret/model-serving-etcd -n mms-demo-1 -o json | jq '.data | map_values(@base64d)'
{
  "etcd_connection": "{\n  \"endpoints\": \"http://etcd.mms-demo-1:2379\",\n  \"root_prefix\": \"modelmesh-serving/mm_ns/mms-demo-1\"\n}\n"
}
❯ k apply -f servingruntime_custom.yaml -n mms-demo-1
servingruntime.serving.kserve.io/custom-mlserver-1.x created
❯ k apply -f custom_predictor.yaml -n mms-demo-1
predictor.serving.kserve.io/custom-predictor created
❯ k get po -n mms-demo-1
NAME                                                    READY   STATUS    RESTARTS   AGE
etcd-c55788c9-2hgth                                     1/1     Running   0          3m15s
modelmesh-serving-custom-mlserver-1.x-b8c55f898-cp6hs   4/4     Running   0          9s

# endpoints are overwritten
❯ k get secret/model-serving-etcd -n mms-demo-1 -o json | jq '.data | map_values(@base64d)'
{
  "etcd_connection": "{\"endpoints\":\"http://etcd.modelmesh-serving:2379\",\"root_prefix\":\"modelmesh-serving/mm_ns/mms-demo-1\"}"
}

Not sure if I have do it in the correct way, but I saw the code has logic to point to the etcd in controller namespace.

Describe your proposed solution

Allow overwriting model-serving-etcd in the user namespace to support Etcd per namespace.

Describe alternatives you have considered

Additional context

njhill commented 2 years ago

@lizzzcai though it is now the default, cluster scope operation should be considered relatively alpha and still needs a bit more work. In particular w.r.t. how the secrets are handled as you've observed.

Security-wise it would be nice (and relatively straightforward) to have the controller namespace secret be the only one with "root" credentials and to automatically add a user id per namespace so that the metadata used by each is partitioned and not accessible using the creds in a secret from a different (non-controller) namespace.

The ability to disable the controller's synchronization of the etcd secret to other namespaces (so that they could each be created manually) is also something that could be done, but it would require the controller to keep track of all these different secrets so that it can connect to/watch different etcds.

For now, the best approach might be to use namespace scope and have a controller per namespace. The controller pod itself should have pretty minimal overhead from a resource footprint pov.

lizzzcai commented 2 years ago

Hi @njhill, thanks for your reply.

The ability to disable the controller's synchronization of the etcd secret to other namespaces (so that they could each be created manually) is also something that could be done, but it would require the controller to keep track of all these different secrets so that it can connect to/watch different etcds.

I see, the controller keeps watching the etcd by design. BTW are there any design docs or flow chart to provide more details about the how the e2e flow is working in modelmesh? (similar like this)

For now, the best approach might be to use namespace scope and have a controller per namespace. The controller pod itself should have pretty minimal overhead from a resource footprint pov.

For namespace scope, is it possible to monitor multiple namespaces via selector? (to reduce the risk that one controller / etcd down affects all the users) One controller + one etcd instance per namespace is a bit too much for us as we may have more than a few hundred namespaces, that total cost will be a bit higher here for us.

njhill commented 2 years ago

I see, the controller keeps watching the etcd by design.

Yes, this is read-only however, just used to trigger a predictor reconciliation when things change. Otherwise, the etcd data is internal to / owned by the modelmesh containers.

are there any design docs or flow chart to provide more details about the how the e2e flow is working in modelmesh? (similar like this)

I don't think there's anything published externally currently, but I will see what I can find and add to external docs.

For namespace scope, is it possible to monitor multiple namespaces via selector? (to reduce the risk that one controller / etcd down affects all the users)

No, namespace scope is everything confined to a single namespace controller + controlled. In cluster scope mode the controller will only manage other namespaces which have a particular annotation so you can be specific about which. But we didn't envisage or intend to support a case where there's more than one cluster-scope install in a given cluster.

One controller + one etcd instance per namespace is a bit too much for us as we may have more than a few hundred namespaces, that total cost will be a bit higher here for us.

I thought the original question was how to have a separate etcd per namespace? My suggestion is to have a single shared etcd but with users/access controls configured so that the access secret in each namespace can't be used to access metadata for other namespaces. Then you can do a namespace-scope install in each namespace so it's only the controller that's duplicated. You can probably squeeze down its resource allocations quite a bit.

kserve / modelmesh-serving

Support Etcd per User Namespace for ModelMesh #248