Investigate restoring UID range of source cluster's namespace to destination cluster and potential conflicts.

jwmatthews commented 5 years ago

When we migrate stateful applications from a source cluster to a destination cluster, we need to be aware of the UIDs the applications use on the source side and preserve those UIDs on the destination side so that file system permissions line up.

For example, assume that a namespace on the source side has the following annotations:

$ oc get namespace mssql-persistent -o yaml
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    .....
    openshift.io/sa.scc.mcs: s0:c23,c22
    openshift.io/sa.scc.supplemental-groups: 1000550000/10000
    openshift.io/sa.scc.uid-range: 1000550000/10000
    .......

When pods run in that namespace, we assume the UID they use will be the lower end of the openshift.io/sa.scc.uid-range 1000550000/10000, i.e. UID 1000550000.
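To make that assumption concrete: the annotation value is read as `<start>/<size>`, so `1000550000/10000` covers UIDs 1000550000 through 1000559999, and the lower end is the UID pods are assumed to run as by default. A minimal Python sketch, assuming only the single-block `start/size` form shown in the example above; `UidRange` and `parse_uid_range` are hypothetical names, not part of any OpenShift library:

```python
# Hypothetical helper: parse an OpenShift "start/size" range annotation value,
# such as openshift.io/sa.scc.uid-range: "1000550000/10000".
# Only the single-block "start/size" form is handled here.

from typing import NamedTuple


class UidRange(NamedTuple):
    start: int  # first UID in the block
    size: int   # number of UIDs in the block

    @property
    def end(self) -> int:
        """Last UID in the block (inclusive)."""
        return self.start + self.size - 1


def parse_uid_range(value: str) -> UidRange:
    """Parse a "start/size" string into a UidRange."""
    start_str, size_str = value.strip().split("/")
    start, size = int(start_str), int(size_str)
    if start < 0 or size <= 0:
        raise ValueError(f"invalid range: {value!r}")
    return UidRange(start, size)


if __name__ == "__main__":
    r = parse_uid_range("1000550000/10000")
    # The lower end (r.start) is the UID we assume pods run as.
    print(r.start, r.end)
```

Run directly, this prints `1000550000 1000559999`.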

We will migrate the contents of this namespace (all k8s resources and associated data in persistent volumes) to a destination cluster.

We currently assume that we migrate into a new namespace that we create ourselves and that we supply the annotations we want to use, so we would set the following annotations on it:

apiVersion: v1
kind: Namespace
metadata:
  annotations:
    .....
    openshift.io/sa.scc.mcs: s0:c23,c22
    openshift.io/sa.scc.supplemental-groups: 1000550000/10000
    openshift.io/sa.scc.uid-range: 1000550000/10000
    .......
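The step above amounts to copying three annotations from the source Namespace onto the manifest we create on the destination. A minimal Python sketch, assuming the Namespace objects are handled as plain dicts (for example, parsed from `oc get namespace <name> -o yaml` output); `build_dest_namespace` is a hypothetical helper, and applying the resulting manifest to the destination cluster is left out:

```python
# Hypothetical sketch: build a destination Namespace manifest that carries over
# the SCC-related annotations from the source namespace, so pods on the
# destination are assigned the same UID range, supplemental groups and MCS labels.

from typing import Any, Dict, Optional

# The annotation keys shown in the example namespace above.
SCC_ANNOTATIONS = (
    "openshift.io/sa.scc.mcs",
    "openshift.io/sa.scc.supplemental-groups",
    "openshift.io/sa.scc.uid-range",
)


def build_dest_namespace(src_ns: Dict[str, Any],
                         name: Optional[str] = None) -> Dict[str, Any]:
    """Return a new Namespace manifest carrying the source's SCC annotations."""
    src_annotations = src_ns.get("metadata", {}).get("annotations") or {}
    missing = [k for k in SCC_ANNOTATIONS if k not in src_annotations]
    if missing:
        raise KeyError(f"source namespace lacks annotations: {missing}")
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            # Default to the source namespace's name.
            "name": name or src_ns["metadata"]["name"],
            # Copy only the SCC annotations; other annotations are not carried over.
            "annotations": {k: src_annotations[k] for k in SCC_ANNOTATIONS},
        },
    }
```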

The goal is that when the pods run on the destination side and access their data in PVs (migrated from the source), the file system permissions line up.

The question is: do we need to be concerned that another namespace on the destination cluster may already be using the uid-range 1000550000/10000?

Is it a hard requirement in OpenShift that namespaces use unique uid-ranges?

If some other namespace on the destination cluster were already using a range that overlaps, contains, or is contained by 1000550000/10000, would this cause a problem?
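Detecting the situations described in these questions is an interval comparison on inclusive `[start, start + size - 1]` ranges. A minimal Python sketch, assuming the destination namespaces' `openshift.io/sa.scc.uid-range` values have already been collected into a dict; `relation` and `conflicting_namespaces` are hypothetical names. It only detects sharing of UIDs; whether that sharing is actually a problem is the open question of this issue:

```python
# Hypothetical sketch: compare OpenShift "start/size" UID range annotations
# and report how a candidate range relates to existing ones.
# Ranges are treated as inclusive [start, end] intervals.

from typing import Dict, Tuple


def to_bounds(value: str) -> Tuple[int, int]:
    """Turn "start/size" into an inclusive (start, end) pair."""
    start, size = (int(part) for part in value.split("/"))
    return start, start + size - 1


def relation(candidate: str, existing: str) -> str:
    """Describe how `candidate` relates to `existing`."""
    c_start, c_end = to_bounds(candidate)
    e_start, e_end = to_bounds(existing)
    if c_end < e_start or e_end < c_start:
        return "disjoint"
    if (c_start, c_end) == (e_start, e_end):
        return "identical"
    if e_start <= c_start and c_end <= e_end:
        return "contained"   # candidate lies entirely inside existing
    if c_start <= e_start and e_end <= c_end:
        return "contains"    # candidate covers all of existing
    return "overlaps"        # partial overlap


def conflicting_namespaces(candidate: str,
                           existing_ranges: Dict[str, str]) -> Dict[str, str]:
    """Map each namespace whose range shares UIDs with `candidate` to the relation."""
    result = {}
    for ns, rng in existing_ranges.items():
        rel = relation(candidate, rng)
        if rel != "disjoint":
            result[ns] = rel
    return result
```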

sreber84 commented 5 years ago

The purpose of assigning each project a distinct range of user IDs is so that in a multitenant environment, applications from different projects never run as the same user ID. When using persistent storage, any files created by applications will also have different ownership in the file system. Running processes for applications as different user IDs means that if a security vulnerability were ever discovered in the underlying container runtime, and an application were able to break out of the container to the host, it would not be able to interact with processes owned by other users, or from other applications, in other projects.

sseago commented 5 years ago

So as I understand how this works now (including the work Dylan has done to fix prior bugs), we already set the UID range in the new namespace to match the UID range on src and make sure the service accounts (SAs) use the same UID they used on src. If we didn't do this, applications in the dest cluster wouldn't have access to the data in the PVs that we copy over.

It sounds like there are two basic concerns here:
1) What if the namespace already exists and its accounts have a different UID range?
2) What if the namespace doesn't already exist, but the UIDs being imported are already in use by a different namespace?

Even if, within a given cluster, OpenShift/Kubernetes makes sure that each new namespace gets a unique UID range, we can still have clashes between a UID range imported from another cluster and a UID range allocated directly in the dest cluster.
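Both concerns can be expressed as a check run against the destination cluster before restoring. A minimal Python sketch, assuming the destination's namespace names and their `openshift.io/sa.scc.uid-range` values have already been collected into a dict; `precheck` and its helpers are hypothetical, and the sketch only reports clashes rather than deciding how to resolve them:

```python
# Hypothetical pre-migration check for the two concerns listed above.
# dest_ranges maps each existing destination namespace name to its
# openshift.io/sa.scc.uid-range annotation value ("start/size").

from typing import Dict, List, Tuple


def _bounds(value: str) -> Tuple[int, int]:
    """Turn "start/size" into an inclusive (start, end) pair."""
    start, size = (int(p) for p in value.split("/"))
    return start, start + size - 1


def _overlap(a: str, b: str) -> bool:
    """True if the two ranges share at least one UID."""
    a_start, a_end = _bounds(a)
    b_start, b_end = _bounds(b)
    return a_start <= b_end and b_start <= a_end


def precheck(ns_name: str, src_range: str,
             dest_ranges: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means no clash was found."""
    problems = []
    # Concern 1: the namespace already exists with a different UID range.
    existing = dest_ranges.get(ns_name)
    if existing is not None and _bounds(existing) != _bounds(src_range):
        problems.append(
            f"{ns_name} exists with uid-range {existing}, source uses {src_range}")
    # Concern 2: another namespace already uses UIDs in the imported range.
    for other, rng in sorted(dest_ranges.items()):
        if other != ns_name and _overlap(src_range, rng):
            problems.append(f"{other} uid-range {rng} overlaps {src_range}")
    return problems
```

Such a check only sees namespaces that exist when it runs; it does not reserve the imported range against ranges allocated later on the dest cluster.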

jwmatthews commented 5 years ago

Related BZ to investigate post 4.2 release: https://bugzilla.redhat.com/show_bug.cgi?id=1748531

migtools / openshift-migration-plugin #20