Open giannoul opened 3 years ago
@giannoul , thank you for the contribution!
Is there a way to make the tokens ephemeral, requested on demand, and expiring? Right now, the tokens are created when the app is created, and the SA/token remains there until the app/namespace is deleted. Can we create a route that will generate a console token for, let's say, 20-30 min sessions?
Is it possible to set expiration on the JWT token using k8s tricks like:
expirationSeconds: 3600 #expires in 60 mins
This would mean hephy workflow cli requests a new token each time or controller uses a sort of cache for tokens. However, this may not be possible at this time.
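A rough sketch of what requesting a short-lived token on demand could look like, assuming the cluster exposes the Kubernetes TokenRequest API (this is an assumption, not something this PR implements). The helper names, the namespace, and the Service Account name are hypothetical; the code only builds the request path and body and does not send anything.

```python
# Kubernetes rejects TokenRequest lifetimes shorter than 10 minutes.
MIN_EXPIRATION_SECONDS = 600


def token_request_path(namespace: str, service_account: str) -> str:
    """Path of the TokenRequest subresource for a Service Account."""
    return (
        f"/api/v1/namespaces/{namespace}"
        f"/serviceaccounts/{service_account}/token"
    )


def token_request_body(expiration_seconds: int = 3600) -> dict:
    """Body to POST to the path above in order to get a bounded-lifetime token."""
    if expiration_seconds < MIN_EXPIRATION_SECONDS:
        raise ValueError("expirationSeconds must be at least 600")
    return {
        "apiVersion": "authentication.k8s.io/v1",
        "kind": "TokenRequest",
        "spec": {"expirationSeconds": expiration_seconds},
    }


# Hypothetical names, for illustration only: a 30 minute console token.
path = token_request_path("myapp", "myapp-console")
body = token_request_body(expiration_seconds=1800)
```

With this shape the controller would request a fresh token per console request instead of storing a long-lived one.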
The other way that this can be done is to store the service_account_name, service_account_creation_timestamp, and service_account_expiration_timestamp in the django app's db model. Once the timestamp reaches service_account_expiration_timestamp we can consider this SA token invalid and perform a periodic cronjob in the controller to delete the tokens from the api. Then the on_delete method on the django model will also delete the SA from the app's namespace by sending a call via k8s client. SA will only be created when a CLI call wants to create console session. If token exists, return it. If not, create it. If expired, recreate the SA.
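The "if it exists, return it; if not, create it; if expired, recreate it" flow could be sketched like this. It uses a plain dataclass as a stand-in for the django model so that it runs on its own; `ConsoleServiceAccount`, `get_console_account`, `create_token`, and the 30 minute lifetime are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional

TOKEN_LIFETIME = timedelta(minutes=30)  # assumed session length


@dataclass
class ConsoleServiceAccount:
    """Stand-in for the db model holding the three proposed fields."""
    service_account_name: str
    service_account_creation_timestamp: datetime
    service_account_expiration_timestamp: datetime
    token: str

    def is_expired(self, now: datetime) -> bool:
        return now >= self.service_account_expiration_timestamp


def get_console_account(
    existing: Optional[ConsoleServiceAccount],
    name: str,
    create_token: Callable[[str], str],
    now: Optional[datetime] = None,
) -> ConsoleServiceAccount:
    """Return the stored record if still valid, otherwise (re)create it."""
    now = now or datetime.now(timezone.utc)
    if existing is not None and not existing.is_expired(now):
        return existing
    # Missing or expired: create_token stands in for the k8s client call
    # that would (re)create the Service Account and fetch its token.
    return ConsoleServiceAccount(
        service_account_name=name,
        service_account_creation_timestamp=now,
        service_account_expiration_timestamp=now + TOKEN_LIFETIME,
        token=create_token(name),
    )
```

The periodic cleanup job would then only need to delete records for which `is_expired` returns true.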
@Cryptophobia Investigating the token expiration option I found the following:
As we may see from the implementation, all users will use the same underlying Service Account in order to get console access to the container within the pod. Since the console access is actually a back and forth using websockets, the Service Account token is just used to initialize the connection. Experimenting with that, I saw that for an existing console session, even if I deleted the token, the connection was left intact. This means no disruptions for the existing console connections when/if we remove the Service Account token (secret).
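To illustrate why the token only matters at connection time, here is a minimal sketch of how the initial exec request could be put together. The URL follows the Kubernetes pod exec endpoint; the function name, the `host:port` form of the endpoint, and the default command are assumptions, and nothing is actually sent.

```python
from typing import Dict, Sequence, Tuple
from urllib.parse import urlencode


def exec_websocket_request(
    api_endpoint: str,
    namespace: str,
    pod: str,
    token: str,
    command: Sequence[str] = ("/bin/sh",),
) -> Tuple[str, Dict[str, str]]:
    """Build the websocket URL and headers for an interactive exec session."""
    # Each word of the command is passed as its own "command" parameter.
    query = [("command", part) for part in command]
    query += [("stdin", "true"), ("stdout", "true"), ("tty", "true")]
    url = (
        f"wss://{api_endpoint}/api/v1/namespaces/{namespace}"
        f"/pods/{pod}/exec?{urlencode(query)}"
    )
    # The bearer token is presented once, during the websocket handshake.
    headers = {"Authorization": f"Bearer {token}"}
    return url, headers
```

Once the handshake succeeds, the token is not sent again, which matches the observation that deleting it leaves open sessions intact.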
If we delete the token (basically it is a k8s secret) it will get regenerated automatically. This seems to be the default function of the token-controller. During my testing I found that deleting the token via kubectl will just lead to a new token creation with a different value and name suffix.
Since the token has a creationTimestamp stating when it was created, we can regenerate the token upon a console session request and return the new token. This basically means that upon a new console request we check if the existing token is older than e.g. 20 mins; if so, we delete it (the k8s token-controller will create a new one) and return the new token to be used. The connections using the old token will continue to operate and the new ones will use the new token.
The above approach is actually invalidation on demand and not cron job based action, but it has a lot less moving parts.
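The age check itself could look roughly like this, assuming the secret's creationTimestamp arrives as the usual RFC 3339 UTC string. The function names and the 20 minute threshold are illustrative.

```python
from datetime import datetime, timedelta, timezone

MAX_TOKEN_AGE = timedelta(minutes=20)  # threshold taken from the example above


def parse_creation_timestamp(value: str) -> datetime:
    """Parse a timestamp such as "2021-03-04T12:00:00Z" into an aware datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def token_needs_rotation(creation_timestamp: str, now: datetime) -> bool:
    """True when the token secret is older than MAX_TOKEN_AGE."""
    return now - parse_creation_timestamp(creation_timestamp) > MAX_TOKEN_AGE
```

When this returns true, the controller would delete the secret, wait for the token-controller to create the replacement, and return the new value.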
Okay @giannoul, the above seems like a good solution for ephemeral tokens. Does that mean that the session created by an old token can continue to be open forever? We should also create a mechanism to break the session every 20 mins to retry with same token as previously given when creating the session. This ensures that a long-running connection cannot use a very old token.
This is similar to how SSH long-running open connections are closed via timeouts.
If the above works then we should go with this new approach and add a timeout to make sure a long-running connection is not a security risk.
The sessions opened by any token may remain open (idle) for as long as the kubelet parameter --streaming-connection-idle-timeout says:
The above means that a session can be idle for a maximum of ~4 hours by default.
The other end that could terminate the connection would be the gorilla websocket, but it does not set such a timeout by default:
The above means that idle connections should be terminated after the time set by the kubelet's --streaming-connection-idle-timeout.
In general the session itself is controlled by Kubernetes and by the workflow-cli via gorilla websockets. This means that the hephy controller is not able to actually terminate anything. It is just the intermediary that passes the token to the workflow-cli.
So, the only way to avoid the security risk you mentioned would be to set a hard timeout on the workflow-cli, but I am afraid that the first thing that will be requested afterwards would be "how to increase the timeout" 🤣.
So, the only way to avoid the security risk you mentioned would be to set a hard timeout on the workflow-cli, but I am afraid that the first thing that will be requested afterwards would be "how to increase the timeout"
Adding the hard timeout on the workflow-cli client is easily bypassed and is no security at all... so there would be no point to do it for security anyways.
Is there any other way we can terminate the websocket on a set timeout using controller's permissions? For example, a simple solution like setting an ENV var for WEBSOCKET_TIMEOUT on the controller. It would set that value when the workflow-cli client sets the Keep-Alive for the websocket connection. Then immediately after the token is returned it will recreate it on the controller side. If a get for a token comes in, wait 20 seconds then recreate the token immediately. That way the session is guaranteed to time out and the token is no longer valid after establishing this session; a token is only valid for a single session.
This will ensure if a user is deleted, their access is gone as soon as session expires. Still not perfect, but the token will be recreated each time.
@giannoul , any update on this? I would like to get out a new minor release of hephy soon. If you are still working on this one we can get it out for next release.
I didn't get the chance to investigate your suggestion due to a very busy schedule these days. Please proceed to the minor release without this one.
Okay, thank you for getting back so soon. I can help push this forward for the subsequent release after next.
The Header Keep-Alive cannot be set for the websocket connection. In order to create the functionality we discussed I did the following:
- CONTAINER_CONSOLE_WEBSOCKET_TIMEOUT, which is the timeout in seconds. This parameter is sent to the workflow-cli along with the token, when a user requests access to a pod.
- A ResponseWithCallback class that sends the requested body and then executes a callback function. That function in our case is the re-creation of the token. This means that immediately after receiving the token, it gets re-created.
Awesome work @giannoul! Thank you for working with me on a bit more secure design. Will get this in right after the patch release v2.23.1 coming up.
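For readers following the thread, the callback-after-response idea can be sketched roughly as follows. This is not the class from the PR: it is a minimal stand-in that relies on the WSGI convention that the server calls `close()` on the response iterable once the request is complete. The `console_response` helper, the JSON key names, and the 1200 second default are assumptions.

```python
import json
import os
from typing import Callable, Iterator, Mapping


class ResponseWithCallback:
    """Response body that runs a callback after it has been sent."""

    def __init__(self, body: bytes, callback: Callable[[], None]):
        self._body = body
        self._callback = callback

    def __iter__(self) -> Iterator[bytes]:
        yield self._body

    def close(self) -> None:
        # WSGI servers call close() when they are done with the response,
        # so the token is rotated only after the client has received it.
        self._callback()


def console_response(
    token: str,
    rotate_token: Callable[[], None],
    environ: Mapping[str, str] = os.environ,
) -> ResponseWithCallback:
    """Send the token together with the timeout, then rotate the token."""
    timeout = int(environ.get("CONTAINER_CONSOLE_WEBSOCKET_TIMEOUT", "1200"))
    body = json.dumps({"token": token, "timeout": timeout}).encode()
    return ResponseWithCallback(body, rotate_token)
```

Because rotation happens in `close()`, each token handed out is usable for establishing at most one new session.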
This PR adds the Service Account token needed for deis ps:console (see the feat(pkg/console): support interactive console PR on workflow-cli).
The feature itself consists of the current PR and its counterpart, the feat(pkg/console): support interactive console PR on workflow-cli. The idea is to mimic the mechanism that kubectl uses, but through the k8s API itself. Specifically, we can get websocket access to a console in a pod, but we need a token. In order to be able to use the token we need:
- deis:deis-console, which will actually have the pods/exec permission
- deis:deis-controller, in order to be able to attach the deis:deis-console clusterrole to a Service Account that belongs to a single namespace
The above ensures that:
The requirements are to set the variables K8S_API_ENDPOINT and CONTAINER_CONSOLE_ENABLED.
I tested it on my minikube with setting:
An application named testadminapp3 had a service account and token attached that was able to give me pod/exec access: