utkuozdemir / pv-migrate

CLI tool to easily migrate Kubernetes persistent volumes
Apache License 2.0

Add ssh keepalive configuration #248

Closed alex-vmw closed 11 months ago

alex-vmw commented 11 months ago

Is your feature request related to a problem? Please describe. While testing the migration of a 500Gi PVC containing ~23 million files/dirs today, I noticed an issue. Because pv-migrate passes the --info=flist0 command-line parameter to rsync, the destination side is completely idle while the source side prepares the file list. With ~23 million files, compiling the file list on the source side took ~2 hours, but the ssh connection always timed out after around 40 minutes and the rsync process failed. I think that if ssh keepalive were implemented on either the server side or the client side, the idle connection would stay open as long as both server and client are still alive and responding to keepalive packets.

Describe the solution you'd like I think we have the two choices below for implementing ssh keepalive. Which do you like best? I personally prefer option 1.

  1. Server Side. We can add the below configuration to the /etc/ssh/sshd_config of the pv-migrate-sshd image:

    ClientAliveInterval 300
    ClientAliveCountMax 3
  2. Client Side. We can add two more options to the sshArgs, like below:

    sshArgs := []string{
        "ssh", "-o", "StrictHostKeyChecking=no", "-o", "UserKnownHostsFile=/dev/null",
        "-o", "ConnectTimeout=5", "-o", "ServerAliveInterval=300",
        "-o", "ServerAliveCountMax=3",
    }
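To make option 2 concrete, here is a minimal Go sketch of how the keepalive options could be appended to the ssh argument list. The helper names `buildSSHArgs` and `unresponsiveTimeoutSecs` are hypothetical and are not part of pv-migrate; only the option names and values come from the proposal above. The second helper illustrates the arithmetic: with an interval of 300 seconds and a count of 3, the client gives up after roughly 900 seconds without a reply, while an idle but healthy connection stays open because each probe is answered.

```go
package main

import (
	"fmt"
	"strconv"
)

// buildSSHArgs is a hypothetical helper (not pv-migrate's actual code).
// It returns the ssh argument list from the proposal, adding the client-side
// keepalive options only when aliveIntervalSecs is positive.
func buildSSHArgs(aliveIntervalSecs, aliveCountMax int) []string {
	args := []string{
		"ssh",
		"-o", "StrictHostKeyChecking=no",
		"-o", "UserKnownHostsFile=/dev/null",
		"-o", "ConnectTimeout=5",
	}
	if aliveIntervalSecs > 0 {
		args = append(args,
			"-o", "ServerAliveInterval="+strconv.Itoa(aliveIntervalSecs),
			"-o", "ServerAliveCountMax="+strconv.Itoa(aliveCountMax),
		)
	}
	return args
}

// unresponsiveTimeoutSecs approximates how long the peer may stay silent
// before ssh drops the connection: one interval per unanswered probe.
func unresponsiveTimeoutSecs(aliveIntervalSecs, aliveCountMax int) int {
	return aliveIntervalSecs * aliveCountMax
}

func main() {
	fmt.Println(buildSSHArgs(300, 3))
	fmt.Println(unresponsiveTimeoutSecs(300, 3), "seconds")
}
```

The same interval-times-count reasoning applies to the server-side `ClientAliveInterval` and `ClientAliveCountMax` settings in option 1.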

Describe alternatives you've considered None

Additional context None

utkuozdemir commented 11 months ago

The server-side variant is already usable without any changes: you just need to extend the sshd image to configure it the way you want and use that image instead, see here.

You can give it a try to see if it solves the problem. If it does, we can also add the config as the default here. (Any chance these settings cause problems in some other scenarios?)

For the client-side solution, making something more generic might be a good idea - we could add a flag like --ssh-extra-args, kinda similar to the Helm chart value rsync.extraArgs.
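A rough Go sketch of what such a generic flag could do is shown below. Everything here is an assumption for illustration: the `--ssh-extra-args` flag does not exist yet, `appendExtraSSHArgs` is a hypothetical helper, and the whitespace splitting is deliberately naive (it does not handle quoted values that contain spaces).

```go
package main

import (
	"fmt"
	"strings"
)

// appendExtraSSHArgs is a hypothetical helper. It copies the base ssh
// arguments and appends the whitespace-separated tokens of the value a user
// might pass through an --ssh-extra-args style flag.
func appendExtraSSHArgs(base []string, extra string) []string {
	// Copy first so the caller's slice is never modified through aliasing.
	out := make([]string, 0, len(base))
	out = append(out, base...)
	// strings.Fields returns no tokens for an empty or all-space string.
	return append(out, strings.Fields(extra)...)
}

func main() {
	base := []string{"ssh", "-o", "ConnectTimeout=5"}
	extra := "-o ServerAliveInterval=300 -o ServerAliveCountMax=3"
	fmt.Println(appendExtraSSHArgs(base, extra))
}
```

With a flag like this, the keepalive options would become just one use case, at the cost of leaving validation of the extra arguments to ssh itself.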

alex-vmw commented 11 months ago

Hi @utkuozdemir,

  1. Yes, I already created our own sshd image with the above changes and tested it, and everything worked as expected (migrations of PVCs with a lot of files/dirs are no longer timed out by sshd).
  2. I do not see any downside to including the server-side solution in the official sshd image, as this simple keepalive will not cause any other issues. Do you want me to submit a pull request for this? (I would be happy to.)
  3. While I also tested the client-side solution and it worked, I believe the server-side solution would be much easier and better to implement.

utkuozdemir commented 11 months ago

Ok, let's go with the server side option then 🙂 I'd very much appreciate a PR 🙂

utkuozdemir commented 11 months ago

Closed by https://github.com/utkuozdemir/pv-migrate/commit/4051318002c2825acd9a148b34eb339f33c9a9c8

Thanks for the contribution 🙂.