nix-community / nixos-anywhere-examples

Example flake for nixos-anywhere
https://nix-community.github.io/nixos-anywhere/ [maintainer=@Mic92]

DigitalOcean: Cannot ssh into the droplet #5

Closed eureka-cpu closed 4 months ago

eureka-cpu commented 4 months ago

Ref #4

Using the DigitalOcean example, I'm unable to SSH into the DO box after the installation succeeds. I've checked and tried a few troubleshooting steps, to no avail.

Steps to repro:

  1. create droplet (2 GB Memory / 50 GB Disk / SFO3 - Ubuntu 24.04 (LTS) x64)
  2. ssh root@ipv4-address (succeeds) then exit
  3. nix run github:numtide/nixos-anywhere -- --flake .#digitalocean root@ip-address (succeeds)
  4. ssh root@ipv4-address (hangs indefinitely)

mtr report shows this output for packet loss:

$ mtr 128.xxx.x.49 --report                                                                                                 
Start: 2024-05-31T17:44:33-0700
HOST: critter-tank                Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- _gateway                   0.0%    10    0.6   0.6   0.6   0.7   0.0
  2.|-- tukw-dsl-gw68.tukw.qwest. 90.0%    10    3.0   3.0   3.0   3.0   0.0
  3.|-- tukw-agw1.inet.qwest.net  90.0%    10    5.1   5.1   5.1   5.1   0.0
  4.|-- ???                       100.0    10    0.0   0.0   0.0   0.0   0.0
  5.|-- ???                       100.0    10    0.0   0.0   0.0   0.0   0.0
  6.|-- 4.7.18.10                 90.0%    10   20.3  20.3  20.3  20.3   0.0
  7.|-- ???                       100.0    10    0.0   0.0   0.0   0.0   0.0

Pinging the server just hangs indefinitely.

eureka-cpu commented 4 months ago

Extra log output of ssh

$ ssh -vvvvv root@143.xxx.xxx.133                                                                                      
OpenSSH_9.6p1, OpenSSL 3.0.13 30 Jan 2024
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 5: Applying options for *
debug2: resolve_canonicalize: hostname 143.xxx.xxx.133 is address
debug3: expanded UserKnownHostsFile '~/.ssh/known_hosts' -> '/home/eureka/.ssh/known_hosts'
debug3: expanded UserKnownHostsFile '~/.ssh/known_hosts2' -> '/home/eureka/.ssh/known_hosts2'
debug3: channel_clear_timeouts: clearing
debug3: ssh_connect_direct: entering
debug1: Connecting to 143.xxx.xxx.133 [143.xxx.xxx.133] port 22.
debug3: set_sock_tos: set socket 3 IP_TOS 0x48
debug1: connect to address 143.xxx.xxx.133 port 22: Connection timed out
ssh: connect to host 143.xxx.xxx.133 port 22: Connection timed out
FAIL

Since it timed out I ran systemctl status ssh and got this:

Unit ssh.service could not be found

This is with services.openssh.enable = true, which confuses me.

sudo netstat -tuln | grep :22

This shows that port 22 is listening for incoming connections on both IPv4 and IPv6.
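
A side note on the unit name, as a minimal sketch rather than a diagnosis: on NixOS, services.openssh.enable creates a systemd unit called sshd.service, so systemctl status sshd is the query to use, whereas ssh.service is the Ubuntu/Debian name. A listening port 22 combined with a client-side connection timeout points more toward the network configuration than toward sshd itself. The fragment below assumes a plain configuration.nix module and nothing else.

```nix
# configuration.nix fragment (sketch, not the example repo's actual file)
{ ... }:
{
  # Enables the OpenSSH daemon. The resulting systemd unit is named
  # sshd.service on NixOS, not ssh.service as on Ubuntu/Debian.
  services.openssh.enable = true;
}
```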

eureka-cpu commented 4 months ago

@Lassulus @Mic92 if anyone is available to help, I would greatly appreciate it, as networking is a bit outside my realm.

Mic92 commented 4 months ago

In srvos we have a profile for DigitalOcean. It worked for me some months ago. I think there is a pull request, either there or in nixpkgs, about DigitalOcean that has more context on the networking.

eureka-cpu commented 4 months ago

This is the only thing I could find in nixpkgs that seems relevant, but I'm not sure of the correct way to use it: https://github.com/NixOS/nixpkgs/blob/master/nixos/modules/virtualisation/digital-ocean-image.nix

I had previously tried adding (modulesPath + "/virtualisation/digital-ocean-config.nix"), but it doesn't seem to have made a difference.
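
For readers unfamiliar with modulesPath, the import mentioned above is usually written as a small module like the sketch below. This only illustrates the syntax. It assumes the module is added to the flake's modules list, and since digital-ocean-config.nix targets the prebuilt DigitalOcean image, some of its settings may overlap with a disko-based layout.

```nix
# Sketch of a module that pulls in the nixpkgs DigitalOcean profile.
# `modulesPath` is an argument NixOS passes to every module; it points
# at the nixos/modules directory of the nixpkgs in use.
{ modulesPath, ... }:
{
  imports = [
    (modulesPath + "/virtualisation/digital-ocean-config.nix")
  ];
}
```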

eureka-cpu commented 4 months ago

I'll have a closer look at srvos in the morning. If I can just get this to work with SSH, 99% of what I'm trying to do will be solved lol

eureka-cpu commented 4 months ago

So from what I can tell, DigitalOcean has some expectation of the operating system based on the image the original server is built from, and that expectation is probably leading to some conflict with sshd. This is just speculation, since the same image works fine if it's selected as the original image to build the server with. To test the theory, I could also try using nixos-anywhere to flash the image to an already existing NixOS server and see if it works as expected.

Mic92 commented 4 months ago

I know of at least two people in the Matrix channel who got DigitalOcean working as well.

Mic92 commented 4 months ago

In srvos we are also enabling cloud-init. Maybe this is what is missing?

eyJhb commented 4 months ago

That is indeed what is missing. I did some basic tests, and managed to get it to work using the following configuration.

      # tested with 2GB/2CPU droplet, 1GB droplets do not have enough RAM for kexec
      nixosConfigurations.digitalocean = nixpkgs.lib.nixosSystem {
        system = "x86_64-linux";
        modules = [
          disko.nixosModules.disko
          { disko.devices.disk.disk1.device = "/dev/vda"; }
          {
            networking.useDHCP = nixpkgs.lib.mkForce false;
            services.cloud-init = {
              enable = true;
              network.enable = true;

              # not strictly needed, just for good measure
              # (free-form cloud-init options go under `settings`)
              settings = {
                datasource_list = [ "DigitalOcean" ];
                datasource.DigitalOcean = { };
              };
            };
          }
          ./configuration.nix
        ];
      };

Disabling DHCP is very important, as otherwise it will pick up garbage networking config.
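
The same fix can also live in its own module so the flake stays short. The sketch below is one possible layout, not necessarily how the example repository organizes it. The file name digitalocean.nix is hypothetical, and the optional datasource settings are left out because they are described above as not strictly needed.

```nix
# digitalocean.nix (hypothetical file name)
{ lib, ... }:
{
  # Force DHCP off so the network setup is left to cloud-init.
  # mkForce takes priority over plain or mkDefault definitions of the
  # same option made in other modules.
  networking.useDHCP = lib.mkForce false;

  services.cloud-init = {
    enable = true;
    # Let cloud-init apply the network configuration it receives.
    network.enable = true;
  };
}
```

Under this assumption, the inline attribute set in the modules list would be replaced by a ./digitalocean.nix entry next to ./configuration.nix.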