Closed angely-dev closed 1 year ago
Hi @angely-dev
Sorry, this is an issue on our end; it seems to be related to SSH key handling.
To work around this, can you create a private key for your user?
You can do that with the ssh-keygen command.
@angely-dev I have added more debug statements to a private build. Can you download it with
docker run --rm -v $(pwd):/workspace ghcr.io/oras-project/oras:v0.12.0 pull ghcr.io/srl-labs/clab-oci:be5aeac0
This command will put a new containerlab binary in your current working directory.
If you run it like ./containerlab deploy --cleanup you should see WARN log messages with more debug info.
Thanks for the quick answer @hellt and for your work. Please give me some time to try what you asked for and to gather the logs.
Here is the trace with the new binary (I wasn't able to use --cleanup):
$ sudo ./containerlab deploy --cleanup
Error: unknown flag: --cleanup
$ sudo ./containerlab --cleanup deploy
Error: unknown flag: --cleanup
$ sudo ./containerlab --log-level trace deploy
DEBU[0000] trying to find topology files automatically
DEBU[0000] topology file found: simple.clab.yml
INFO[0000] Containerlab v0.0.0 started
DEBU[0000] template variables: <nil>
DEBU[0000] topology:
name: simple
topology:
  nodes:
    srl:
      kind: nokia_srlinux
      image: ghcr.io/nokia/srlinux
DEBU[0000] method initMgmtNetwork was called mgmt params &{Network: Bridge: IPv4Subnet: IPv4Gw: IPv4Range: IPv6Subnet: IPv6Gw: IPv6Range: MTU: ExternalAccess:<nil>}
DEBU[0000] New mgmt params are &{Network:clab Bridge: IPv4Subnet:172.20.20.0/24 IPv4Gw: IPv4Range: IPv6Subnet:2001:172:20:20::/64 IPv6Gw: IPv6Range: MTU: ExternalAccess:0xc0005a6eef}
DEBU[0000] env runtime var value is
DEBU[0000] Running runtime.Init with params &{Timeout:2m0s GracefulShutdown:false Debug:false KeepMgmtNet:false} and &{Network:clab Bridge: IPv4Subnet:172.20.20.0/24 IPv4Gw: IPv4Range: IPv6Subnet:2001:172:20:20::/64 IPv6Gw: IPv6Range: MTU: ExternalAccess:0xc0005a6eef}
DEBU[0000] Runtime: Docker
DEBU[0000] detected docker network mtu value - 1500
DEBU[0000] initialized a runtime with params &{config:{Timeout:120000000000 GracefulShutdown:false Debug:false KeepMgmtNet:false} Client:0xc000b31b80 mgmt:0xc00086b040}
INFO[0000] Parsing & checking topology file: simple.clab.yml
DEBU[0000] node config: &{ShortName:srl LongName:clab-simple-srl Fqdn:srl.simple.io LabDir:/home/USERNAME/workspace/containerlab/simple/clab-simple/srl Index:0 Group: Kind:nokia_srlinux StartupConfig: StartupDelay:0 EnforceStartupConfig:false AutoRemove:0xc0006f37fa ResStartupConfig: Config:<nil> ResConfig: NodeType: Position: License: Image:ghcr.io/nokia/srlinux ImagePullPolicy:IfNotPresent Sysctls:map[] User: Entrypoint: Cmd: Exec:[] Env:map[] Binds:[] PortBindings:map[] ResultingPortBindings:[] PortSet:map[] NetworkMode: MgmtNet: MgmtIntf: MgmtIPv4Address: MgmtIPv4PrefixLength:0 MgmtIPv6Address: MgmtIPv6PrefixLength:0 MgmtIPv4Gateway: MgmtIPv6Gateway: MacAddress: ContainerID: TLSCert: TLSKey: TLSAnchor: Certificate:<nil> NSPath: Publish:[] ExtraHosts:[] Labels:map[] Endpoints:[] SANs:[] Sandbox: Kernel: Runtime: CPU:0 CPUSet: Memory: Extras:<nil> WaitFor:[] DNS:<nil> IsRootNamespaceBased:false}
DEBU[0000] lab Conf: &{Name:simple Prefix:0xc00061c150 Mgmt:0xc00086b040 Topology:0xc000920c60 Debug:false}
DEBU[0000] Env: CLAB_VERSION_CHECK=
DEBU[0000] kernel version: 5.19.0-45-generic
DEBU[0000] Looking up ghcr.io/nokia/srlinux Docker image
DEBU[0000] Image ghcr.io/nokia/srlinux present, skip pulling
DEBU[0000] kernel module "ip_tables" is already loaded
DEBU[0000] kernel module "ip6_tables" is already loaded
INFO[0000] Creating lab directory: /home/USERNAME/workspace/containerlab/simple/clab-simple
DEBU[0000] failed loading csr /home/USERNAME/workspace/containerlab/simple/clab-simple/.tls/ca/ca.csr, continuing anyways
WARN[0000] first [/root]
WARN[0000] second [/root]
DEBU[0000] SSH_AUTH_SOCK not set, skipping pubkey fetching
DEBU[0000] extracted 0 keys from ssh-agent
WARN[0000] third [/root]
Error: failed reading the file /root: read /root: is a directory
Then I generated SSH keys in /root this way (I cannot log in as root directly):
$ sudo mkdir /root/.ssh
$ sudo chmod 700 /root/.ssh
$ sudo ssh-keygen -f /root/.ssh/id_rsa
$ sudo ls -al /root | grep .ssh
drwx------ 2 root root 4096 juin 26 14:19 .ssh
$ sudo ls -l /root/.ssh
total 8
-rw------- 1 root root 2602 juin 26 14:19 id_rsa
-rw-r--r-- 1 root root 571 juin 26 14:19 id_rsa.pub
Unfortunately, I get the same error when I rerun the deploy. Did I miss anything?
Unfortunately, I can't reproduce this, so you will likely have to collect more logs.
Download another binary: docker run --rm -v $(pwd):/workspace ghcr.io/oras-project/oras:v0.12.0 pull ghcr.io/srl-labs/clab-oci:be5aeac0
Then run whoami, then ls -la ~/.ssh, and finally ./containerlab deploy -c
Here it is:
$ whoami
angely
$ ls -la ~/.ssh
total 264
drwx------ 2 angely domain users 4096 juin 21 15:38 .
drwxr-xr-x 28 angely domain users 4096 juin 26 16:47 ..
-rw-r--r-- 1 angely domain users 578 févr. 6 15:30 config
-rw------- 1 angely domain users 1843 janv. 10 2020 id_rsa
-rw-r--r-- 1 angely domain users 414 janv. 10 2020 id_rsa.pub
-rw------- 1 angely domain users 123526 juin 20 09:33 known_hosts
-rw------- 1 angely domain users 120882 mars 29 14:48 known_hosts.old
$ sudo ls -la /root/.ssh
total 16
drwx------ 2 root root 4096 juin 26 14:19 .
drwx------ 12 root root 4096 juin 26 14:19 ..
-rw------- 1 root root 2602 juin 26 14:19 id_rsa
-rw-r--r-- 1 root root 571 juin 26 14:19 id_rsa.pub
$ ./containerlab deploy -c
Error: containerlab requires sudo privileges to run
$ sudo ./containerlab deploy -c
INFO[0000] Containerlab v0.0.0 started
INFO[0000] Parsing & checking topology file: simple.clab.yml
INFO[0000] Removing /home/angely/workspace/containerlab/simple/clab-simple directory...
INFO[0000] Creating lab directory: /home/angely/workspace/containerlab/simple/clab-simple
WARN[0000] resolved path: /root
WARN[0000] first [/root]
WARN[0000] second [/root]
WARN[0000] third [/root]
Error: failed reading the file /root: read /root: is a directory
I'm not sure it tells us more. Thanks again for building custom binaries for this issue.
Thanks @angely-dev.
Apparently, on your system the code fails to look up the user (angely) by its ID, and no fallback procedure was in place.
I am not sure why the user could not be fetched by its ID on your system; probably some Linux-specific quirk.
I made a build for you that fixes that
docker run --rm -v $(pwd):/workspace ghcr.io/oras-project/oras:v0.12.0 pull ghcr.io/srl-labs/clab-oci:be5aeac0
If you're eager to dig deeper, here is a procedure we used to get the sudo user
# identify SUDO_UID
sudo env | grep SUDO_UID
SUDO_UID=0
# in your case the SUDO_UID is either unset or set to some non-0 value
# if it is set to non-0, try fetching the user name by ID
id -nu 0
root
The log is better indeed:
$ sudo ./containerlab deploy -c
INFO[0000] Containerlab v0.0.0 started
INFO[0000] Parsing & checking topology file: simple.clab.yml
INFO[0000] Removing /home/angely/workspace/containerlab/simple/clab-simple directory...
INFO[0000] Creating lab directory: /home/angely/workspace/containerlab/simple/clab-simple
INFO[0000] Creating docker network: Name="clab", IPv4Subnet="172.20.20.0/24", IPv6Subnet="2001:172:20:20::/64", MTU="1500"
Error: Error response from daemon: Pool overlaps with other one on this address space
I have not looked at the overlap error yet (unrelated, I presume), but the /root error is gone! Many thanks.
Yes, the SUDO_UID variable is non-0:
$ sudo env | grep SUDO_UID
SUDO_UID=1876203189
$ SUDO_UID=0
$ sudo env | grep SUDO_UID
SUDO_UID=1876203189
$ id -nu 0
root
$ id -nu 1876203189
angely
"Apparently on your system the code fails to lookup the user (angely) using its ID"
I'm no system expert but I'm using an AD account, maybe it has to do with this?
$ id
uid=1876203189(angely) gid=1876200513(domain users) groups=1876200513(domain users),27(sudo),137(wireshark),139(ubridge),141(libvirt),998(docker) # and so on
Yes, this is very likely the cause.
I don't have a system with AD to check this myself, but everything points to it.
I think we have to add another way to find the user by its ID in case the Go stdlib fails to do so. We can use the id command, which seems to work just fine in both cases.
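The fallback idea can be sketched in shell. This is only an illustration under stated assumptions, not containerlab's actual code: the function names home_for_uid and resolve_invoking_home are hypothetical, and the sketch assumes getent is installed. getent queries the system's name service, so it may also resolve directory-backed accounts such as AD users, in the same way id -nu did earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not containerlab code): print the home directory
# of the user with the given numeric UID.
home_for_uid() {
    # getent passwd prints name:password:uid:gid:gecos:home:shell,
    # so the home directory is the 6th colon-separated field.
    getent passwd "$1" | cut -d: -f6
}

# Under sudo, SUDO_UID holds the UID of the invoking user;
# otherwise fall back to the current user's UID.
resolve_invoking_home() {
    home_for_uid "${SUDO_UID:-$(id -u)}"
}
```

Called under sudo, resolve_invoking_home would print the invoking user's home directory rather than root's, provided the name service can resolve the UID.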
The IP pool clash is something you have to check yourself, as it is related to your docker settings.
You can run docker network ls and see which networks you have defined; one of them appears to be using the subnet clab chose, 172.20.20.0/24 (or its IPv6 counterpart).
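Docker's "Pool overlaps" error can appear when the requested subnet intersects the subnet of an existing network, even if the two are not identical (for example, a wider /16 that contains the /24). Below is a minimal sketch of such an overlap check for IPv4 only, assuming well-formed CIDR input; the function names ip_to_int and cidrs_overlap are made up for illustration.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address (e.g. 172.20.20.0) to an integer.
# Intended to be called via $(...), so the IFS change stays in a subshell.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Return 0 if two IPv4 CIDR blocks overlap, 1 otherwise.
cidrs_overlap() {
    a_ip=$(ip_to_int "${1%/*}"); a_len=${1#*/}
    b_ip=$(ip_to_int "${2%/*}"); b_len=${2#*/}
    # Two CIDR blocks overlap exactly when they agree on the shorter prefix.
    len=$a_len
    if [ "$b_len" -lt "$len" ]; then len=$b_len; fi
    mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
    [ $(( a_ip & mask )) -eq $(( b_ip & mask )) ]
}
```

For instance, cidrs_overlap 172.20.20.0/24 172.20.0.0/16 succeeds because the /16 contains the /24, while comparing 172.20.20.0/24 with 172.17.0.0/16 fails. The subnets to compare against can be read from docker network inspect for each network listed by docker network ls.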
@angely-dev can you try this build, which has an enhanced version of the home directory retrieval?
docker run --rm -v $(pwd):/workspace ghcr.io/oras-project/oras:v0.12.0 pull ghcr.io/srl-labs/clab-oci:6232ac42
Run it with the --debug flag so that I can check the debug messages I added.
Thanks @hellt. Please allow me some time to test this.
So I cleaned up the docker networks and reran the binary you just built. Here is the debug output:
$ sudo ./containerlab deploy -c -d
DEBU[0000] trying to find topology files automatically
DEBU[0000] topology file found: simple.clab.yml
INFO[0000] Containerlab v0.0.0 started
DEBU[0000] template variables: <nil>
DEBU[0000] topology:
name: simple
topology:
  nodes:
    srl:
      kind: nokia_srlinux
      image: ghcr.io/nokia/srlinux
DEBU[0000] method initMgmtNetwork was called mgmt params &{Network: Bridge: IPv4Subnet: IPv4Gw: IPv4Range: IPv6Subnet: IPv6Gw: IPv6Range: MTU: ExternalAccess:<nil>}
DEBU[0000] New mgmt params are &{Network:clab Bridge: IPv4Subnet:172.20.20.0/24 IPv4Gw: IPv4Range: IPv6Subnet:2001:172:20:20::/64 IPv6Gw: IPv6Range: MTU: ExternalAccess:0xc0009e822a}
DEBU[0000] env runtime var value is
DEBU[0000] Running runtime.Init with params &{Timeout:2m0s GracefulShutdown:false Debug:false KeepMgmtNet:false} and &{Network:clab Bridge: IPv4Subnet:172.20.20.0/24 IPv4Gw: IPv4Range: IPv6Subnet:2001:172:20:20::/64 IPv6Gw: IPv6Range: MTU: ExternalAccess:0xc0009e822a}
DEBU[0000] Runtime: Docker
DEBU[0000] detected docker network mtu value - 1500
DEBU[0000] initialized a runtime with params &{config:{Timeout:120000000000 GracefulShutdown:false Debug:false KeepMgmtNet:false} Client:0xc000976080 mgmt:0xc000446000}
INFO[0000] Parsing & checking topology file: simple.clab.yml
DEBU[0000] node config: &{ShortName:srl LongName:clab-simple-srl Fqdn:srl.simple.io LabDir:/home/angely/workspace/containerlab/simple/clab-simple/srl Index:0 Group: Kind:nokia_srlinux StartupConfig: StartupDelay:0 EnforceStartupConfig:false AutoRemove:0xc000c26f3a ResStartupConfig: Config:<nil> ResConfig: NodeType: Position: License: Image:ghcr.io/nokia/srlinux ImagePullPolicy:IfNotPresent Sysctls:map[] User: Entrypoint: Cmd: Exec:[] Env:map[] Binds:[] PortBindings:map[] ResultingPortBindings:[] PortSet:map[] NetworkMode: MgmtNet: MgmtIntf: MgmtIPv4Address: MgmtIPv4PrefixLength:0 MgmtIPv6Address: MgmtIPv6PrefixLength:0 MgmtIPv4Gateway: MgmtIPv6Gateway: MacAddress: ContainerID: TLSCert: TLSKey: TLSAnchor: Certificate:<nil> NSPath: Publish:[] ExtraHosts:[] Labels:map[] Endpoints:[] SANs:[] Sandbox: Kernel: Runtime: CPU:0 CPUSet: Memory: Extras:<nil> WaitFor:[] DNS:<nil> IsRootNamespaceBased:false}
DEBU[0000] lab Conf: &{Name:simple Prefix:0xc0009c15f0 Mgmt:0xc000446000 Topology:0xc0006a2120 Debug:false}
DEBU[0000] Env: CLAB_VERSION_CHECK=
DEBU[0000] Filter key: name, filter value: ^clab-simple-srl$
INFO[0000] Removing /home/angely/workspace/containerlab/simple/clab-simple directory...
DEBU[0000] kernel version: 5.19.0-45-generic
DEBU[0000] Looking up ghcr.io/nokia/srlinux Docker image
INFO[0000] Could not read docker config: open /root/.docker/config.json: no such file or directory
DEBU[0000] docker config file not found
INFO[0000] Pulling ghcr.io/nokia/srlinux:latest Docker image
DEBU[0000] latest version 0.42.0 is newer than the current one 0.0.0
INFO[0161] Done pulling ghcr.io/nokia/srlinux:latest
DEBU[0161] kernel module "ip_tables" is already loaded
DEBU[0161] kernel module "ip6_tables" is already loaded
INFO[0161] Creating lab directory: /home/angely/workspace/containerlab/simple/clab-simple
DEBU[0161] writing cert file to /home/angely/workspace/containerlab/simple/clab-simple/.tls/ca/ca.pem
DEBU[0161] writing key file to /home/angely/workspace/containerlab/simple/clab-simple/.tls/ca/ca.key
DEBU[0161] error while looking up user by id using os/user.LookupId 1876203189: user: unknown userid 1876203189
DEBU[0161] user home dir /home/angely found using getent command
DEBU[0161] error while looking up user by id using os/user.LookupId 1876203189: user: unknown userid 1876203189
DEBU[0161] user home dir /home/angely found using getent command
DEBU[0161] SSH_AUTH_SOCK not set, skipping pubkey fetching
DEBU[0161] extracted 0 keys from ssh-agent
Error: failed reading the file /home/angely: read /home/angely: is a directory
Thanks @angely-dev, it works: it now breaks in an expected place. I will have a proper fix in a few moments. Thanks for staying on it 👍
@angely-dev this should do it
docker run --rm -v $(pwd):/workspace ghcr.io/oras-project/oras:v0.12.0 pull ghcr.io/srl-labs/clab-oci:ac908442
It works indeed!
$ sudo ./containerlab deploy -c -d
(...)
Run 'containerlab version upgrade' to upgrade or go check other installation options at https://containerlab.dev/install/
+---+-----------------+--------------+-----------------------+---------------+---------+----------------+----------------------+
| # | Name | Container ID | Image | Kind | State | IPv4 Address | IPv6 Address |
+---+-----------------+--------------+-----------------------+---------------+---------+----------------+----------------------+
| 1 | clab-simple-srl | 7738ef838cd0 | ghcr.io/nokia/srlinux | nokia_srlinux | running | 172.20.20.2/24 | 2001:172:20:20::2/64 |
+---+-----------------+--------------+-----------------------+---------------+---------+----------------+----------------------+
$ docker exec -it clab-simple-srl bash
[root@srl /]# echo Hello, World!
"Error: Server is not running" popped up twice in the debug output, in case that matters.
Next step for me is to test with Cisco XRv and with a topology.
Thanks for the work and the availability!
@angely-dev Nice, thanks. I will make those fixes part of the next release. "Server is not running" is an expected log message during normal operation. I am therefore closing this issue.
Hello,
So I'm using containerlab for the first time on Ubuntu (22.04.2 LTS) and I'm encountering this issue:
I first tried with a config file and it failed, so I tried with no config file at all and got the same error.
Trace:
I did not find any issue related to this error message. Any clue?
Sorry if it seems obvious or not related to containerlab, I do not know how to dig further into the issue.
Thanks.