nodejs / build

Better build and test infra for Node.

Issue with pulling git from jenkins node #3254

Closed MoLow closed 1 year ago

MoLow commented 1 year ago

https://ci.nodejs.org/job/node-test-pull-request/50585/console https://ci.nodejs.org/job/node-test-pull-request/50584/ https://ci.nodejs.org/job/node-test-pull-request/50583 https://ci.nodejs.org/job/node-test-pull-request/50582 etc

14:37:42 Verifying host key using known hosts file
14:37:42  > git fetch --no-tags --progress -- git@github.com:nodejs/node.git +refs/heads/*:refs/remotes/origin/* +refs/pull/46490/head:refs/remotes/origin/_jenkins_local_branch # timeout=20
14:37:42 ERROR: Error fetching remote repo 'origin'
14:37:42 hudson.plugins.git.GitException: Failed to fetch from git@github.com:nodejs/node.git
14:37:42    at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:1003)
14:37:42    at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1245)
14:37:42    at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1309)
14:37:42    at hudson.scm.SCM.checkout(SCM.java:540)
14:37:42    at hudson.model.AbstractProject.checkout(AbstractProject.java:1240)
14:37:42    at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:649)
14:37:42    at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:85)
14:37:42    at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:521)
14:37:42    at com.tikal.jenkins.plugins.multijob.MultiJobBuild$MultiJobRunnerImpl.run(MultiJobBuild.java:148)
14:37:42    at hudson.model.Run.execute(Run.java:1900)
14:37:42    at com.tikal.jenkins.plugins.multijob.MultiJobBuild.run(MultiJobBuild.java:76)
14:37:42    at hudson.model.ResourceController.execute(ResourceController.java:101)
14:37:42    at hudson.model.Executor.run(Executor.java:442)
14:37:42 Caused by: hudson.plugins.git.GitException: Command "git fetch --no-tags --progress -- git@github.com:nodejs/node.git +refs/heads/*:refs/remotes/origin/* +refs/pull/46490/head:refs/remotes/origin/_jenkins_local_branch" returned status code 128:
14:37:42 stdout: 
14:37:42 stderr: @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
14:37:42 @    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
14:37:42 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
14:37:42 IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
14:37:42 Someone could be eavesdropping on you right now (man-in-the-middle attack)!
14:37:42 It is also possible that a host key has just been changed.
14:37:42 The fingerprint for the RSA key sent by the remote host is
14:37:42 SHA256:uNiVztksCsDhcc0u9e8BujQXVUpKZIDTMczCvj3tD2s.
14:37:42 Please contact your system administrator.
14:37:42 Add correct host key in /home/iojs/.ssh/known_hosts to get rid of this message.
14:37:42 Offending RSA key in /home/iojs/.ssh/known_hosts:10
14:37:42   remove with:
14:37:42   ssh-keygen -f "/home/iojs/.ssh/known_hosts" -R "github.com"
14:37:42 RSA host key for github.com has changed and you have requested strict checking.
14:37:42 Host key verification failed.
14:37:42 fatal: Could not read from remote repository.
14:37:42 
14:37:42 Please make sure you have the correct access rights
14:37:42 and the repository exists.
14:37:42 
14:37:42    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:2734)
14:37:42    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandWithCredentials(CliGitAPIImpl.java:2111)
14:37:42    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$1.execute(CliGitAPIImpl.java:623)
14:37:42    at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$GitCommandMasterToSlaveCallable.call(RemoteGitImpl.java:158)
14:37:42    at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$GitCommandMasterToSlaveCallable.call(RemoteGitImpl.java:151)
14:37:42    at hudson.remoting.UserRequest.perform(UserRequest.java:211)
14:37:42    at hudson.remoting.UserRequest.perform(UserRequest.java:54)
14:37:42    at hudson.remoting.Request$2.run(Request.java:376)
14:37:42    at hudson.remoting.InterceptingExecutorService.lambda$wrap$0(InterceptingExecutorService.java:78)
14:37:42    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
14:37:42    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
14:37:42    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
14:37:42    at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:126)
14:37:42    at java.base/java.lang.Thread.run(Thread.java:833)
14:37:42    Suppressed: hudson.remoting.Channel$CallSiteStackTrace: Remote call to JNLP4-connect connection from e.1a.7534.ip4.static.sl-reverse.com/52.117.26.14:52242
14:37:42        at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1784)
14:37:42        at hudson.remoting.UserRequest$ExceptionResponse.retrieve(UserRequest.java:356)
14:37:42        at hudson.remoting.Channel.call(Channel.java:1000)
14:37:42        at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.execute(RemoteGitImpl.java:143)
14:37:42        at jdk.internal.reflect.GeneratedMethodAccessor414.invoke(Unknown Source)
14:37:42        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
14:37:42        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
14:37:42        at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.invoke(RemoteGitImpl.java:129)
14:37:42        at jdk.proxy124/jdk.proxy124.$Proxy206.execute(Unknown Source)
14:37:42        at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:1001)
14:37:42        at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1245)
14:37:42        at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1309)
14:37:42        at hudson.scm.SCM.checkout(SCM.java:540)
14:37:42        at hudson.model.AbstractProject.checkout(AbstractProject.java:1240)
14:37:42        at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:649)
14:37:42        at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:85)
14:37:42        at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:521)
14:37:42        at com.tikal.jenkins.plugins.multijob.MultiJobBuild$MultiJobRunnerImpl.run(MultiJobBuild.java:148)
14:37:42        at hudson.model.Run.execute(Run.java:1900)
14:37:42        at com.tikal.jenkins.plugins.multijob.MultiJobBuild.run(MultiJobBuild.java:76)
14:37:42        at hudson.model.ResourceController.execute(ResourceController.java:101)
14:37:42        at hudson.model.Executor.run(Executor.java:442)
14:37:42 ERROR: Error fetching remote repo 'origin'

on [test-ibm-ubuntu1804-x64-1](https://ci.nodejs.org/computer/test-ibm-ubuntu1804-x64-1)

MoLow commented 1 year ago

I was able to ssh into the machine, but fetching from git failed with "fatal: detected dubious ownership in repository". I was on my way out, so I did not have time to continue investigating.

joaocgreis commented 1 year ago

I believe this was fixed by https://github.com/nodejs/build/pull/3255 . Please reopen if not.

Thanks @MoLow!

joaocgreis commented 1 year ago

I did apply https://github.com/nodejs/build/pull/3255 to everywhere I have access, but I don't think I got all of the workers.

After that, I still see a lot of different but related failures in Jenkins:

12:57:36 stderr: Warning: the ECDSA host key for 'github.com' differs from the key for the IP address '140.82.121.4'
12:57:36 Offending key for IP in /Users/iojs/.ssh/known_hosts:1
12:57:36 Matching host key in /Users/iojs/.ssh/known_hosts:7
12:57:36 Exiting, you have requested strict checking.
12:57:36 Host key verification failed.

joaocgreis commented 1 year ago

I also don't think it was applied to containers, and I don't know what needs to be done for those.

richardlau commented 1 year ago

I did apply #3255 to everywhere I have access, but I don't think I got all of the workers.

After that, I still see a lot of different but related failures in Jenkins:

12:57:36 stderr: Warning: the ECDSA host key for 'github.com' differs from the key for the IP address '140.82.121.4'
12:57:36 Offending key for IP in /Users/iojs/.ssh/known_hosts:1
12:57:36 Matching host key in /Users/iojs/.ssh/known_hosts:7
12:57:36 Exiting, you have requested strict checking.
12:57:36 Host key verification failed.

(This is a drive-by comment, in an unfortunate piece of timing I'm on PTO today and Monday and am not on my work computer which has the ssh keys needed to get into any of the build machines/infra.)

When I added https://github.com/nodejs/build/pull/3212 I was very conservative -- that PR added the keys for GitHub but didn't remove any existing entries. It's complicated by ssh writing new entries into known_hosts with the IP address (in this case 140.82.121.4), which I think makes it very hard to use ansible.builtin.known_hosts to remove the existing entries, as the IP address could be anything in the range(s) operated by GitHub: https://api.github.com/meta. Maybe it's possible to remove the entries with the deprecated key via ansible.builtin.lineinfile.

(A non-optimal solution would be to delete the known_hosts file and recreate it, but that will mean the playbook wouldn't be idempotent.)
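The idea of removing entries by key rather than by host can be sketched in Python. This is only an illustration of the matching logic, not the Ansible task: the function name `remove_entries_with_key` and the placeholder key strings are hypothetical, and it assumes plain `hosts keytype key [comment]` lines, optionally preceded by a marker such as `@revoked`.

```python
def remove_entries_with_key(lines, deprecated_key):
    """Return the known_hosts lines whose key blob is not `deprecated_key`."""
    kept = []
    for line in lines:
        fields = line.split()
        # Optional markers (@cert-authority, @revoked) shift the fields by one.
        if fields and fields[0].startswith("@"):
            fields = fields[1:]
        # A normal entry is: hosts, key type, base64 key, optional comment.
        # The host field is ignored, so IP-specific and hashed entries match too.
        if len(fields) >= 3 and fields[2] == deprecated_key:
            continue
        kept.append(line)
    return kept


# Placeholder strings standing in for real base64 keys (assumption for the demo).
OLD_RSA = "AAAA-old-rsa-placeholder"
NEW_RSA = "AAAA-new-rsa-placeholder"

known_hosts = [
    f"192.30.255.112 ssh-rsa {OLD_RSA}",
    f"140.82.114.4 ssh-rsa {OLD_RSA}",
    f"github.com ssh-rsa {NEW_RSA}",
    "github.com ssh-ed25519 AAAA-ed25519-placeholder",
]
cleaned = remove_entries_with_key(known_hosts, OLD_RSA)
print("\n".join(cleaned))
```

Because the result depends only on the key material, running it a second time over the cleaned file changes nothing, which is the idempotence property that deleting and recreating the file would lose.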

mhdawson commented 1 year ago

@joaocgreis did https://github.com/nodejs/build/pull/3255 resolve the issue on the machines you ran it against? Based on @richardlau's comment I'm not sure, though; it might not fix the issue.

EDIT: I guess other than errors that relate to specific IPs

mhdawson commented 1 year ago

@richardlau if you happen to check in: if #3255 would have addressed the issue, would running

ansible-playbook ansible/playbooks/jenkins/docker-host.yaml  --limit "test-digitalocean-ubuntu1804_docker-x64-1" -vv

be expected to fix up the containers on test-digitalocean-ubuntu1804_docker-x64-1?

mhdawson commented 1 year ago

The other question to @richardlau and other @nodejs/build members: in terms of public test workers, are there any known_hosts entries that should be there for specific IPs? I.e. is there any reason we can't just delete all IP-specific entries (for example if we need to manually clean up some machines) when we come across them? I don't think so but wanted to see if anybody else knew of any reason.

mhdawson commented 1 year ago

This seems to explain why the update was not made on the test-digitalocean-freebsd12-x64-X machines. I updated the 2 manually.

TASK [jenkins-worker : write github.com entry in known_hosts] **************************************************************************************************************************************************************
fatal: [test-digitalocean-freebsd12-x64-1]: FAILED! => {"msg": "Failed to set permissions on the temporary files Ansible needs to create when becoming an unprivileged user (rc: 1, err: chmod: invalid file mode: A+user:iojs:rx:allow\n}). For information on working around this, see https://docs.ansible.com/ansible-core/2.13/user_guide/become.html#risks-of-becoming-an-unprivileged-user"}

mhdawson commented 1 year ago

In this job https://ci.nodejs.org/job/node-test-commit-linux/nodes=ubuntu1804-64/51177/console it seems to complain about the ECDSA host key even though that was not supposed to have changed.

13:46:46 stderr: Warning: the ECDSA host key for 'github.com' differs from the key for the IP address '140.82.114.3'
13:46:46 Offending key for IP in /home/iojs/.ssh/known_hosts:8
13:46:46 Matching host key in /home/iojs/.ssh/known_hosts:13
13:46:46 Exiting, you have requested strict checking.
13:46:46 Host key verification failed.
13:46:46 fatal: Could not read from remote repository.

Disabling test-equinix_mnx-ubuntu1804-x64-1, where it ran, to see if the job can run on other machines.

mhdawson commented 1 year ago

Same problem on [test-digitalocean-ubuntu1804-x64-1](https://ci.nodejs.org/computer/test-digitalocean-ubuntu1804-x64-1) but logging into the machine I don't see any entries that have an IP associated with them.

mhdawson commented 1 year ago

I guess maybe the warning is a red herring: the match is on a different key which, although the text does not obviously show it as associated with an IP, must be.
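One way an entry can belong to an IP address without showing it is OpenSSH's hashed host format, where the first field is `|1|<base64 salt>|<base64 HMAC-SHA1 of the host name keyed with the salt>`. The Python sketch below checks whether such a field corresponds to a given address. The function names and the demo salt are hypothetical, and whether this machine actually had hashed entries is an assumption rather than something the logs above confirm. On a real host, `ssh-keygen -F <address>` does the same lookup.

```python
import base64
import hashlib
import hmac


def hashed_entry_matches(host_field, candidate):
    """Return True if a hashed known_hosts host field was derived from `candidate`."""
    # Hashed fields look like: |1|<base64 salt>|<base64 HMAC-SHA1 digest>
    parts = host_field.split("|")
    if len(parts) != 4 or parts[0] != "" or parts[1] != "1":
        return False  # not a hashed field
    salt = base64.b64decode(parts[2])
    expected = base64.b64decode(parts[3])
    actual = hmac.new(salt, candidate.encode("ascii"), hashlib.sha1).digest()
    return hmac.compare_digest(actual, expected)


def make_hashed_field(host, salt):
    """Build a hashed host field; used here only to create demo input."""
    digest = hmac.new(salt, host.encode("ascii"), hashlib.sha1).digest()
    return "|1|{}|{}".format(
        base64.b64encode(salt).decode("ascii"),
        base64.b64encode(digest).decode("ascii"),
    )


# Demo input built locally with an arbitrary salt (assumption for illustration).
demo_field = make_hashed_field("140.82.114.3", b"example-salt-20-byte")
print(hashed_entry_matches(demo_field, "140.82.114.3"))
print(hashed_entry_matches(demo_field, "github.com"))
```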

mhdawson commented 1 year ago

I'm done until Monday. I guess the question is whether we should use @richardlau's suggestion to remove the known_hosts files and recreate them, at least temporarily, as it seems like we still have a large number of machines with a broken config.

richardlau commented 1 year ago

(This is a drive-by comment, in an unfortunate piece of timing I'm on PTO today and Monday and am not on my work computer which has the ssh keys needed to get into any of the build machines/infra.) ... Maybe it's possible to remove the entries with the deprecated key via ansible.builtin.lineinfile.

Untested PR for the above: https://github.com/nodejs/build/pull/3256

richardlau commented 1 year ago

The other question to @richardlau and other @nodejs/build members: in terms of public test workers, are there any known_hosts entries that should be there for specific IPs? I.e. is there any reason we can't just delete all IP-specific entries (for example if we need to manually clean up some machines) when we come across them? I don't think so but wanted to see if anybody else knew of any reason.

I think for the test machines we only need the keys for github.com in known_hosts. The release machines also need the key to upload to the dist server. I think the benchmark machines need the key for the benchmark data machine.

targos commented 1 year ago

I ran https://github.com/nodejs/build/pull/3256 on all possible hosts.

I also manually updated:

test-softlayer-alpine311_container-x64-1
test-softlayer-alpine312_container-x64-1
test-digitalocean-alpine311_container-x64-1
test-digitalocean-alpine312_container-x64-1
test-digitalocean-alpine311_container-x64-2
test-digitalocean-alpine312_container-x64-2
test-equinix_mnx-ubuntu1804-x64-1
test-equinix-ubuntu2004_sharedlibs_container-arm64-1
test-equinix-ubuntu2004_sharedlibs_container-arm64-2
test-equinix-ubuntu2004_sharedlibs_container-arm64-3
test-equinix-ubuntu1804_sharedlibs_container-arm64-1
test-equinix-ubuntu1804_sharedlibs_container-arm64-2
test-equinix-ubuntu1804_sharedlibs_container-arm64-3
test-equinix-ubuntu2004_container-arm64-1
test-equinix-ubuntu1804_container-arm64-1
test-equinix-debian10_container-armv7l-1
test-equinix-ubuntu2004_container-armv7l-1
test-equinix-centos7_container-arm64-1
test-equinix-debian10_container-armv7l-2
test-osuosl-ubuntu2004_sharedlibs_container-arm64-1
test-osuosl-ubuntu1804_sharedlibs_container-arm64-1
test-osuosl-ubuntu1804_container-arm64-1
test-osuosl-debian10_container-armv7l-1
test-osuosl-centos7_container-arm64-1
test-osuosl-ubuntu2004_container-arm64-1
test-osuosl-rhel8_container-arm64-1
test-osuosl-ubuntu2004_container-armv7l-1

targos commented 1 year ago

Another batch of manual updates:

test-digitalocean-ubuntu1804_sharedlibs_container-x64-4
test-digitalocean-ubuntu1804_sharedlibs_container-x64-6
test-digitalocean-ubuntu1804_sharedlibs_container-x64-8
test-digitalocean-ubuntu1804_sharedlibs_container-x64-2
test-digitalocean-rhel8_arm_cross_container-x64-2
test-digitalocean-ubi81_container-x64-2
test-digitalocean-ubuntu1804_sharedlibs_container-x64-
test-digitalocean-ubuntu1804_arm_cross_container-x64-2
test-digitalocean-ubuntu1804_sharedlibs_container-x64-9
test-digitalocean-ubuntu1804_sharedlibs_container-x64-3
test-digitalocean-ubi81_container-x64-1
test-digitalocean-rhel8_arm_cross_container-x64-1
test-digitalocean-ubuntu1804_sharedlibs_container-x64-5
test-digitalocean-ubuntu1804_arm_cross_container-x64-1
test-digitalocean-ubuntu1804_sharedlibs_container-x64-1
test-digitalocean-ubuntu1604_arm_cross_container-x64-1
test-digitalocean-ubuntu1804_sharedlibs_container-x64-7

mhdawson commented 1 year ago

Fixed up test-softlayer-ubuntu1804_sharedlibs_container-x64-4

mhdawson commented 1 year ago

Fixed up test-softlayer-ubuntu1804_sharedlibs_container-x64-1

mhdawson commented 1 year ago

Fixed up test-softlayer-ubuntu1804_sharedlibs_container-x64-3 and test-softlayer-ubuntu1804_sharedlibs_container-x64-5

mhdawson commented 1 year ago

Fixed up test-softlayer-ubuntu1804_sharedlibs_container-x64-2

richardlau commented 1 year ago

@richardlau if you happen to check in: if #3255 would have addressed the issue, would running

ansible-playbook ansible/playbooks/jenkins/docker-host.yaml  --limit "test-digitalocean-ubuntu1804_docker-x64-1" -vv

be expected to fix up the containers on test-digitalocean-ubuntu1804_docker-x64-1?

I missed this yesterday -- no, https://github.com/nodejs/build/pull/3255 won't affect the docker hosts as the docker-host playbook doesn't run the jenkins-worker role. I overlooked the docker containers when I did https://github.com/nodejs/build/pull/3212. It may make sense to extract the GitHub known hosts tasks into their own role, which can be called from both playbooks in a similar fashion to the release-builder role, which writes the key for our dist server into known_hosts for the release machines.

(e.g. here is how the docker role calls the release-builder role for each container: https://github.com/nodejs/build/blob/d428b08c38bae3f7fe093d6c1df529570c45268e/ansible/roles/docker/tasks/main.yml#L92-L101)

mhdawson commented 1 year ago

fixed up test-softlayer-ubi81_container-x64-1

mhdawson commented 1 year ago

resumed again - https://ci.nodejs.org/job/node-test-commit/60931/. Looks like only failures due to github last time were on test-softlayer-ubi81_container-x64-1 so here's hoping

mhdawson commented 1 year ago

https://ci.nodejs.org/job/node-test-commit/60931/ made it through :). I suspect there might still be a few machines that need fixing up since Jenkins seems to favor using the same machines.

MoLow commented 1 year ago

fixed up test-softlayer-ubi81_container-x64-1

@mhdawson what were the steps to fix it? I tried this with no success

mhdawson commented 1 year ago

I would ssh into the container host, then docker exec -it containerID /bin/bash and then update the key in .ssh/known_hosts, where containerID was the id for test-softlayer-ubi81_container-x64-1. When updating I would also remove all other rsa based keys.

MoLow commented 1 year ago

Ah! so the answer I was looking for was "manually" :)

MoLow commented 1 year ago

I am closing this as the issue seems to be fixed

BethGriggs commented 1 year ago

I hit this on test-rackspace-centos7-x64-1 this morning (job ref). It seems a few more hosts need to be updated.

MoLow commented 1 year ago

I have updated the known_hosts file on test-rackspace-centos7-x64-1. It had a key set by the IP and by hostname, so I removed the redundant key. Fetch worked after that.

richardlau commented 1 year ago

I have updated the known_hosts file on test-rackspace-centos7-x64-1. It had a key set by the IP and by hostname, so I removed the redundant key. Fetch worked after that.

hmm FWIW I logged in just now and there were still 17 entries with the old key:

```console
[root@test-rackspace-centos7-x64-1 ~]# cat /home/iojs/.ssh/known_hosts
192.30.255.112 ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ==
192.30.253.113 ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ==
52.64.108.95 ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ==
13.236.229.21 ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ==
13.237.44.5 ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ==
140.82.114.4 ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ==
140.82.113.3 ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ==
140.82.112.3 ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ==
140.82.114.3 ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ==
140.82.112.4 ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ==
140.82.113.4 ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ==
13.114.40.48 ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ==
52.69.186.44 ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ==
52.192.72.89 ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ==
13.229.188.59 ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ==
13.250.177.223 ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ==
52.74.223.119 ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ==
github.com ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOMqqnkVzrm0SdG6UOoqKLsabgH5C9okWi0dh2l9GKJl
github.com ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBEmKSENjQEezOmxkZMy7opKgwFB9nkt5YRrYMjNuG5N87uRgg6CLrbo5wAdT/y6v0mKV0U2w0WZ2YB/++Tpockg=
github.com ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQCj7ndNxQowgcQnjshcLrqPEiiphnt+VTTvDP6mHBL9j1aNUkY4Ue1gvwnGLVlOhGeYrnZaMgRK6+PKCUXaDbC7qtbW8gIkhL7aGCsOr/C56SJMy/BCZfxd1nWzAOxSDPgVsmerOBYfNqltV9/hWCqBywINIR+5dIg6JTJ72pcEpEjcYgXkE2YEFXV1JHnsKgbLWNlhScqb2UmyRkQyytRLtL+38TGxkxCflmO+5Z8CSSNY7GidjMIZ7Q4zMjA2n1nGrlTDkzwDCsw+wqFPGQA179cnfGWOWRVruj16z6XyvxvjJwbz0wQZ75XK5tKSb7FNyeIEs4TT4jk+S4dhPeAUC5y+bDYirYgM4GC7uEnztnZyaVWQ7B381AK4Qdrwt51ZqExKbQpTUNn+EjqoTwvqNj4kqx5QUCI0ThS/YkOxJCXmPUWZbhjpCg56i+2aB6CmK2JGhn57K5mj0MNdBXA4/WnwH6XoPWJzK5Nyu2zB3nAZp+S5hpQs+p1vN1/wsjk=
20.205.243.166 ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBEmKSENjQEezOmxkZMy7opKgwFB9nkt5YRrYMjNuG5N87uRgg6CLrbo5wAdT/y6v0mKV0U2w0WZ2YB/++Tpockg=
[root@test-rackspace-centos7-x64-1 ~]#
```

I've run the playbook from https://github.com/nodejs/build/pull/3256 and this has removed the 17 entries with the old key:

TASK [jenkins-worker : remove old github.com ssh keys] **********************************************************************************************************************************************************
[WARNING]: sftp transfer mechanism failed on [119.9.27.82]. Use ANSIBLE_DEBUG=1 to see detailed information
[WARNING]: scp transfer mechanism failed on [119.9.27.82]. Use ANSIBLE_DEBUG=1 to see detailed information
changed: [test-rackspace-centos7-x64-1] => (item=ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ==) => {"ansible_loop_var": "item", "backup": "", "changed": true, "found": 17, "item": "ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ==", "msg": "17 line(s) removed"}
[root@test-rackspace-centos7-x64-1 ~]# cat /home/iojs/.ssh/known_hosts
github.com ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOMqqnkVzrm0SdG6UOoqKLsabgH5C9okWi0dh2l9GKJl
github.com ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBEmKSENjQEezOmxkZMy7opKgwFB9nkt5YRrYMjNuG5N87uRgg6CLrbo5wAdT/y6v0mKV0U2w0WZ2YB/++Tpockg=
github.com ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQCj7ndNxQowgcQnjshcLrqPEiiphnt+VTTvDP6mHBL9j1aNUkY4Ue1gvwnGLVlOhGeYrnZaMgRK6+PKCUXaDbC7qtbW8gIkhL7aGCsOr/C56SJMy/BCZfxd1nWzAOxSDPgVsmerOBYfNqltV9/hWCqBywINIR+5dIg6JTJ72pcEpEjcYgXkE2YEFXV1JHnsKgbLWNlhScqb2UmyRkQyytRLtL+38TGxkxCflmO+5Z8CSSNY7GidjMIZ7Q4zMjA2n1nGrlTDkzwDCsw+wqFPGQA179cnfGWOWRVruj16z6XyvxvjJwbz0wQZ75XK5tKSb7FNyeIEs4TT4jk+S4dhPeAUC5y+bDYirYgM4GC7uEnztnZyaVWQ7B381AK4Qdrwt51ZqExKbQpTUNn+EjqoTwvqNj4kqx5QUCI0ThS/YkOxJCXmPUWZbhjpCg56i+2aB6CmK2JGhn57K5mj0MNdBXA4/WnwH6XoPWJzK5Nyu2zB3nAZp+S5hpQs+p1vN1/wsjk=
20.205.243.166 ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBEmKSENjQEezOmxkZMy7opKgwFB9nkt5YRrYMjNuG5N87uRgg6CLrbo5wAdT/y6v0mKV0U2w0WZ2YB/++Tpockg=
[root@test-rackspace-centos7-x64-1 ~]#

joaocgreis commented 1 year ago

Sorry I wasn't able to keep working on this on Friday.

The jenkins-workspace hosts (at least) also need the key from where the binary_tmp in use is stored, which might be any of the other jenkins-workspace hosts (this is configured with a variable in Jenkins).

richardlau commented 1 year ago

I've merged https://github.com/nodejs/build/pull/3256 -- it's good enough for the "normal" CI hosts. I'll open a follow up PR to address the docker hosts and the containers on them.

mhdawson commented 1 year ago

All of the containers on test-digitalocean-ubuntu1804-docker-x64-2 seem to have reverted. They show as only having been up 6 hours, so my guess is that after manual updates we need to do something so they will persist over a restart. @richardlau do you know if we need to be committing the container?

richardlau commented 1 year ago

eek. @mhdawson this would have been because I ran https://github.com/nodejs/build/pull/3265 against the docker hosts. That did remove the old ssh key, but I think what has happened is that it has now added the other keys (ecdsa-sha2-nistp256 and ssh-ed25519), and the key being returned doesn't match existing entries in known_hosts that hold one of the other keys (e.g. the new rsa key -- I did check after running the playbook that the old rsa key was removed).

richardlau commented 1 year ago

I did an experiment. Looking at test-softlayer-ubi81_container-x64-1, I looked at the last failing job that ran on it, https://ci.nodejs.org/job/node-test-commit-linux-containered/36856/nodes=ubi81_sharedlibs_openssl111fips_x64/console

23:02:26 stderr: Warning: the ECDSA host key for 'github.com' differs from the key for the IP address '140.82.114.4'

In the known_hosts file for that container (/home/iojs/test-softlayer-ubi81_container-x64-1/.ssh/known_hosts on the host), I can see an entry for that IP address:

140.82.114.4 ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQCj7ndNxQowgcQnjshcLrqPEiiphnt+VTTvDP6mHBL9j1aNUkY4Ue1gvwnGLVlOhGeYrnZaMgRK6+PKCUXaDbC7qtbW8gIkhL7aGCsOr/C56SJMy/BCZfxd1nWzAOxSDPgVsmerOBYfNqltV9/hWCqBywINIR+5dIg6JTJ72pcEpEjcYgXkE2YEFXV1JHnsKgbLWNlhScqb2UmyRkQyytRLtL+38TGxkxCflmO+5Z8CSSNY7GidjMIZ7Q4zMjA2n1nGrlTDkzwDCsw+wqFPGQA179cnfGWOWRVruj16z6XyvxvjJwbz0wQZ75XK5tKSb7FNyeIEs4TT4jk+S4dhPeAUC5y+bDYirYgM4GC7uEnztnZyaVWQ7B381AK4Qdrwt51ZqExKbQpTUNn+EjqoTwvqNj4kqx5QUCI0ThS/YkOxJCXmPUWZbhjpCg56i+2aB6CmK2JGhn57K5mj0MNdBXA4/WnwH6XoPWJzK5Nyu2zB3nAZp+S5hpQs+p1vN1/wsjk=

This is GitHub's new ssh key. I removed it from the known_hosts file and reran a build, https://ci.nodejs.org/job/node-test-commit-linux-containered/36857/nodes=ubi81_sharedlibs_openssl111fips_x64/consoleFull (I canceled it after it had finished the git checkout). Looking at the known_hosts file again, a new entry has been created for the IP address:

140.82.114.4 ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBEmKSENjQEezOmxkZMy7opKgwFB9nkt5YRrYMjNuG5N87uRgg6CLrbo5wAdT/y6v0mKV0U2w0WZ2YB/++Tpockg=

This is GitHub's ecdsa-sha2-nistp256 key, instead of the rsa key that was there before.

richardlau commented 1 year ago

So what I think has happened is that, prior to running the playbook from https://github.com/nodejs/build/pull/3265, the known_hosts files for the containers only had one entry beginning github.com and it was the ssh-rsa key (the new one). All entries after that were for the specific IP addresses that github.com was resolving to (on some hosts the IP address is hashed) and had the ssh-rsa key. So I think git operations over ssh were matching that github.com line and then using that key and writing entries for the specific IP address.

After https://github.com/nodejs/build/pull/3265 there are now three entries in known_hosts for github.com (one for each of the different key types). Now ssh is writing new entries for the specific IP address with the ecdsa-sha2-nistp256 key. In the cases where github.com gets resolved to an IP address the host has previously connected to before https://github.com/nodejs/build/pull/3265, ssh appears to be negotiating the ecdsa-sha2-nistp256 key but is then checking it against the ssh-rsa key as that is what is in known_hosts.

The key type selection by ssh is deterministic, but it was changed by https://github.com/nodejs/build/pull/3265: instead of negotiating the only key type that was previously there (ssh-rsa), ssh now sees all three of GitHub's supported key types and deterministically prefers ecdsa-sha2-nistp256.

For now I've manually wiped out the known_hosts file for the containers. I then reran the playbook from https://github.com/nodejs/build/pull/3265 to recreate the known_hosts file and then started a new CI run, https://ci.nodejs.org/job/node-test-commit-linux-containered/36859/. I can see while this is running that new entries have been written to known_hosts, all with the ecdsa-sha2-nistp256 key.
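The mismatch described above can be spotted before a build fails by looking for address entries that have no key of the type ssh now negotiates. The Python sketch below is only an illustration. The function name is hypothetical, the negotiated type is passed in rather than detected, hashed host fields are treated as opaque names, and it assumes the file holds only GitHub-related entries (true for the test machines, not for release or benchmark machines).

```python
from collections import defaultdict


def stale_address_entries(lines, hostname, negotiated_type):
    """List hosts other than `hostname` that have keys stored but none of `negotiated_type`."""
    types_by_host = defaultdict(set)
    for line in lines:
        fields = line.split()
        # Skip blanks, comments, and marker lines such as @revoked.
        if len(fields) < 3 or fields[0].startswith(("#", "@")):
            continue
        # The first field may hold several comma-separated hosts.
        for host in fields[0].split(","):
            types_by_host[host].add(fields[1])
    return sorted(
        host
        for host, types in types_by_host.items()
        if host != hostname and negotiated_type not in types
    )


# Placeholder strings stand in for real base64 keys (assumption for the demo).
known_hosts = [
    "github.com ssh-ed25519 AAAA-ed25519-placeholder",
    "github.com ecdsa-sha2-nistp256 AAAA-ecdsa-placeholder",
    "github.com ssh-rsa AAAA-rsa-placeholder",
    "140.82.114.4 ssh-rsa AAAA-rsa-placeholder",
    "20.205.243.166 ecdsa-sha2-nistp256 AAAA-ecdsa-placeholder",
]
stale = stale_address_entries(known_hosts, "github.com", "ecdsa-sha2-nistp256")
print(stale)
```

In this demo only 140.82.114.4 is reported: it was recorded with ssh-rsa alone, which is the kind of entry that triggered the "differs from the key for the IP address" warning once ecdsa-sha2-nistp256 was negotiated.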

BethGriggs commented 1 year ago

We've hit 'Host key verification failed.' on test-rackspace-win2012r2_vs2019-x64-6 on a recent CITGM job:

richardlau commented 1 year ago

We've hit 'Host key verification failed.' on test-rackspace-win2012r2_vs2019-x64-6 on a recent CITGM job:

@StefanStojanovic Could you take a look at test-rackspace-win2012r2_vs2019-x64-6? We have Ansible tasks for updating the known_hosts file (latest update https://github.com/nodejs/build/pull/3265) but these aren't run for Windows (as the playbook runs a different set of roles).

StefanStojanovic commented 1 year ago

@richardlau, sorry for just checking this now. Anyway, test-rackspace-win2012r2_vs2019-x64-6 had an outdated known_hosts file, so I updated it manually and it should work properly now. Regards.