Closed sogajeffrey closed 3 years ago
Is this a specific regression related to BOSH v263.4.0?
Yes. On newer bosh deployments the same agent code block works.
Does 263.5.0 work? It seems that 263.4.0 (https://github.com/cloudfoundry/bosh/releases/tag/v263.4.0) added vars interpolation to add-ons, and 263.5.0 (https://github.com/cloudfoundry/bosh/releases/tag/v263.5.0) fixed a few edge cases, one of which may be what we're running into here.
I'll be testing this out in our sandbox env.
Upgraded BOSH to 263.5 and tested a redeploy. Still got the error below (testing with the shield deployment):
bosh2 -d uswest2-sb-shield8 recreate
Using environment (openid, bosh.admin)
Using deployment 'uswest2-sb-shield8'
Continue? [yN]: y
Task 2112322
21:33:23 | Deprecation: Ignoring cloud config. Manifest contains 'networks' section.
21:33:23 | Preparing deployment: Preparing deployment (00:00:03)
21:33:28 | Preparing package compilation: Finding packages to compile (00:00:00)
21:33:28 | Updating instance shield: shield/d56b6b9d-8ea4-49d9-b5cd-bbd5e6fb5406 (0) (canary) (00:12:14)
L Error: Timed out pinging to 8bf859d5-da3b-46a9-a746-780cc77c713b after 600 seconds
21:45:42 | Error: Timed out pinging to 8bf859d5-da3b-46a9-a746-780cc77c713b after 600 seconds
Started Wed Jul 22 21:33:23 UTC 2020
Finished Wed Jul 22 21:45:42 UTC 2020
Duration 00:12:19
Task 2112322 error
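For anyone triaging similar output, here is a minimal Python sketch that pulls the ping-timeout details and the elapsed time out of task output like the above. It is only an illustration: the function names and the regular expression are assumptions written against the lines shown in this thread, not part of the BOSH CLI, and other BOSH versions may format these lines differently.

```python
import re
from datetime import datetime

# Sample lines copied from the task output in this thread.
TASK_OUTPUT = """\
21:45:42 | Error: Timed out pinging to 8bf859d5-da3b-46a9-a746-780cc77c713b after 600 seconds
Started Wed Jul 22 21:33:23 UTC 2020
Finished Wed Jul 22 21:45:42 UTC 2020
"""

# Assumed pattern, based only on the error line shown above.
PING_TIMEOUT = re.compile(
    r"Timed out pinging to (?P<target_id>[0-9a-f-]{36}) after (?P<seconds>\d+) seconds"
)

# Timestamp layout of the "Started"/"Finished" lines shown above.
TIME_FORMAT = "%a %b %d %H:%M:%S UTC %Y"


def parse_ping_timeout(output):
    """Return (target_id, timeout_seconds), or None if no ping timeout is present."""
    match = PING_TIMEOUT.search(output)
    if match is None:
        return None
    return match.group("target_id"), int(match.group("seconds"))


def task_duration_seconds(output):
    """Return seconds between the Started and Finished lines, or None if either is missing."""
    stamps = {}
    for line in output.splitlines():
        for label in ("Started", "Finished"):
            if line.startswith(label + " "):
                stamps[label] = datetime.strptime(line[len(label) + 1:], TIME_FORMAT)
    if len(stamps) != 2:
        return None
    return int((stamps["Finished"] - stamps["Started"]).total_seconds())


if __name__ == "__main__":
    print(parse_ping_timeout(TASK_OUTPUT))
    print(task_duration_seconds(TASK_OUTPUT))
```

Comparing the reported timeout with the total task duration makes it easy to see that nearly the whole run was spent waiting on the ping.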
Hi James, is there more information you need from Jeff to help resolve this issue without getting onto Jeff's environment directly?
I'm curious what happens to that deployment if all of the SHIELD BOSH release bits are removed from it, and it just provisions VMs. Does the bosh_agent respond to pings if the SHIELD software doesn't get loaded? Or is this a problem between the stemcell and BOSH itself?
If that doesn't shed any light on the situation, I'm going to need a minimum viable deployment to reproduce this in my lab, or on a replica of the same VPC/IaaS configuration.
@jhunt I ended up removing all shield related release/property bits from the bosh director manifest in my prod envs and all works fine after that.
Do you mean leaving in the shield property configs but removing the release bits?
That's exactly what I needed to know to help with a differential diagnosis.
Which SHIELD BOSH release jobs are you trying to put on this particular deployment?
@jhunt
Here's the info:
name: shield
sha1: 6658600cfd6b0a8d1e5780f2016c69887bca696b
url: https://github.com/shieldproject/shield-boshrelease/releases/download/v8.7.2/shield-8.7.2.tgz
version: 8.7.2

shield-url: https://10.72.136.127
require-shield-core: false
agent:
  key: |
    [REDACTED]
core:
  ca: |
    [REDACTED]
So this is just the shield-agent job?
Correct, just the shield agent and whatever the shield agent needs to run on the BOSH director. @jhunt
What IaaS are you spinning this on, and what stemcell version?
AWS @jhunt
stemcell:
  sha1: ab6cc40471502ac46d296b446def0138d1a01742
  url: https://bosh.io/d/stemcells/bosh-aws-xen-hvm-ubuntu-trusty-go_agent?v=3312.20
Trusty stemcells have been EOL'd since April of 2019.
Yeah, we know. This is an old cluster. We're moving off it in the coming year.
Describe the bug Although the agent works fine backing up BOSH itself, it renders BOSH completely useless for deploying VMs. The SHIELD agent possibly (not yet confirmed) conflicts with a BOSH agent process, so a deployment never gets past creating the VM: the director is never able to connect to the VM and execute the rest of the deployment.
To Reproduce Steps to reproduce the behavior:
Expected behavior BOSH will fail with:
SHIELD versions (please complete the following information):