vmware-archive / pcfdev

This is the deprecated version of PCF Dev. Please visit the current GitHub repository, https://github.com/cloudfoundry-incubator/cfdev, for the latest updates.
Apache License 2.0

cf dev fails to start during `Waiting for services to start` #365

Closed richard-cox closed 5 years ago

richard-cox commented 5 years ago

I recently had to recreate my instance of pcfdev, and it now consistently fails to start up again. It looks like a certificate issue:

$ cf dev start -m 16384
Using existing image.
Allocating 16384 MB out of 128847 MB total system memory (104842 MB free).
Importing VM...
Starting VM...
Provisioning VM...
Waiting for services to start...
7 out of 58 running
7 out of 58 running
7 out of 58 running
7 out of 58 running
7 out of 58 running
28 out of 58 running
37 out of 58 running
48 out of 58 running
55 out of 58 running
<snip>
55 out of 58 running
55 out of 58 running
Timed out after 3600 seconds.
FAILED
Error: failed to provision VM: Process exited with: 1. Reason was:  ().
$ sudo tail -f /var/vcap/monit/monit.log
[UTC Feb 13 09:51:36] info     : 'metron_agent' start: /var/vcap/jobs/metron_agent/bin/metron_agent_ctl
[UTC Feb 13 09:52:06] error    : 'metron_agent' failed to start
[UTC Feb 13 09:52:06] error    : 'loggregator_trafficcontroller' process is not running
[UTC Feb 13 09:52:06] info     : 'loggregator_trafficcontroller' trying to restart
[UTC Feb 13 09:52:06] info     : 'loggregator_trafficcontroller' start: /var/vcap/jobs/loggregator_trafficcontroller/bin/loggregator_trafficcontroller_ctl
[UTC Feb 13 09:52:36] error    : 'loggregator_trafficcontroller' failed to start
[UTC Feb 13 09:52:36] error    : 'doppler' process is not running
[UTC Feb 13 09:52:36] info     : 'doppler' trying to restart
[UTC Feb 13 09:52:36] info     : 'doppler' start: /var/vcap/jobs/doppler/bin/doppler_ctl
[UTC Feb 13 09:53:06] error    : 'doppler' failed to start
[UTC Feb 13 09:53:16] error    : 'metron_agent' process is not running
[UTC Feb 13 09:53:16] info     : 'metron_agent' trying to restart
[UTC Feb 13 09:53:16] info     : 'metron_agent' start: /var/vcap/jobs/metron_agent/bin/metron_agent_ctl
[UTC Feb 13 09:53:46] error    : 'metron_agent' failed to start
[UTC Feb 13 09:53:46] error    : 'loggregator_trafficcontroller' process is not running
[UTC Feb 13 09:53:46] info     : 'loggregator_trafficcontroller' trying to restart
[UTC Feb 13 09:53:46] info     : 'loggregator_trafficcontroller' start: /var/vcap/jobs/loggregator_trafficcontroller/bin/loggregator_trafficcontroller_ctl
[UTC Feb 13 09:54:16] error    : 'loggregator_trafficcontroller' failed to start
[UTC Feb 13 09:54:16] error    : 'doppler' process is not running
[UTC Feb 13 09:54:16] info     : 'doppler' trying to restart
[UTC Feb 13 09:54:16] info     : 'doppler' start: /var/vcap/jobs/doppler/bin/doppler_ctl
$ sudo cat /var/vcap/sys/log/doppler/doppler.log | grep certi
...lots of the following
2019/02/13 09:55:56 Could not use GRPC creds for server: x509: certificate has expired or is not yet valid
$ sudo cat /var/vcap/sys/log/metron_agent/metron.log | grep certi
...lots of the following
2019/02/13 09:56:36 Could not use GRPC creds for client: x509: certificate has expired or is not yet valid
$ sudo cat /var/vcap/sys/log/loggregator_trafficcontroller/trafficcontroller.log | grep cert
...lots of the following
2019/02/13 09:56:36 Could not use GRPC creds for client: x509: certificate has expired or is not yet valid
$ cf version
cf version 6.35.2+88a03e995.2018-03-15
$ cf dev version
PCF Dev version 0.30.0 (CLI: 850ae45, OVA: 0.549.0)
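
To see at a glance which jobs monit keeps failing to restart, the log excerpt above can be summarized with standard tools. This is only a minimal sketch, assuming the log lines keep the format shown above (`error : '<job>' failed to start`); the function name `summarize_monit_failures` is made up for illustration and is not part of PCF Dev.

```shell
#!/usr/bin/env bash
# Count "failed to start" errors per job in a monit log read from stdin.
# Assumes lines look like:
#   [UTC Feb 13 09:52:06] error    : 'metron_agent' failed to start
# summarize_monit_failures is a hypothetical helper name.
summarize_monit_failures() {
  # Extract the quoted job name from each matching line, then count per job.
  sed -n "s/.*'\([^']*\)' failed to start.*/\1/p" \
    | sort \
    | uniq -c \
    | awk '{ print $2 ": " $1 }'
}
```

Inside the VM it could be fed the same file that was tailed above, for example `sudo cat /var/vcap/monit/monit.log | summarize_monit_failures`. It prints one `job: count` line per failing job, sorted by job name.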

cf-gitbot commented 5 years ago

We have created an issue in Pivotal Tracker to manage this. Unfortunately, the Pivotal Tracker project is private so you may be unable to view the contents of the story.

The labels on this github issue will be updated when the story is started.

aemengo commented 5 years ago

@richard-cox Please upgrade to the newest version of the CF Dev plugin. The one that you are currently using is outdated.

$ cf uninstall-plugin cfdev
$ cf install-plugin -r CF-Community "cfdev"
$ cf dev version
CLI: 0.0.13

richard-cox commented 5 years ago

@aemengo Is that based on https://github.com/cloudfoundry-incubator/cfdev? If so, that's not an option for me and my team due to the lack of Linux support.

nwmac commented 5 years ago

@aemengo I have the same issue as Richard. I'm on Linux, so I can't switch to CF Dev.

aemengo commented 5 years ago

@richard-cox @nwmac

Unfortunately, you're correct. The certificates baked into the previous PCF Dev are right around their expiration dates. There aren't many good workarounds to employ from your end. Ideally, you could simply update to the latest version, so we apologize that Linux support is not currently available.
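
One way to confirm this kind of expiry is to print a certificate's validity window with openssl. The sketch below assumes a PEM-encoded certificate file. Where the certificates live inside the PCF Dev VM is not stated in this thread, so the path is left as an argument, and `report_cert_validity` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the validity window of a PEM certificate and whether it has expired.
# report_cert_validity is a hypothetical helper; the certificate path is an
# assumption, since the thread does not say where the certificates are stored.
report_cert_validity() {
  local cert_file="$1"
  # Show the notBefore / notAfter dates; fail early if the file is unreadable.
  openssl x509 -noout -dates -in "$cert_file" || return 2
  # -checkend 0 exits non-zero if the certificate has already expired.
  # Note: it does not detect a certificate that is "not yet valid".
  if openssl x509 -noout -checkend 0 -in "$cert_file" >/dev/null; then
    echo "status: valid"
  else
    echo "status: expired"
    return 1
  fi
}
```

Comparing the printed `notAfter` date with the VM's system clock would show whether the `x509: certificate has expired or is not yet valid` errors are down to expiry or to a skewed clock.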

Linux support can be tracked here: https://github.com/cloudfoundry-incubator/cfdev/issues/18

dion-ict commented 5 years ago

@aemengo Would it be possible to rebuild PCF Dev 0.30 with fresh certificates and provide Linux devs with a solution while we wait for CF Dev under Linux? Perhaps as PCF Dev 0.30.1?

bipinm commented 5 years ago

I just tried to install PCFDev locally and got errors similar to those above. It seems it is no longer possible to install and use PCFDev on Linux systems.

kdvolder commented 5 years ago

If the recommended version 0.0.13 is indeed newer than the 0.30 version, the version numbering is rather confusing. Any sane person will assume that 0.30 is the newer one.

Also since I am on Linux I can't use that version (yet). But let's ignore that for a second as I have another concern.

I believe that one of my colleagues recently tried the 0.0.13 version on Mac, and it seemed to have a much more restricted set of commands (further reinforcing the impression that it's an ancient version). For example, I don't believe it supports a command like cf dev ssh (though it is hard for me to confirm this, seeing as I cannot actually use it on my machine).

I should note that for testing our use case we use cf dev ssh to connect to the VM and make some minor configuration changes to uaa, to allow for testing some OAuth flows locally. So unless the cf dev ssh command is supported by the 0.0.13 version, it isn't going to work for us.

aemengo commented 5 years ago

@kdvolder Apologies for the confusion. There's a lot of misinformation floating around, which is why we are trying to correct it all to the best of our ability.

cf dev ssh was never intended to be a user-facing feature. It was a command we added for the developers on the project to debug efficiently. The newest iteration of CF Dev much more closely mimics a cloud deployment of Cloud Foundry. If you'd like to make changes to uaa and are familiar with BOSH tooling, the appropriate way to do so is to export the BOSH environment variables and then ssh into the relevant instance. For example, on Mac:

$ eval "$(cf dev bosh env)"
$ bosh -d cf ssh uaa

Or even better, to edit the cf.yml deployment manifest and then re-deploy.

kdvolder commented 5 years ago

It was a command we added for the developers on the project to debug efficiently.

Sounds useful to anybody? You should probably put it back. Just IMHO.

Anyhow... I think we're straying a bit from the issue at hand here. When the time comes and I need help with that, I'll raise another ticket about possibly putting the ssh command back and/or documenting better ways to do things that avoid using it. Thanks for all the helpful info / comments :-)

lennarth-anaya commented 5 years ago

@aemengo thanks.

I read other threads about these certificates (I'm also using 0.30, which is the one suggested by default). As suggested in those threads, I ssh'ed in to change the VM's system date in /etc/init.d to fall within the certificates' validity dates (April 2018), but it's still complaining about the same issue.

Anyway, after executing the command you provided above, "cf uninstall-plugin cfdev", on a Windows machine, I'm getting:

Plugin cfdev does not exist

I found this one instead: pcfdev-v0.13.0, rather than 0.0.13. Is it publicly accessible?

Thanks in advance.

kdvolder commented 5 years ago

@lennarth-anaya

There's a typo in the instructions from @aemengo. The old plugin is called 'pcfdev' and the new one 'cfdev', and the version numbers of the two are unrelated. So you uninstall 'pcfdev', then install 'cfdev'.

$ cf uninstall-plugin pcfdev
$ cf install-plugin -r CF-Community "cfdev"
$ cf dev version
CLI: 0.0.13

gitcreate31 commented 5 years ago

I'm getting the following:

Searching CF-Community for plugin cfdev...
Plugin requested has no binary available for your platform.

Any previous versions can be tried under Linux (VirtualBox 5.2.26-128414~Ubuntu~xenial)?

kdvolder commented 5 years ago

Any previous versions can be tried under Linux

My understanding is... the answer is "no". The 'old' version people were using (myself included) was pcfdev version 0.30.0. But it recently broke because some SSL certificates expired. That's precisely what this GitHub ticket is about. So I'm afraid we are out of luck :-(

BoykoAlex commented 5 years ago

Or even better, to edit the cf.yml deployment manifest and then re-deploy.

@aemengo I'd like to do the above. I found cf.yml here: ~/.cfdev/services/cf.yml. This file is re-created every time cf dev is stopped and then started (that is, whenever CF is deployed). Is that the file to edit? If so, how should I re-deploy CF? With some BOSH command? Is it $ bosh -d cf restart?

aemengo commented 5 years ago

@BoykoAlex Yes, that file is recreated every time you run cf dev start. To re-deploy using BOSH, you could edit the file (after a successful start) and then run:

$ eval "$(cf dev bosh env)"
$ bosh -d cf deploy ~/.cfdev/services/cf.yml

You can find out more here: https://bosh.io/docs/quick-start/
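
The two commands above can be wrapped in a small helper that refuses to run when the manifest is missing. This is only a sketch: `redeploy_cf` is a hypothetical name, and it assumes the `cf` CLI with the cfdev plugin and the `bosh` CLI are both on the PATH.

```shell
#!/usr/bin/env bash
# Re-deploy CF from an edited manifest, using the commands from this thread.
# redeploy_cf is a hypothetical wrapper, not part of the cfdev plugin.
redeploy_cf() {
  # Default to the manifest location mentioned in this thread.
  local manifest="${1:-$HOME/.cfdev/services/cf.yml}"
  if [ ! -f "$manifest" ]; then
    echo "manifest not found: $manifest" >&2
    return 1
  fi
  # Load the BOSH environment variables printed by the cfdev plugin.
  eval "$(cf dev bosh env)" || return 1
  # Deploy the (edited) manifest to the "cf" deployment.
  bosh -d cf deploy "$manifest"
}
```

Because cf dev start recreates ~/.cfdev/services/cf.yml, it may be safer to keep the edited copy somewhere else and pass its path explicitly, for example `redeploy_cf ~/my-cf.yml` (a made-up path).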

kdvolder commented 5 years ago

I read the comments in https://github.com/cloudfoundry-incubator/cfdev/issues/18 and, from the looks of things, this is described as a 'major feature request'. Combined with the fact that it had been sitting in the icebox until 5 days ago, I gather Linux support may not be arriving for a little while. Is that right, @aemengo?

If so, how about trying to patch the VM for the 'old' pcfdev to fix those expired certificates? Is that an option? Maybe someone who knows what to do there could either

a) post some instructions for users to do this via cf dev ssh.

or even better

b) patch the actual vm that cf dev start downloads on the first run.

I'm asking because at the moment I'm really stuck, and so, I gather, is any other Linux dev who depended on pcfdev as part of their development/testing workflow.

aemengo commented 5 years ago

@kdvolder @nwmac @dion-ict @bipinm @lennarth-anaya @gitcreate31 @BoykoAlex

As much as we'd rather not have you using that old version, your argument is hard to argue with. 🙂 We went with b). We just released a patch with the troublesome certificates bumped, v0.30.1 for PCF1.11.0: https://network.pivotal.io/products/pcfdev#/releases/312110.

The certificates will expire again in one year, but by then we should have Linux support for the new architecture much further along. We always appreciate the engagement!

gitcreate31 commented 5 years ago

Thank you @aemengo. But I tried installing pcfdev-v0.30.1+PCF1.11.0-linux, and when I run cf dev start, I am getting "FAILED Error: Pivotal Network returned: 404 Not Found."

pho-enix commented 5 years ago

I got the 404 at first. Then I ran:

cf uninstall-plugin pcfdev
cf install-plugin -r CF-Community "cfdev"

It worked fine. v0.30.1 was pulled and PCF Dev works fine.

gitcreate31 commented 5 years ago

I am getting the error below with cf install-plugin -r CF-Community "cfdev":

Searching CF-Community for plugin cfdev...
Plugin requested has no binary available for your platform.
FAILED

Is it something related to permissions for the pcfdev-v0.30.1+PCF1.11.0-linux download?

kdvolder commented 5 years ago

I tried installing 0.30.1 following these instructions: https://pivotal.io/platform/pcf-tutorials/getting-started-with-pivotal-cloud-foundry-dev/install-pcf-dev

I'm also getting the '404' error trying to start it.

$ cf dev start
FAILED
Error: Pivotal Network returned: 404 Not Found.

kdvolder commented 5 years ago

@pho-enix What you say didn't work for me. It still says the plugin is not available on Linux. I'm guessing that what you said here:

It worked fine. v0.30.1 was pulled and PCF Dev works fine.

... is probably not exactly what happened. Given the command you quoted, I bet what you actually 'pulled down' was 'cfdev version 0.0.13', not 'pcfdev version 0.30.1'.

aemengo commented 5 years ago

@kdvolder @gitcreate31 Could you please give it another try? Are you still getting 404s?

kdvolder commented 5 years ago

@aemengo Downloading VM ... 8% and rising :-) So it looks like the 404 issue is fixed.

bipinm commented 5 years ago

Perfect.. All the services are up and running with v0.30.1 :)

The documentation at https://docs.pivotal.io/pcf-dev/install-linux.html lists step 4 as "cf dev start", which starts a VM with 4 GB of memory rather than the default 8 GB. Even 8 GB might not be sufficient, since memory usage goes up to 17.9 GB on Ubuntu after all services have started (1.2 GB before VM startup).

kdvolder commented 5 years ago

So I've used pcfdev a bit yesterday and this morning. It now starts fine and seems to work fine for the most part.

One 'small' problem I have noticed is that log streaming from apps doesn't seem to work. That is, when pushing apps in various ways, including cf push, the STS boot dashboard, and the app we are developing (our app uses the cf java client to push other apps and stream their logs), I get no errors, but there's never any log output displayed for the app. The same happens when I do cf logs <appname>.

@aemengo Could this also be a certificate issue? I.e. maybe there's another expired certificate lurking somewhere?

richard-cox commented 5 years ago

@aemengo Thanks for updating! I've just successfully brought up v0.30.1, so I will close this issue. However, like @kdvolder above, we also don't see any content in the log stream. Will investigate further next week.

dion-ict commented 5 years ago

@aemengo Thanks, this update got me back up and running as well.

I can confirm the issue @kdvolder and @richard-cox are seeing. In addition, 'cf app' always shows 0 disk and 0 memory usage (with the app running successfully). The same behaviour appears when viewing app details on apps.local.pcfdev.io: 0 Bytes in the Memory and Disk columns.

kdvolder commented 5 years ago

@richard-cox Fair enough to close this ticket. The issues with log streams (and the app stats that @dion-ict noticed) are arguably different and less critical. I'll raise a separate ticket about the log streaming, where it can be tracked independently.