vmware-archive / pcfdev

This is the deprecated version of PCF Dev. Please visit the current GitHub repository, https://github.com/cloudfoundry-incubator/cfdev, for the latest updates.
Apache License 2.0

Issue starting VM on Mac OS X El Capitan #79

Closed luckyabhishek closed 8 years ago

luckyabhishek commented 8 years ago

Hi, I downloaded the latest versions of VirtualBox (5.0.20 r106931), the Cloud Foundry CLI (6.19.0+b29b4e0-2016-06-08) and the PCF Dev plugin (pcfdev-v0.16.0-osx). When I run cf dev start in my terminal, I get this message:

"FAILED
Error: failed to start VM: timed out waiting for vm to stop: %!!(MISSING)s(<nil>)"

I can see the VM is in the aborted state in the VirtualBox console. When I try to start the VM there, it asks me to select Ubuntu or an additional operating system. Once I choose Ubuntu, the VM does come up and asks for a username and password, which I don't have.

Any ideas on what might be going wrong?

sclevine commented 8 years ago

Hi @luckyabhishek,

This is a known issue. Some VM states (like aborted or paused (#78)) leave the VM in a state that the CLI plugin cannot handle. Delete the VM in VirtualBox along with any .vmdk files in ~/.pcfdev, and run cf dev start again.
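For anyone who prefers the command line, the cleanup described above could be sketched roughly as follows. This is only a sketch under assumptions: the helper names are hypothetical, the VM name has to be looked up with VBoxManage list vms, and the search for .vmdk files is recursive, which may be broader than needed.

```shell
# Hypothetical helper: print every .vmdk file under the PCF Dev directory
# (defaults to ~/.pcfdev; pass another directory as the first argument).
list_pcfdev_disks() {
  dir="${1:-$HOME/.pcfdev}"
  find "$dir" -type f -name '*.vmdk' 2>/dev/null | sort
}

# Hypothetical helper: unregister the VM in VirtualBox, then delete any
# leftover .vmdk files. The VM name is an assumption supplied by the caller.
clean_pcfdev_vm() {
  vm_name="$1"
  dir="${2:-$HOME/.pcfdev}"
  # Ignore a failure here: the VM may already be gone.
  VBoxManage unregistervm "$vm_name" --delete || true
  list_pcfdev_disks "$dir" | while IFS= read -r disk; do
    rm -f -- "$disk"
  done
}
```

Running list_pcfdev_disks first shows what would be removed; clean_pcfdev_vm "<vm name>" followed by cf dev start then repeats the suggestion above.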

This should be fixed in the next release (0.17.0).

Thanks, Stephen

luckyabhishek commented 8 years ago

Hi Stephen, thanks for your response. I faced this issue at work while starting the VM using cf dev start, which left the VM in the aborted state. I did the same thing from home (without the corporate proxy) and it worked just fine, so I think it has to do with the proxy. Going through the documentation, the only difference I could find was that my proxy variables were set as http_proxy and https_proxy instead of HTTP_PROXY and HTTPS_PROXY. Could that be an issue? I will try it again at work on Monday, but it would be good to know if it's a known problem.

Abhishek

sclevine commented 8 years ago

Are you sure that you received the timed out waiting for vm to stop error when you ran cf dev start? This error should only show up if you're attempting to stop the VM with cf dev stop.

I would only suspect your proxy to be the issue if your VM got into a running state at some point. Can you destroy the VM and provide the full output of cf dev start?

(Also, both uppercase and lowercase http[s]_proxy environment variables are supported.)
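To check which spelling is actually set in a given shell, a small helper along these lines may help. It is a hypothetical sketch, not how the PCF Dev plugin reads its settings; in particular, preferring the lowercase name over the uppercase one is an assumption made only for illustration.

```shell
# Hypothetical helper: print the value of a proxy variable, looking at the
# lowercase name first and falling back to the uppercase name.
# The lookup order is an assumption, not documented plugin behavior.
effective_proxy() {
  lower="$1"                                            # e.g. http_proxy
  upper=$(printf '%s' "$lower" | tr '[:lower:]' '[:upper:]')
  eval "value=\${$lower:-\${$upper:-}}"
  printf '%s\n' "$value"
}
```

For example, effective_proxy http_proxy and effective_proxy https_proxy print whichever variant is set, or an empty line if neither is.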

luckyabhishek commented 8 years ago

Yes, I am 100% sure, as I ran cf dev start a few times. The error sounded funny in a way, as it said failed to start and timed out waiting for vm to stop :)

sclevine commented 8 years ago

Hi @luckyabhishek,

Sorry for the delay. Can you try the latest version (0.17.0) of PCF Dev?

-Stephen

luckyabhishek commented 8 years ago

Hi Stephen,

The VM still doesn't come up, although I don't get an aborted error any more. The error I get now is

"Error: failed to provision VM: Process exited with: 1. Reason was: ()"

I know this tells us nothing. When I look at the VM console in VirtualBox, it shows the message eth0:Reset adapter multiple times, and the VM never starts.

luckyabhishek commented 8 years ago

Tried to do a cf dev start on a RHEL system and got exactly the same error "Error: failed to provision VM: Process exited with: 1. Reason was: ()"

On the mac it is stuck on "50 out of 50 starting"

I am able to login using cf login though.

luckyabhishek commented 8 years ago

It's really unstable. I destroyed the running VM to change some settings, and now cf dev start has been stuck at 35 out of 50 running for the last 15-20 minutes.

mar1ged commented 8 years ago

Looks similar to my problem, at least when seen from the outside: https://github.com/pivotal-cf/pcfdev/issues/93

sclevine commented 8 years ago

Hi @luckyabhishek, sorry that you're still having issues with PCF Dev.

Let's focus on the mac first:

Can you provide the full output of cf dev start? It would also be helpful to see the contents of /var/pcfdev/run.log (you can access the VM via ssh: ssh vcap@local.pcfdev.io password: vcap). You may also want to check that you have enough free memory and disk space for the VM.

Thanks, Stephen

luckyabhishek commented 8 years ago

Hi,

I am attaching the run.log as a zip file here... Removed the link

I see a failure to assign the role SpaceDeveloper to the user admin and it says that it could be a proxy issue. I don't believe it can be a proxy issue as before it fails to assign SpaceDeveloper role it has successfully assigned SpaceManager role and made some other calls on the same endpoint.

+ cf create-org pcfdev-org
Creating org pcfdev-org as admin...
OK
Org pcfdev-org already exists
+ cf create-space pcfdev-space -o pcfdev-org
Creating space pcfdev-space in org pcfdev-org as admin...
OK
Assigning role SpaceManager to user admin in org pcfdev-org / space pcfdev-space as admin...
OK
Assigning role SpaceDeveloper to user admin in org pcfdev-org / space pcfdev-space as admin...
FAILED
Error performing request: Put https://api.local.pcfdev.io/v2/organizations/de81d108-5034-453e-b093-d2bbee8eac4b/users/0b1d53f0-bd34-4af0-a249-324cca4ebed9: dial tcp: i/o timeout

Appreciate your help.

Abhishek

luckyabhishek commented 8 years ago

Okay... I think I made some progress. I updated the cf CLI inside the VM to the latest version, 6.20.0+25b1961-2016-06-29, and it seems to be working now. My guess is that CF CLI 6.16, which is currently installed by PCF Dev, doesn't honor the no_proxy variable and tries to hit my corporate proxy, which is unable to resolve local.pcfdev.io. After upgrading to 6.20 the VM starts fine. There's still a challenge with starting the docker container. I will update you once I make some progress.

Can you fix the pcfdev.json to install the latest CLI in the next build, please?

Thanks, Abhishek

luckyabhishek commented 8 years ago

Sorry for multiple updates... but it doesn't look like a CLI version issue either. What I see is that when the machine comes up, it always fails. After the machine comes up, if I do a ./reset and ./run again, everything works as expected :).

luckyabhishek commented 8 years ago

PFA the run.log. run.log.zip

When I run cf dev start, the output on my console is:

Using existing image
Allocating 4096 MB out of 16384 MB total system memory (8225 MB free).
Importing VM...
Starting VM...
Provisioning VM...
Waiting for services to start...
9 out of 50 running
50 out of 50 running
FAILED
Error: failed to provision VM: Process exited with: 1. Reason was:  ()

Would appreciate if we can make this work somehow.

Thanks

sclevine commented 8 years ago

Hi @luckyabhishek,

Check out the end of the run.log:

+ started=50
+ [[ 50 -lt 50 ]]
+ echo '50 out of 50 running'
++ cc_status_code local.pcfdev.io
++ curl -s -I -o /dev/null -w '%{http_code}' -H 'Host: api.local.pcfdev.io' http://localhost/v2/info
+ [[ 200 != 200 ]]
++ uaa_response local.pcfdev.io
++ curl -s -H 'Host: login.local.pcfdev.io' http://localhost/healthz
+ [[ ok != \o\k ]]
+ cf api https://api.local.pcfdev.io --skip-ssl-validation
Setting api endpoint to https://api.local.pcfdev.io...
FAILED
Error performing request: Get https://api.local.pcfdev.io/v2/info: dial tcp: i/o timeout
TIP: If you are behind a firewall and require an HTTP proxy, verify the https_proxy environment variable is correctly set. Else, check your network connection.

This is interesting because it appears that we can curl https://api.local.pcfdev.io, but we cannot connect to it via cf api. The curl bypasses resolving the system domain, but the cf api command does not. If you SSH to the VM after a failed provision, are you able to resolve api.local.pcfdev.io? Are you able to successfully run cf api https://api.local.pcfdev.io --skip-ssl-validation?

luckyabhishek commented 8 years ago

api.local.pcfdev.io does resolve to the right IP. Here's the output of ping from the pcf dev VM..

PING api.local.pcfdev.io (192.168.11.11) 56(84) bytes of data.
64 bytes from 192.168.11.11: icmp_seq=1 ttl=64 time=0.014 ms

However cf api fails from within the VM with exactly the same error.

 sudo cf api https://api.local.pcfdev.io --skip-ssl-validation
Setting api endpoint to https://api.local.pcfdev.io...
FAILED
Error performing request: Get https://api.local.pcfdev.io/v2/info: dial tcp: i/o timeout
TIP: If you are behind a firewall and require an HTTP proxy, verify the https_proxy environment variable is correctly set. Else, check your network connection.

However, there seems to be some kind of race condition. When I tried it another time (i.e. destroyed the VM and started it again), cf api worked properly, but cf target then failed with the same kind of proxy error:

sudo cf target -o pcfdev-org -s pcfdev-space
FAILED
Could not target org.
Error performing request: Get https://api.local.pcfdev.io/v2/organizations?q=name%!!(MISSING)A(MISSING)pcfdev-org&inline-relations-depth=1: dial tcp: i/o timeout
TIP: If you are behind a firewall and require an HTTP proxy, verify the https_proxy environment variable is correctly set. Else, check your network connection.

sclevine commented 8 years ago

Can you provide the (sanitized, if necessary) output of sudo env | grep -i proxy when run inside the VM? Can you confirm that the proxy servers in that output are resolvable and routable?
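One way to sanitize that output before posting it is to mask the credentials in each proxy URL. The helper below is a hypothetical sketch: the function name is made up, and it assumes credentials appear in the usual scheme://user:password@host form.

```shell
# Hypothetical helper: read lines on stdin and replace any credentials
# between "://" and "@" with "***", leaving everything else untouched.
sanitize_proxy_env() {
  sed -E 's#://[^/@]*@#://***@#g'
}
```

Inside the VM this could be used as sudo env | grep -i proxy | sanitize_proxy_env.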

luckyabhishek commented 8 years ago

Yes, the proxy is definitely resolvable. I am able to curl google through the proxy from within the VM. Here you go:

NOPROXY=localhost,127.0.0.1,192.168.11.1,192.168.11.11,local.pcfdev.io,localhost,.local.pcfdev.io,doppler.local.pcfdev.io,ssh.local.pcfdev.io,api.local.pcfdev.io,registry.local.pcfdev.io,_.dev,192.168.99.100,192.168.99.101,192.168.99.102,192.168.99.103,192.168.99.104,192.168.99.105,192.168.99.106,192.168.99.107,192.168.99.108,192.168.99.109,192.168.99.110,192.168.99.111,192.168.99.112,192.168.99.113,192.168.99.114,192.168.99.115,192.168.99.116,192.168.99.117,192.168.99.118,192.168.99.119,192.168.99.120,.dev,.local.pcfdev.io
http_proxy=http://user:password@ip:port
https_proxy=http://user:password@ip:port
HTTPS_PROXY=http://user:password@ip:port
noproxy=localhost,127.0.0.1,192.168.11.1,192.168.11.11,local.pcfdev.io,localhost,.local.pcfdev.io,doppler.local.pcfdev.io,ssh.local.pcfdev.io,api.local.pcfdev.io,registry.local.pcfdev.io,_.dev,192.168.99.100,192.168.99.101,192.168.99.102,192.168.99.103,192.168.99.104,192.168.99.105,192.168.99.106,192.168.99.107,192.168.99.108,192.168.99.109,192.168.99.110,192.168.99.111,192.168.99.112,192.168.99.113,192.168.99.114,192.168.99.115,192.168.99.116,192.168.99.117,192.168.99.118,192.168.99.119,192.168.99.120,.dev,.local.pcfdev.io
HTTP_PROXY=http://user:password@ip:port
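To sanity-check whether a given host is covered by a proxy exclusion list like the one in this thread, a rough matcher can be sketched as below. This is illustrative only and rests on assumptions: the function name is hypothetical, entries are treated either as exact host names or, when they start with a dot, as domain suffixes, and real clients (including the cf CLI) may interpret wildcards, ports, and IP ranges differently.

```shell
# Hypothetical helper: succeed if HOST matches an entry in a comma-separated
# exclusion LIST. Exact match, or suffix match for entries starting with ".".
# The body runs in a subshell so IFS and globbing changes do not leak out.
host_in_no_proxy() (
  host="$1"
  IFS=,
  set -f                       # keep entries such as *.dev from globbing
  for entry in $2; do
    case "$entry" in
      "") continue ;;
      .*) case "$host" in *"$entry") exit 0 ;; esac ;;
      *)  [ "$host" = "$entry" ] && exit 0 ;;
    esac
  done
  exit 1
)
```

For example, host_in_no_proxy api.local.pcfdev.io "localhost,.local.pcfdev.io" succeeds, while the same list does not cover example.com.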

stwomack commented 8 years ago

Any progress here? I'm stuck with the same issue (and I'm not behind a proxy).

luckyabhishek commented 8 years ago

@stwomack without the proxy the VM is coming up without any issues for me...

sclevine commented 8 years ago

@stwomack Please open a new issue and include the last 200 lines of /var/pcfdev/run.log from inside the VM. You can access the VM via ssh: ssh vcap@local.pcfdev.io password: vcap.

@luckyabhishek Can you provide the output of curl -v https://api.local.pcfdev.io/v2/info and curl -v https://example.com from inside the VM as the vcap user? We're going to cut another release soon with some DNS and proxy improvements -- hopefully this will address your issue.

luckyabhishek commented 8 years ago

Seems like I am in business.

Here are the steps I followed

1) cf dev destroy (In case one exists)
2) cf dev start (This fails with a proxy issue.)
3) cf dev stop (Stops successfully)
4) VBoxManage modifyvm "pcfdev-v0.136.0" --natdnshostresolver1 on (I am using a dnsmasq on my local machine so I changed the resolver to my host resolver for the guest VM as well)
5) cf dev start (Starts successfully)
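The steps above can be sketched as a small script. The run wrapper, the DRY_RUN switch, and the function name are assumptions added for illustration; the VM name is copied from the steps as posted and may differ on other machines (VBoxManage list vms shows the actual name).

```shell
# VM name as posted in the steps; override VM_NAME if yours differs.
VM_NAME="${VM_NAME:-pcfdev-v0.136.0}"

# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical function following the five steps in order.
restart_with_host_resolver() {
  run cf dev destroy || true   # step 1: only matters if a VM exists
  run cf dev start || true     # step 2: reported to fail behind the proxy
  run cf dev stop &&           # step 3
    run VBoxManage modifyvm "$VM_NAME" --natdnshostresolver1 on &&  # step 4
    run cf dev start           # step 5
}
```

Setting DRY_RUN=1 before calling restart_with_host_resolver prints the commands without executing them, which is a cheap way to review them first.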

I am able to deploy a simple ruby app after this on the pcfdev instance. I am still not able to run a docker container on this VM though. I will open a separate issue for that.

Thanks for the help. I hope this will be fixed in the next build so that I won't have to modify the DNS resolver manually.

sclevine commented 8 years ago

Awesome! DNS has been completely revamped in the latest release candidates, including the change you describe. The next release (0.18.0) will hopefully be cut in the next few days, so please try it out and re-open this GitHub issue if it doesn't completely fix your issue.