ibmcb / cbtool

Cloud Rapid Experimentation and Analysis Toolkit
Apache License 2.0

exceptions.ImportError when trying to attach GCE cloud #136

Closed xmadsen closed 7 years ago

xmadsen commented 7 years ago
screen shot 2017-03-27 at 3 42 16 pm

Any ideas as to what's causing this?

ibmcb commented 7 years ago

Hello,

Before I ask for your logs (from /var/log/cloudbench), can you please switch to the "experimental" branch and try again? We have an outstanding PR with fixes that include much more robust code for Google Compute Engine.

Marcio

xmadsen commented 7 years ago

I'll try that now! Thanks.

Xander

xmadsen commented 7 years ago

I'm getting this error when I attempt to ./install -r orchestrator with the experimental branch: "[ERROR] install/main - There are -1 dependencies missing: list index out of range".

Very strange...any ideas?

X

ibmcb commented 7 years ago

Ooops, my mistake. I forgot a few leftover Dockerfiles there. Please pull again... should work now.

xmadsen commented 7 years ago

[image: Screen Shot 2017-03-29 at 2.54.19 PM.png] I'm getting a regex matching issue now when I try to capture the newly made cb_nullworkload instance while going through the walkthrough. Is this an error?

X

ibmcb commented 7 years ago

Ha! In my regression tests, I always use an image called "regressiontest". Because that name contains no underscores, I never noticed that I had forgotten to convert all underscores (_) to dashes (-) when specifying an image name on GCE. It is a one-line fix, which I will push into the "experimental" branch in the next few hours.

Thanks,

Marcio
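The underscore-to-dash conversion described above could look roughly like the sketch below. It is not the actual CBTOOL patch: the helper name `to_gce_image_name` is hypothetical, and the validation assumes GCE's documented naming rule for resources (1 to 63 characters, lowercase letters, digits and dashes, starting with a letter and not ending with a dash).

```python
import re

# Assumed GCE naming rule: 1-63 chars, lowercase letters, digits and
# dashes, starting with a letter and not ending with a dash.
_GCE_NAME = re.compile(r"[a-z]([-a-z0-9]*[a-z0-9])?")


def to_gce_image_name(name):
    """Hypothetical helper: map a CBTOOL-style name to a GCE-safe one."""
    # Underscores are not allowed in GCE names, so swap them for dashes.
    candidate = name.lower().replace("_", "-")
    if len(candidate) > 63 or not _GCE_NAME.fullmatch(candidate):
        raise ValueError("not a valid GCE resource name: %r" % candidate)
    return candidate
```

With this, "cb_nullworkload" becomes "cb-nullworkload", while a name such as "regressiontest" passes through unchanged, which is why the regression tests never tripped over the problem.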

ibmcb commented 7 years ago

One-line fix for vmcapture on GCE pushed ("experimental" branch).

Marcio

xmadsen commented 7 years ago

[image: Screen Shot 2017-04-03 at 11.46.43 AM.png] I'm getting this error now after capturing the youngest cb_nullworkload image. What should I do here?

Thanks!

X

ibmcb commented 7 years ago

I believe the actual screenshot was not properly annexed to the previous message.

Marcio

xmadsen commented 7 years ago

Ah, sorry about that! Here it is:

image

ibmcb commented 7 years ago

Hello again,

I have now executed the "walkthrough" on GCE, where I found and fixed a couple of bugs (again, all my tests so far were done on Docker/Swarm, LXD, and OpenStack). Please pull the last changes to the experimental branch, and let me know if it is (at long last) working for you.

Regards,

Marcio

xmadsen commented 7 years ago

Is there a way to revert to the walkthrough mode? Or barring that, how do i cleanly remove the old build of cbtool so I can start fresh with the latest?

ibmcb commented 7 years ago

If you delete whatever image (cb-nullworkload) was already created on GCE, it will return to walkthrough mode (you can also delete the SSH keys if you want, although it is not strictly needed).

xmadsen commented 7 years ago

Now it's having trouble connecting to the GCE when performing a soft reset of cb. Sorry for all the trouble! screen shot 2017-04-05 at 5 13 37 pm

ibmcb commented 7 years ago

Not a problem at all! I just changed it very recently, and thank you for your patience.

It looks like your account has no images left. The error is probably on https://github.com/ibmcb/cbtool/blob/experimental/lib/clouds/gce_cloud_ops.py#L198. It should be a two-line fix, but just to confirm it, can you paste the output of the command gcloud compute images list?
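For illustration only, a guard of the kind hinted at here might look like the sketch below. It is not the code at the linked line. The function name `image_names` is hypothetical, and it rests on the assumption that the image-list response is a dict that simply has no "items" key when the account has no images.

```python
def image_names(list_response):
    """Return image names from an images-list response dict.

    Assumption: an account with no images yields a response without
    an "items" key, so indexing it directly would raise KeyError.
    """
    # Fall back to an empty list instead of failing on a missing key.
    return [image["name"] for image in list_response.get("items", [])]
```

Called on a response with no "items" key, this returns an empty list rather than raising, so the caller can treat "no images yet" as a normal state.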

xmadsen commented 7 years ago
screen shot 2017-04-05 at 6 19 13 pm
ibmcb commented 7 years ago

Good, that is what I was expecting. Just pushed a fix on the experimental branch. Hopefully, the last hurdle.

xmadsen commented 7 years ago
screen shot 2017-04-05 at 8 33 48 pm

When I retry this command I get the same 10.142.0.6 value being taken each time, even when the vm_X number increments. Any ideas?

ibmcb commented 7 years ago

So, it looks like all VMs are getting the same IP address. When you started the walkthrough, you created 3 VMs, and they were assigned 3 different IP addresses, correct? In other words, this is happening only with vmattach tinyvm, right?

Another question. Does this behavior persist if you just restart with --soft_reset?

xmadsen commented 7 years ago

Correct to the first question, and yes the behavior persists if I restart with --soft_reset.

xmadsen commented 7 years ago

The last image I created, the one that I captured before running vmattach tinyvm had an IP of 10.142.0.6.

xmadsen commented 7 years ago
I started the walkthrough over this morning and I'm getting a different message now:

screen shot 2017-04-06 at 10 32 33 am

ibmcb commented 7 years ago

So, when you followed the walkthrough, CBTOOL was able to properly ssh into the instances (booted with vmattach check:<imageid>:<username>), but now, not anymore, correct? Just curious, which imageid have you used as a base?

You can try to check why it is not working by typing first vmdev on the CLI and then attempting the vmattach again. Instead of executing the command, it will print out what should have been done, and will let you run the commands, one by one.

Finally, can you send me a list of the public keys associated with your account? The output of the command gcloud compute project-info describe should suffice (I just need the "- key: sshKeys" part).

I am betting that a key called cbuser does not exist on your account, which means that a user with this name does not get created on the instance, and since CBTOOL is trying to connect as the user cbuser, it is failing.
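One way to check this by hand is to look at the user names in the sshKeys value, where each line has the form "username:key-type key-data comment". The sketch below is a minimal illustration under that assumption. The helper name `ssh_key_users` and the shortened key strings are made up for the example.

```python
def ssh_key_users(ssh_keys_value):
    """Collect the user names found in a project-level sshKeys value."""
    users = set()
    for line in ssh_keys_value.splitlines():
        line = line.strip()
        # Skip blank lines and anything without a "username:" prefix.
        if not line or ":" not in line:
            continue
        user, _key = line.split(":", 1)
        users.add(user)
    return users


# Shortened, made-up key material; only the user names matter here.
sample = (
    "xander:ssh-rsa AAAA xander\n"
    "ubuntu_cbtool_rsa:ssh-rsa BBBB cbtool@orchestrator"
)
has_cbuser = "cbuser" in ssh_key_users(sample)
```

If `has_cbuser` is false, no cbuser account is created on new instances, which matches the failure mode described above.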

xmadsen commented 7 years ago

I'm using ubuntu-1604-xenial-v20170330 as my imageid, and when following the walkthrough cbtool could ssh properly into those instances.

here's the output of vmdev followed by vmattach tinyvm: screen shot 2017-04-06 at 12 11 27 pm

here are the ssh keys:

- key: sshKeys
    value: |-
      xander:ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDKAjV4X1PFOl8GbyWthay3R9coLWUky4f7Sbdnj4mOzXUw4SXFWGS2XkHPA/lIOkTDOIFLa78ELJ1s4YUM8+er5jAbNA3xV5LmazUVT+avaCt39eS4m0Q7VndwC4c/2clxV0yAlN9eZcYA0bWokD0numzkBRcN2p9ZDkkwITYAzowih3kaviGXj5AYlXmwE+0LLzbUD2T3Ak6aYx3kRglwM/hrJelUJXQOYBKKcvVPt4qQMweqD6Wyzl7n9TJWPyqoDTWsGzCw2BJtrzDnPmEzm3L4HWReefx27yo0gb1s+aklYaZ1CUQu+Zf0WN5mfIQsMXNcNqZVj8z3MRoeK7Cd xander
      ubuntu_cbtool_rsa:ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDKmfzD0oO8ThOU7O9puwbJLVlM3FNGjXzI+1LC4IQflzgrra5pKNk9xXiHZW+7dunutVxn8eH5TOpFx87gNdqeSlkTSbkjGwsF3k+kaE/Iu3BOGHgR2uWjs2HRQG9GlRMNppzS2FRXlYVDU5U6W7heuZ/QTzSa2qbbFl6ra8efYhueD+amVyFhHBzYxUmWKuccoH4Xtgkcv7DEv5FXRYPNzbSb8RQzev31c3Gh6Qbco/gYqec51H5FWWpu4ETzO2j8S+8M2DccO7Scb/c7bNDT+Jc5DdmzBKd9d+zhBZFzxqmMVIq6XlirR/mE1PQcsk+VeMr5vYRGZ/5alJCToKw/ cbtool@orchestrator

How do I add a cbuser key?

ibmcb commented 7 years ago

Two options to add the needed key:

1) Just delete the ubuntu_cbtool_rsa key, and restart CBTOOL. It will detect that there aren't any keys on your account (created by CBTOOL, of course), and it will create two keys: ubuntu_cbtool_rsa and cbuser (this is a change on my last commit, since I realized that I need two keys for Google Compute Engine).

2) Just copy the contents of ubuntu_cbtool_rsa onto a new key called cbuser.

In any case, I recommend you do that through GCE's web GUI (go to Compute -> Metadata -> SshKeys).
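Option 2 amounts to adding one more line to the sshKeys value, reusing the key material of ubuntu_cbtool_rsa under the user name cbuser. The sketch below only builds the new text, which could then be pasted into the web GUI. The helper name `copy_key_as` is hypothetical, and the "username:key" line format is an assumption based on the sshKeys output shown earlier in the thread.

```python
def copy_key_as(ssh_keys_value, source_user, new_user):
    """Return the sshKeys text with source_user's key also listed as new_user."""
    lines = [l.strip() for l in ssh_keys_value.splitlines() if l.strip()]
    existing = {l.split(":", 1)[0] for l in lines if ":" in l}
    if new_user in existing:
        # Nothing to do: the target user already has a key.
        return "\n".join(lines)
    for line in lines:
        user, sep, key = line.partition(":")
        if sep and user == source_user:
            lines.append("%s:%s" % (new_user, key))
            return "\n".join(lines)
    raise KeyError("no key found for user %r" % source_user)
```

For example, `copy_key_as(value, "ubuntu_cbtool_rsa", "cbuser")` leaves the existing entries untouched and appends a cbuser entry with the same public key.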

xmadsen commented 7 years ago

vmattach tinyvm worked that time! now i'm stuck on aiattach nullworkload

(MYGCE WALKTHROUGH) aiattach nullworkload
 status: ai_2 (DE07A9A6-B9D1-5EC5-852F-1989ED60694E) was successfully defined on Google Compute Engine "MYGCE"  (will now be fully deployed)
 status: Starting instance "cb-ubuntu-mygce-vm9-tinyvm-ai-2" on Google Compute Engine, using the image "cb-nullworkload" (4976859391615300299) and size "f1-micro", connected to network "private", on VMC "us-east1-b", under tenant "default", injecting the contents of the pub ssh key "ubuntu_cbtool_rsa" (userdata is "False").
 status: Waiting for vm_9 operation to finish...
 status: Waiting for vm_9 (3CEDC24A-237A-554F-A571-F7FD89C5E618), part of ai_2 (DE07A9A6-B9D1-5EC5-852F-1989ED60694E), to start...
 status: Trying to establish network connectivity to vm_9 (3CEDC24A-237A-554F-A571-F7FD89C5E618), part of ai_2 (DE07A9A6-B9D1-5EC5-852F-1989ED60694E), on IP address 10.142.0.8 (using method "wait_for_0")...
 status: Checking ssh accessibility on vm_9 (3CEDC24A-237A-554F-A571-F7FD89C5E618), part of ai_2 (DE07A9A6-B9D1-5EC5-852F-1989ED60694E): ssh -p 22 -i /home/ubuntu/osgcloud/cbtool/lib/auxiliary//../../credentials/cbtool_rsa cbuser@10.142.0.8 "/bin/true"...
 status: This is the command that would have been executed from the orchestrator : 
         ssh  -p 22  -i /home/ubuntu/osgcloud/cbtool/lib/auxiliary//../../credentials/cbtool_rsa  -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null  -o BatchMode=yes  -l cbuser 10.142.0.8 "/bin/true"
 status: Bootstrapping vm_9 (3CEDC24A-237A-554F-A571-F7FD89C5E618), part of ai_2 (DE07A9A6-B9D1-5EC5-852F-1989ED60694E): creating file cb_os_paramaters.txt in "cbuser" user's home dir on IP address 10.142.0.8...
 status: This is the command that would have been executed from the orchestrator : 
         ssh  -p 22  -i /home/ubuntu/osgcloud/cbtool/lib/auxiliary//../../credentials/cbtool_rsa  -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null  -o BatchMode=yes  -l cbuser 10.142.0.8 "mkdir -p /home/cbuser/cbtool;echo '#OSKN-redis' > /home/cbuser/cb_os_parameters.txt;echo '#OSHN-10.142.0.2' >> /home/cbuser/cb_os_parameters.txt;echo '#OSPN-6379' >>  /home/cbuser/cb_os_parameters.txt;echo '#OSDN-0' >>  /home/cbuser/cb_os_parameters.txt;echo '#OSTO-240' >>  /home/cbuser/cb_os_parameters.txt;echo '#OSCN-MYGCE' >>  /home/cbuser/cb_os_parameters.txt;echo '#OSMO-controllable' >>  /home/cbuser/cb_os_parameters.txt;echo '#OSOI-TEST_ubuntu:MYGCE' >>  /home/cbuser/cb_os_parameters.txt;echo '#VMUUID-3CEDC24A-237A-554F-A571-F7FD89C5E618' >>  /home/cbuser/cb_os_parameters.txt;sudo chown -R cbuser:cbuser /home/cbuser/cb_os_parameters.txt;sudo chown -R cbuser:cbuser  /home/cbuser/cbtool"
 status: Sending a copy of the code tree to vm_9 (3CEDC24A-237A-554F-A571-F7FD89C5E618), part of ai_2 (DE07A9A6-B9D1-5EC5-852F-1989ED60694E), on IP address 10.142.0.8...
 status: This is the command that would have been executed from the orchestrator : 
         rsync -e "ssh  -p 22  -i /home/ubuntu/osgcloud/cbtool/lib/auxiliary//../../credentials/cbtool_rsa  -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null  -o BatchMode=yes  -l cbuser " --exclude-from '/home/ubuntu/osgcloud/cbtool/lib/auxiliary//../../exclude_list.txt' -az --delete --no-o --no-g --inplace -O /home/ubuntu/osgcloud/cbtool/lib/auxiliary//../../* 10.142.0.8:~/cbtool/
 status: ai_2 (DE07A9A6-B9D1-5EC5-852F-1989ED60694E) was successfully defined on Google Compute Engine "MYGCE"  (will now be fully deployed)
 status: Performing generic application instance post_boot configurationon all VMs belonging to ai_2 (DE07A9A6-B9D1-5EC5-852F-1989ED60694E)...
 status: This is the command that would have been executed from the orchestrator on STEP 0 : 
         ssh  -p 22  -i /home/ubuntu/osgcloud/cbtool/lib/auxiliary//../../credentials/cbtool_rsa  -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null  -o BatchMode=yes  -l cbuser 10.142.0.8 "~/cbtool/scripts/common/cb_post_boot.sh"
 status: Running application-specific "setup" configuration on all VMs belonging to ai_2 (DE07A9A6-B9D1-5EC5-852F-1989ED60694E)...
 status: QEMU Scraper will NOT be automatically started during the deployment of ai_2 (DE07A9A6-B9D1-5EC5-852F-1989ED60694E)...
 status: Command "~/cb_start_nothing.sh" failed to execute on hostname 10.142.0.8 after attempt 0. Will try 3 more times.
 status: Command "~/cb_start_nothing.sh" failed to execute on hostname 10.142.0.8 after attempt 1. Will try 2 more times.
 status: Command "~/cb_start_nothing.sh" failed to execute on hostname 10.142.0.8 after attempt 2. Will try 1 more times.
 status: Parallel run os command operation failure: Giving up on executing command "~/cb_start_nothing.sh" on hostname 10.142.0.8. Too many attempts (3).

 status: Sending a termination request for instance cb-ubuntu-mygce-vm9-tinyvm-ai-2 (cloud-assigned uuid 3800349376862996295)....
 status: Waiting for vm_9 operation to finish...
 status: Sending a destruction request for the volume "cb-ubuntu-mygce-vv9-tinyvm-ai_2" (cloud-assigned uuid None) previously attached to "cb-ubuntu-mygce-vm9-tinyvm-ai-2" (cloud-assigned uuid 3800349376862996295)....
 status: Waiting for vm_9 operation to finish...
 status: ai_2 (DE07A9A6-B9D1-5EC5-852F-1989ED60694E) was successfully undefined on Google Compute Engine "MYGCE" 
AI object DE07A9A6-B9D1-5EC5-852F-1989ED60694E (named "ai_2") could not be attached to this experiment: AI post-attachment operations failure: Parallel VM configuration for ai_2 failure (81717): Failure while executing application-specific configuration on on all VMs beloging to ai_2 (DE07A9A6-B9D1-5EC5-852F-1989ED60694E):
 Parallel run os command operation failure: Giving up on executing command "~/cb_start_nothing.sh" on hostname 10.142.0.8. Too many attempts (3).
ibmcb commented 7 years ago

Very good. So, now make sure that you're not using the vmdev command anymore. This command is for debug only. Please restart with --soft_reset and then try a vmattach tinyvm, followed by an aiattach nullworkload, but this time without the vmdev. You are very close now.

xmadsen commented 7 years ago

So I think I'm in good shape now, though I'm having trouble deploying cassandra_ycsb and kmeans (for SPECcloud 2016); from what I understand I need to run vmattach check:ubuntu-1604-xenial-v20170330:ubuntu:kmeans, ssh into that new vm, run ~/cbtool/install -r workload -wk kmeans, then vmcapture youngest kmeans, then vmattach kmeans. is that about right or am I missing something?

ibmcb commented 7 years ago

actually, vmattach check:ubuntu-1604-xenial-v20170330:cbuser:hadoop, followed by ~/cbtool/install -r workload --wks hadoop. At this point, do vmcapture youngest cb_hadoop, and then do a restart with --soft_reset. You should see a message indicating that a new image - cb_hadoop - is present. At this point, you can try a vmattach hadoopmaster... More info can be obtained with typeshow hadoop.

xmadsen commented 7 years ago

gotcha! would I do about the same for ycsb?

ibmcb commented 7 years ago

correct.... switch hadoop for ycsb and follow the same procedure.
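To keep the per-workload procedure straight, the steps above can be written down as a small generator. This is only a note-taking sketch: the function name `image_build_steps` is hypothetical, the commands are the ones quoted in this thread, and the exact restart invocation is left as a description because it is not spelled out here.

```python
def image_build_steps(base_image, workload, user="cbuser"):
    """List (where, what) steps for building a cb_<workload> image."""
    return [
        ("CBTOOL CLI", "vmattach check:%s:%s:%s" % (base_image, user, workload)),
        ("new VM (via ssh)", "~/cbtool/install -r workload --wks %s" % workload),
        ("CBTOOL CLI", "vmcapture youngest cb_%s" % workload),
        ("orchestrator", "restart CBTOOL with --soft_reset"),
    ]


for where, what in image_build_steps("ubuntu-1604-xenial-v20170330", "ycsb"):
    print("%-18s %s" % (where, what))
```

After the restart, the role to attach depends on the workload (hadoopmaster in the hadoop case); typeshow <workload> lists the details.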

xmadsen commented 7 years ago

Working great now! When should I expect these changes to make it into the master branch?

maugustosilva commented 7 years ago

I have opened a new pull request (138) with all the fixes discussed here. While it is being reviewed by others, we will have to stick to the "experimental" branch.