Closed xmadsen closed 7 years ago
Hello,
Before asking for your logs (from /var/log/cloudbench), can you please switch to the "experimental" branch and try again? We have an outstanding PR with fixes that include much more robust code for Google Compute Engine.
Marcio
I'll try that now! Thanks.
Xander
I'm getting this error when I attempt to run `./install -r orchestrator` on the experimental branch: "[ERROR] install/main - There are -1 dependencies missing: list index out of range".
Very strange...any ideas?
X
Oops, my mistake. I forgot a few leftover Dockerfiles there. Please pull again... it should work now.
[image: Screen Shot 2017-03-29 at 2.54.19 PM.png] I'm now getting a regex-matching issue when I try to capture the newly made cb_nullworkload instance while going through the walkthrough. Is this an error?
X
Ha! In my regression tests, I always use an image called "regressiontest"... I simply forgot to convert all underscores (_) to dashes (-) when specifying an image name on GCE. It is a one-line fix, which I will push to the "experimental" branch in the next few hours.
Thanks,
Marcio
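The fix Marcio describes can be sketched as a one-line name sanitization (illustrative only; the function name here is hypothetical, and the real change lives in cbtool's GCE cloud code):

```python
def gce_image_name(image_name: str) -> str:
    """GCE resource names must match [a-z]([-a-z0-9]*[a-z0-9])?,
    so underscores are not allowed; convert them to dashes."""
    return image_name.replace("_", "-")

# cbtool's default image name contains underscores:
print(gce_image_name("cb_nullworkload"))  # cb-nullworkload
```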
One-line fix for vmcapture on GCE pushed ("experimental" branch).
Marcio
[image: Screen Shot 2017-04-03 at 11.46.43 AM.png] I'm getting this error now after capturing the youngest cb_nullworkload image. What should I do here?
Thanks!
X
I believe the actual screenshot was not properly attached to the previous message.
Marcio
Ah, sorry about that! Here it is:
Hello again,
I have now executed the "walkthrough" on GCE, where I found and fixed a couple of bugs (again, all my tests so far were done on Docker/Swarm, LXD, and OpenStack). Please pull the latest changes to the experimental branch, and let me know if it is (at long last) working for you.
Regards,
Marcio
Is there a way to revert to walkthrough mode? Or, barring that, how do I cleanly remove the old build of cbtool so I can start fresh with the latest?
If you delete whatever image (cb-nullworkload) was already created on GCE, it will return to walkthrough mode (you can also delete the SSH keys if you want, although that is not strictly needed).
Now it's having trouble connecting to GCE when performing a soft reset of cb. Sorry for all the trouble!
Not a problem at all! I just changed it very recently, and thank you for your patience.
It looks like your account has no images left. The error is probably at https://github.com/ibmcb/cbtool/blob/experimental/lib/clouds/gce_cloud_ops.py#L198. It should be a two-line fix, but just to confirm it, can you paste the output of the command `gcloud compute images list`?
Good, that is what I was expecting. Just pushed a fix on the experimental branch. Hopefully, the last hurdle.
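The crash being fixed here, indexing into an empty image list, can be illustrated with a minimal sketch. The function name is hypothetical, and `images` stands in for the parsed output of `gcloud compute images list`:

```python
def newest_image(images):
    """Return the most recently created image, or None if the account
    has no images. Indexing images[0] on an empty list would raise
    IndexError -- the "list index out of range" failure seen above."""
    if not images:
        return None
    # ISO-8601 timestamps sort correctly as strings
    return sorted(images, key=lambda img: img["creationTimestamp"])[-1]

print(newest_image([]))  # None instead of an IndexError
```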
When I retry this command, the same IP address, 10.142.0.6, is assigned each time, even as the vm_X number increments. Any ideas?
So, it looks like all VMs are getting the same IP address. When you started the walkthrough, you created 3 VMs, and they were assigned 3 different IP addresses, correct? In other words, this is happening only with `vmattach tinyvm`, right?
Another question: does this behavior persist if you just restart with `--soft_reset`?
Correct to the first question, and yes the behavior persists if I restart with --soft_reset.
The last image I created, the one that I captured before running `vmattach tinyvm`, had an IP of 10.142.0.6.
I started the walkthrough over this morning and I'm getting a different message now:
So, when you followed the walkthrough, CBTOOL was able to properly ssh into the instances (booted with `vmattach check:<imageid>:<username>`), but now it cannot anymore, correct? Just curious, which imageid did you use as a base?
You can check why it is not working by first typing `vmdev` on the CLI and then attempting the `vmattach` again. Instead of executing the commands, it will print out what would have been done, and will let you run the commands one by one.
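The dry-run behavior described above (print the command instead of running it) can be sketched generically; this is a hypothetical helper for illustration, not cbtool's actual implementation:

```python
import subprocess

def run_or_print(cmd: str, dry_run: bool = False) -> int:
    """In dry-run mode, print the command that *would* have been
    executed and report success; otherwise run it through the shell
    and return its exit code."""
    if dry_run:
        print("This is the command that would have been executed:", cmd)
        return 0
    return subprocess.call(cmd, shell=True)

# Only prints the ssh invocation; nothing is executed:
run_or_print('ssh -p 22 cbuser@10.142.0.8 "/bin/true"', dry_run=True)
```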
Finally, can you send me a list of the public keys associated with your account? The output of the command `gcloud compute project-info describe` should suffice (I just need the `- key: sshKeys` part).
I am betting that a key called `cbuser` does not exist on your account, which means that a user with this name does not get created on the instance, and since CBTOOL is trying to connect as user `cbuser`, it is failing.
I'm using `ubuntu-1604-xenial-v20170330` as my imageid, and when following the walkthrough cbtool could ssh properly into those instances.
Here's the output of `vmdev` followed by `vmattach tinyvm`:
Here are the SSH keys:
- key: sshKeys
value: |-
xander:ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDKAjV4X1PFOl8GbyWthay3R9coLWUky4f7Sbdnj4mOzXUw4SXFWGS2XkHPA/lIOkTDOIFLa78ELJ1s4YUM8+er5jAbNA3xV5LmazUVT+avaCt39eS4m0Q7VndwC4c/2clxV0yAlN9eZcYA0bWokD0numzkBRcN2p9ZDkkwITYAzowih3kaviGXj5AYlXmwE+0LLzbUD2T3Ak6aYx3kRglwM/hrJelUJXQOYBKKcvVPt4qQMweqD6Wyzl7n9TJWPyqoDTWsGzCw2BJtrzDnPmEzm3L4HWReefx27yo0gb1s+aklYaZ1CUQu+Zf0WN5mfIQsMXNcNqZVj8z3MRoeK7Cd xander
ubuntu_cbtool_rsa:ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDKmfzD0oO8ThOU7O9puwbJLVlM3FNGjXzI+1LC4IQflzgrra5pKNk9xXiHZW+7dunutVxn8eH5TOpFx87gNdqeSlkTSbkjGwsF3k+kaE/Iu3BOGHgR2uWjs2HRQG9GlRMNppzS2FRXlYVDU5U6W7heuZ/QTzSa2qbbFl6ra8efYhueD+amVyFhHBzYxUmWKuccoH4Xtgkcv7DEv5FXRYPNzbSb8RQzev31c3Gh6Qbco/gYqec51H5FWWpu4ETzO2j8S+8M2DccO7Scb/c7bNDT+Jc5DdmzBKd9d+zhBZFzxqmMVIq6XlirR/mE1PQcsk+VeMr5vYRGZ/5alJCToKw/ cbtool@orchestrator
How do I add a cbuser key?
Two options to add the needed key:
1) Just delete the `ubuntu_cbtool_rsa` key and restart CBTOOL. It will detect that there aren't any CBTOOL-created keys on your account, and it will create two keys: `ubuntu_cbtool_rsa` and `cbuser` (this is a change in my last commit, since I realized that I need two keys for Google Compute Engine).
2) Just copy the contents of `ubuntu_cbtool_rsa` into a new key called `cbuser`.
In either case, I recommend you do that through GCE's web GUI (go to Compute -> Metadata -> SSH Keys).
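Option 2 can also be sketched programmatically. Each line of GCE's `sshKeys` project metadata has the form `username:ssh-rsa KEY comment` (matching the dump above); this hypothetical helper duplicates the `ubuntu_cbtool_rsa` entry under the username `cbuser`. The web GUI route Marcio recommends is simpler in practice:

```python
def add_cbuser_entry(ssh_keys: str, source_user: str = "ubuntu_cbtool_rsa") -> str:
    """Duplicate source_user's line in an sshKeys metadata value under
    the username 'cbuser', so that GCE creates a cbuser account with
    the same public key on new instances."""
    lines = ssh_keys.splitlines()
    for line in list(lines):
        user, _, key = line.partition(":")
        if user == source_user:
            lines.append("cbuser:" + key)
            break
    return "\n".join(lines)
```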
`vmattach tinyvm` worked that time! Now I'm stuck on `aiattach nullworkload`:
(MYGCE WALKTHROUGH) aiattach nullworkload
status: ai_2 (DE07A9A6-B9D1-5EC5-852F-1989ED60694E) was successfully defined on Google Compute Engine "MYGCE" (will now be fully deployed)
status: Starting instance "cb-ubuntu-mygce-vm9-tinyvm-ai-2" on Google Compute Engine, using the image "cb-nullworkload" (4976859391615300299) and size "f1-micro", connected to network "private", on VMC "us-east1-b", under tenant "default", injecting the contents of the pub ssh key "ubuntu_cbtool_rsa" (userdata is "False").
status: Waiting for vm_9 operation to finish...
status: Waiting for vm_9 (3CEDC24A-237A-554F-A571-F7FD89C5E618), part of ai_2 (DE07A9A6-B9D1-5EC5-852F-1989ED60694E), to start...
status: Trying to establish network connectivity to vm_9 (3CEDC24A-237A-554F-A571-F7FD89C5E618), part of ai_2 (DE07A9A6-B9D1-5EC5-852F-1989ED60694E), on IP address 10.142.0.8 (using method "wait_for_0")...
status: Checking ssh accessibility on vm_9 (3CEDC24A-237A-554F-A571-F7FD89C5E618), part of ai_2 (DE07A9A6-B9D1-5EC5-852F-1989ED60694E): ssh -p 22 -i /home/ubuntu/osgcloud/cbtool/lib/auxiliary//../../credentials/cbtool_rsa cbuser@10.142.0.8 "/bin/true"...
status: This is the command that would have been executed from the orchestrator :
ssh -p 22 -i /home/ubuntu/osgcloud/cbtool/lib/auxiliary//../../credentials/cbtool_rsa -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -l cbuser 10.142.0.8 "/bin/true"
status: Bootstrapping vm_9 (3CEDC24A-237A-554F-A571-F7FD89C5E618), part of ai_2 (DE07A9A6-B9D1-5EC5-852F-1989ED60694E): creating file cb_os_paramaters.txt in "cbuser" user's home dir on IP address 10.142.0.8...
status: This is the command that would have been executed from the orchestrator :
ssh -p 22 -i /home/ubuntu/osgcloud/cbtool/lib/auxiliary//../../credentials/cbtool_rsa -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -l cbuser 10.142.0.8 "mkdir -p /home/cbuser/cbtool;echo '#OSKN-redis' > /home/cbuser/cb_os_parameters.txt;echo '#OSHN-10.142.0.2' >> /home/cbuser/cb_os_parameters.txt;echo '#OSPN-6379' >> /home/cbuser/cb_os_parameters.txt;echo '#OSDN-0' >> /home/cbuser/cb_os_parameters.txt;echo '#OSTO-240' >> /home/cbuser/cb_os_parameters.txt;echo '#OSCN-MYGCE' >> /home/cbuser/cb_os_parameters.txt;echo '#OSMO-controllable' >> /home/cbuser/cb_os_parameters.txt;echo '#OSOI-TEST_ubuntu:MYGCE' >> /home/cbuser/cb_os_parameters.txt;echo '#VMUUID-3CEDC24A-237A-554F-A571-F7FD89C5E618' >> /home/cbuser/cb_os_parameters.txt;sudo chown -R cbuser:cbuser /home/cbuser/cb_os_parameters.txt;sudo chown -R cbuser:cbuser /home/cbuser/cbtool"
status: Sending a copy of the code tree to vm_9 (3CEDC24A-237A-554F-A571-F7FD89C5E618), part of ai_2 (DE07A9A6-B9D1-5EC5-852F-1989ED60694E), on IP address 10.142.0.8...
status: This is the command that would have been executed from the orchestrator :
rsync -e "ssh -p 22 -i /home/ubuntu/osgcloud/cbtool/lib/auxiliary//../../credentials/cbtool_rsa -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -l cbuser " --exclude-from '/home/ubuntu/osgcloud/cbtool/lib/auxiliary//../../exclude_list.txt' -az --delete --no-o --no-g --inplace -O /home/ubuntu/osgcloud/cbtool/lib/auxiliary//../../* 10.142.0.8:~/cbtool/
status: ai_2 (DE07A9A6-B9D1-5EC5-852F-1989ED60694E) was successfully defined on Google Compute Engine "MYGCE" (will now be fully deployed)
status: Performing generic application instance post_boot configurationon all VMs belonging to ai_2 (DE07A9A6-B9D1-5EC5-852F-1989ED60694E)...
status: This is the command that would have been executed from the orchestrator on STEP 0 :
ssh -p 22 -i /home/ubuntu/osgcloud/cbtool/lib/auxiliary//../../credentials/cbtool_rsa -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -l cbuser 10.142.0.8 "~/cbtool/scripts/common/cb_post_boot.sh"
status: Running application-specific "setup" configuration on all VMs belonging to ai_2 (DE07A9A6-B9D1-5EC5-852F-1989ED60694E)...
status: QEMU Scraper will NOT be automatically started during the deployment of ai_2 (DE07A9A6-B9D1-5EC5-852F-1989ED60694E)...
status: Command "~/cb_start_nothing.sh" failed to execute on hostname 10.142.0.8 after attempt 0. Will try 3 more times.
status: Command "~/cb_start_nothing.sh" failed to execute on hostname 10.142.0.8 after attempt 1. Will try 2 more times.
status: Command "~/cb_start_nothing.sh" failed to execute on hostname 10.142.0.8 after attempt 2. Will try 1 more times.
status: Parallel run os command operation failure: Giving up on executing command "~/cb_start_nothing.sh" on hostname 10.142.0.8. Too many attempts (3).
status: Sending a termination request for instance cb-ubuntu-mygce-vm9-tinyvm-ai-2 (cloud-assigned uuid 3800349376862996295)....
status: Waiting for vm_9 operation to finish...
status: Sending a destruction request for the volume "cb-ubuntu-mygce-vv9-tinyvm-ai_2" (cloud-assigned uuid None) previously attached to "cb-ubuntu-mygce-vm9-tinyvm-ai-2" (cloud-assigned uuid 3800349376862996295)....
status: Waiting for vm_9 operation to finish...
status: ai_2 (DE07A9A6-B9D1-5EC5-852F-1989ED60694E) was successfully undefined on Google Compute Engine "MYGCE"
AI object DE07A9A6-B9D1-5EC5-852F-1989ED60694E (named "ai_2") could not be attached to this experiment: AI post-attachment operations failure: Parallel VM configuration for ai_2 failure (81717): Failure while executing application-specific configuration on on all VMs beloging to ai_2 (DE07A9A6-B9D1-5EC5-852F-1989ED60694E):
Parallel run os command operation failure: Giving up on executing command "~/cb_start_nothing.sh" on hostname 10.142.0.8. Too many attempts (3).
Very good. So, now make sure that you're not using the `vmdev` command anymore; it is for debugging only. Please restart with `--soft_reset` and then try a `vmattach tinyvm`, followed by an `aiattach nullworkload`, but this time without `vmdev`. You are very close now.
So I think I'm in good shape now, though I'm having trouble deploying cassandra_ycsb and kmeans (for SPECcloud 2016). From what I understand, I need to run `vmattach check:ubuntu-1604-xenial-v20170330:ubuntu:kmeans`, ssh into that new VM, run `~/cbtool/install -r workload -wk kmeans`, then `vmcapture youngest kmeans`, then `vmattach kmeans`. Is that about right, or am I missing something?
Actually: `vmattach check:ubuntu-1604-xenial-v20170330:cbuser:hadoop`, followed by `~/cbtool/install -r workload --wks hadoop`. At that point, do `vmcapture youngest cb_hadoop`, and then restart with `--soft_reset`. You should see a message indicating that a new image, cb_hadoop, is present. At that point, you can try a `vmattach hadoopmaster`... More info can be obtained with `typeshow hadoop`.
Gotcha! Would I do about the same for ycsb?
Correct... switch `hadoop` for `ycsb` and follow the same procedure.
Working great now! When should I expect these changes to make it into the master branch?
I have opened a new pull request (138) with all the fixes discussed here. While it is being reviewed by others, we will have to stick to the "experimental" branch.
Any ideas as to what's causing this?