AWS replace-vm does not preserve the volume storage size

xyloman commented 7 years ago

We have sized our OpsMgr VM in PCF to have a 100GB volume, but when the replace-vm command is run the new VM only has a 50GB volume. This results in an insufficient-space exception for an OpsMgr instance with numerous tiles installed. Additional discussion here: https://github.com/pivotal-cf/pcf-pipelines/issues/133

cf-gitbot commented 7 years ago

We have created an issue in Pivotal Tracker to manage this. Unfortunately, the Pivotal Tracker project is private so you may be unable to view the contents of the story.

The labels on this github issue will be updated when the story is started.

christianang commented 7 years ago

@xyloman cliaas v0.1.10 now preserves volume storage sizes. If you are using pcf-pipelines and your tasks are leveraging the default czero/cflinuxfs2 image, then you should have this fix the next time any of your tasks run.

If you have any issues, please reopen the ticket.

xyloman commented 7 years ago

@christianang I have rerun our upgrade-ops-manager pipeline with the default czero/cflinuxfs2 image and got the following exception:

2017/07/13 00:20:10 error: run instances failed: MissingParameter: The request must contain the parameter ebs

We are currently using pcf-pipelines v0.15.0 from PivNet.

christianang commented 7 years ago

@xyloman I would just like to clarify what specific changes you are making to the Ops Manager VM, so we can properly re-create the scenario in our test environment.

Are you: A) adding an additional volume with a size of 100GB, or B) changing the size of the default root volume from the default of 50GB to 100GB?

Also, would you mind sending the exact volume configurations of the VM over to us? Thanks in advance.

xyloman commented 7 years ago

We typically follow these steps to create the Ops Manager VM with the increased disk size:

https://docs.pivotal.io/pivotalcf/1-11/customizing/pcf-aws-manual-config.html#pcfaws-om-ami

See step 11: Click Next: Add Storage and adjust the Size (GiB) value. The default persistent disk value is 50 GB. Pivotal recommends increasing this value to a minimum of 100 GB.

christianang commented 7 years ago

Thanks for the info. We still haven't been able to reproduce the exact error message, but we have made some updates which might solve your issue. If you rerun the pipeline with the latest czero/cflinuxfs2 image it should work. Let us know how it goes.

phopper-pivotal commented 7 years ago

@christianang it failed on the EBS side when checking for the snapshot. For some reason the snapshot entry is there, but there is no actual snapshot. Weird nonetheless, so they are going to create a snapshot manually and retry.

With the EBS changes, do you know if it will cycle through to a valid snapshot?

christianang commented 7 years ago

@xyloman @phopper-pivotal sorry for the back and forth on this. We were able to reproduce the issue this time, since we hit it in our own pipelines. We have made an update to cliaas, which was pulled into the czero/cflinuxfs2 image. Go ahead and try rerunning your job and let us know how it goes.

phopper-pivotal commented 7 years ago

@christianang are your Amazon EC2 root device volumes backed by EBS? If so, is there a flag to disable creating the volume from a snapshot? For some reason the root volumes that are backed by EBS have a snapshot reference, but no actual snapshot exists for the volume.

christianang commented 7 years ago

@phopper-pivotal ours are backed by EBS. They also reference a snapshot that doesn't exist; I think it is the ops-man AMI that is adding that snapshot ID. We updated cliaas to not copy over the snapshot ID, which seemed to fix it on our end. Are you still getting the same error with the latest update to czero/cflinuxfs2 and cliaas?
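
For anyone following along, here is a rough sketch of the idea, not the actual cliaas code. It assumes the old VM's volumes are described as plain dictionaries shaped like EC2 block device mappings (DeviceName, plus an Ebs entry with VolumeSize, VolumeType and SnapshotId). The function name and the sample values are made up for illustration.

```python
from copy import deepcopy


def mappings_for_replacement(old_mappings):
    """Build block device mappings for a replacement VM (hypothetical helper).

    Each volume keeps its size and type, but SnapshotId is dropped so the
    new request never points at a snapshot that does not exist.
    """
    new_mappings = []
    for mapping in old_mappings:
        # Copy so the description of the old VM is left untouched.
        new_mapping = deepcopy(mapping)
        ebs = new_mapping.get("Ebs")
        if ebs is not None:
            ebs.pop("SnapshotId", None)
        new_mappings.append(new_mapping)
    return new_mappings


# Sample input: a 100GB root volume with a stale snapshot reference.
old = [
    {
        "DeviceName": "/dev/xvda",
        "Ebs": {
            "VolumeSize": 100,
            "VolumeType": "gp2",
            "SnapshotId": "snap-0123456789abcdef0",  # made-up ID
        },
    }
]

new = mappings_for_replacement(old)
print(new[0]["Ebs"])
```

The new mapping still asks for 100GB, so the replacement VM does not fall back to the AMI default, and the missing snapshot is never looked up.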

phopper-pivotal commented 7 years ago

@xyloman can you pull the latest and re-run, based on the changes to ignore the empty snapshot?

xyloman commented 7 years ago

@phopper-pivotal and @christianang this appears to have fixed the issue that we were having. Sorry for the late reply.

abbyachau commented 7 years ago

Many thanks, @xyloman, for confirming this fix works for you.

vmware-archive / cliaas

AWS replace-vm does not preserve the volume storage size #4