osism / issues

This repository is used for bug reports that are cross-project or not bound to a specific repository (or to an unknown repository).
https://www.osism.tech

Problem with Glance in multiple AZs #638

Closed chschilling closed 1 year ago

chschilling commented 1 year ago

Hey everybody,

In our setup we have three availability zones for computes; our three control nodes are split across these AZs. Every AZ has its own Ceph cluster for RBD.

As we want a synchronized Glance available in all three AZs, we mirrored the pool from az-a to az-b and az-d. In the end that does not help, because Glance is not aware of the mirrors and direct_url always points to the az-a cluster.

So every time a machine is spawned in az-b or az-d, the image is loaded from the az-a cluster, which takes a long time.

Is there any experience with how this should be configured correctly? Do we need to create multiple backends? Can we create one backend per AZ and simply add additional "rbd_locations" to the image?

We have these options enabled:

show_multiple_locations = True
show_image_direct_url = True

Hope you can help here.

Greetings,

Christian

osfrickler commented 1 year ago

This doesn't directly answer your question, but have you seen https://wiki.openstack.org/wiki/OSSN/OSSN-0090? Neither show_multiple_locations nor show_image_direct_url should be enabled for customer-facing Glance APIs. Also, neither Kolla nor OSISM currently supports setting up dedicated internal-only Glance API servers. So the current suggestion is to disable these options and accept the price of longer block-device creation times.

chschilling commented 1 year ago

Good point, we'll discuss how we'll handle that.

berendt commented 1 year ago

We have a similar situation because we built Glance against a dedicated cluster with object storage. We solved this by using Cinder's image-volume cache.

https://docs.openstack.org/cinder/latest/admin/image-volume-cache.html
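For reference, the cache is enabled per volume backend in cinder.conf, roughly like this (a minimal sketch based on the linked documentation; the backend section name, the cache sizes, and the internal tenant IDs are placeholders you have to adapt):

```ini
[DEFAULT]
# Project/user that own the cached image volumes (placeholders)
cinder_internal_tenant_project_id = <project-uuid>
cinder_internal_tenant_user_id = <user-uuid>

[rbd-az-a]
image_volume_cache_enabled = True
image_volume_cache_max_size_gb = 200
image_volume_cache_max_count = 50
```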

However, the cache is only filled with an image the first time that image is used. We solved this by adding a function to our project manager that pre-heats the cache: it creates a volume from each image in every storage AZ, in a special project within a dedicated domain. This fills the cache right away, and the image is never removed from the cache because the volume remains.
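Roughly, the idea looks like this (a simplified sketch, not the actual openstack-project-manager code; the clouds.yaml entry name, the AZ list, and the fallback volume size are assumptions):

```python
import openstack

# Assumed clouds.yaml entry scoped to the dedicated pre-heat project.
conn = openstack.connect(cloud="preheat")

# Assumption: these match the storage AZ names from above.
STORAGE_AZS = ["az-a", "az-b", "az-d"]

for image in conn.image.images(visibility="public"):
    for az in STORAGE_AZS:
        # Creating a volume from the image makes Cinder fill its image-volume
        # cache on that AZ's backend; keeping the volume around prevents the
        # cached entry from being evicted.
        conn.create_volume(
            size=image.min_disk or 10,  # fall back to 10 GB if min_disk is 0
            image=image.id,
            availability_zone=az,
            name=f"preheat-{image.id}-{az}",
            wait=True,
        )
```

A real implementation would also skip images that already have a pre-heat volume and clean up volumes for images that were removed.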

chschilling commented 1 year ago

That sounds promising; it could solve multiple problems, as we could then move Glance to S3 as well. Is it possible to share the pre-heat function?

Greetings,

Christian

berendt commented 1 year ago

The pre-heat code is part of osism/openstack-project-manager: https://github.com/osism/openstack-project-manager/blob/main/src/manage.py#L709-L758

chschilling commented 1 year ago

Thanks! I think we can close this for now. We will discuss it and go forward using the pre-heating approach.