use quorum queues or streams for fanout/transient queues upstream

There are currently several different approaches upstream on what to do, so let me try to document the current state here:

There is this bug: https://bugs.launchpad.net/kolla-ansible/+bug/2077448

There are different patchsets floating around. We basically have these mid-term options:

move everything to streams

move everything to quorum queues

make it configurable by the user

make only some of it configurable (e.g. heat seems to need it, and I bet that if we take a closer look, more services will actually lose messages without it)
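To make the first two options a bit more concrete, here is a rough sketch of what they could translate to in a service's `[oslo_messaging_rabbit]` section. The option names (`rabbit_quorum_queue`, `rabbit_transient_quorum_queue`, `rabbit_stream_fanout`, `use_queue_manager`) are the oslo.messaging settings as I understand them and should be checked against the oslo.messaging release in use. The `OPTIONS` mapping and the `render` helper are made up for illustration and are not taken from any of the patchsets.

```python
import configparser
import io

# Assumed oslo.messaging option names for [oslo_messaging_rabbit]; verify them
# against the oslo.messaging release you deploy. The keys of OPTIONS are
# hypothetical labels for the options discussed above.
OPTIONS = {
    # fanout queues become streams (this does not cover reply queues)
    "streams": {
        "rabbit_quorum_queue": "true",
        "rabbit_stream_fanout": "true",
    },
    # transient (fanout and reply) queues become quorum queues
    "quorum": {
        "rabbit_quorum_queue": "true",
        "rabbit_transient_quorum_queue": "true",
        "use_queue_manager": "true",
    },
}


def render(option: str) -> str:
    """Render an [oslo_messaging_rabbit] section for one of the options."""
    parser = configparser.ConfigParser()
    parser["oslo_messaging_rabbit"] = OPTIONS[option]
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()


print(render("quorum"))
```

The two "configurable" options would then presumably come down to exposing this choice (globally or per service) as a user-facing variable instead of hard-coding it.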

We currently have these patchsets:

https://review.opendev.org/c/openstack/kolla-ansible/+/927497 (by me) uses quorum queues for all transient/fanout queues, always on if quorum queues are configured, passes basic CI

https://review.opendev.org/c/openstack/kolla-ansible/+/916911 (by mnasiadka) uses stream queues instead, always on if quorum queues are configured, currently doesn't pass CI

https://review.opendev.org/c/openstack/kolla-ansible/+/924615 (by kevko) only enables quorum queues for transient heat queues, always on, passes CI

https://review.opendev.org/c/openstack/kolla-ansible/+/924623 (by kevko) adds the queue manager option to all services, which basically makes queue naming consistent; the first and third patches depend on it, passes CI

This is also documented for the next two kolla upstream meetings here:

https://etherpad.opendev.org/p/KollaWhiteBoard#L72

But the upstream Whiteboard will get cleaned up, so I wanted to have something more persistent to document the current state of the work and the decisions we need to make.

I'm not sure yet whether to go with quorum queues or streams; I need to research this topic a bit. In either case, I think we want to use it for all queues, and maybe not even make it possible for users to disable it. Afaik, in some failure scenarios we currently lose messages from OpenStack services, which makes the system as a whole less reliable than it could be.
