sujiar37 / AWX-HA-InstanceGroup

Build AWX clustering on Docker Standalone Installation
MIT License
107 stars 39 forks

10.0 redis clustering #26

Closed chris93111 closed 2 years ago

chris93111 commented 4 years ago

Hi @sujiar37

10.0 is available, but RabbitMQ has been replaced by Redis.

I have seen that a Redis cluster is possible with a master and slaves, but I don't know whether that is the intended approach for the AWX cluster, because it uses a socket.

https://github.com/ansible/awx/pull/6034/files

Have you seen this? Do you have any idea how to integrate Redis into the cluster?

chris93111 commented 4 years ago

I may have found it: AWX now uses PostgreSQL for inter-node communication. I will try it and send feedback.

sebstyle commented 4 years ago

Indeed, 11.0.0 is the latest tag. On each node, the local Redis communicates with the local websockets connected to the web interface. Events relevant to the entire cluster are now communicated through PostgreSQL. In future versions, Redis will most likely also take over the role of Memcached.

Custom clustering with RabbitMQ as this playbook provides seems to no longer be relevant.

sebstyle commented 4 years ago

https://github.com/ansible/awx/issues/5443

chris93111 commented 4 years ago

@sebstyle Yes, clustering now uses PostgreSQL notifications for inter-node communication.
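For anyone unfamiliar with PostgreSQL notifications, here is a minimal sketch of the general LISTEN/NOTIFY pattern in Python. It is not AWX's actual code: the channel name, the message fields, and the helper names (`encode_message`, `notify`, `listen`) are assumptions made up for illustration, and it assumes the third-party `psycopg2` driver and an already-open connection.

```python
import json
import select


def encode_message(node, event):
    """Serialize a cluster message as the JSON payload for NOTIFY."""
    return json.dumps({"node": node, "event": event}, sort_keys=True)


def decode_message(payload):
    """Parse a NOTIFY payload back into a dict."""
    return json.loads(payload)


def notify(conn, channel, node, event):
    """Publish one message on `channel` via pg_notify().

    Assumes `conn` is a psycopg2 connection in autocommit mode; otherwise
    the caller must commit before the notification is delivered.
    """
    with conn.cursor() as cur:
        cur.execute("SELECT pg_notify(%s, %s);", (channel, encode_message(node, event)))


def listen(conn, channel, timeout=5.0):
    """Yield decoded messages arriving on `channel` (loops until interrupted)."""
    from psycopg2 import sql  # third-party import, only needed here

    conn.autocommit = True
    with conn.cursor() as cur:
        # Quote the channel name safely as an SQL identifier.
        cur.execute(sql.SQL("LISTEN {};").format(sql.Identifier(channel)))
    while True:
        # Wait until the connection's socket is readable or the timeout expires.
        if select.select([conn], [], [], timeout) == ([], [], []):
            continue
        conn.poll()
        while conn.notifies:
            yield decode_message(conn.notifies.pop(0).payload)
```

Every node that runs `listen()` on the same channel receives each message, which is why a shared database can stand in for a message broker here. PostgreSQL only delivers a notification once the sending transaction commits.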

ryanpetrello commented 4 years ago

Just chiming in - AWX isn't using any sort of redis clustering; every redis on every node is unaware of the others (if you all have questions, I'm happy to answer them).

RylandDeGregory commented 4 years ago

> Just chiming in - AWX isn't using any sort of redis clustering; every redis on every node is unaware of the others (if you all have questions, I'm happy to answer them).

Hi Ryan,

What is the high-level process for clustering Local Docker installations now that Redis is implemented and inter-node communication is done through PostgreSQL?

moonrail commented 4 years ago

I have clustered some AWX nodes by just writing a wrapper around the official role local_docker

I have not fully tested every feature, but these are the steps required that I know of:

fitbeard commented 4 years ago

Hi everyone. How about this: https://github.com/fitbeard/awx-ha-cluster I borrowed ideas from this repo a long time ago, and I've been running an 11+ HA cluster in prod for some time now. I'm still using a generated static UUID instead of the FQDN. Everything is tested and working very well.

RylandDeGregory commented 4 years ago

> Hi everyone. How about this: https://github.com/fitbeard/awx-ha-cluster I borrowed ideas from this repo a long time ago, and I've been running an 11+ HA cluster in prod for some time now. I'm still using a generated static UUID instead of the FQDN. Everything is tested and working very well.

I've used a lot of your functionality in making my own AWX 11+ installer, particularly your use of a Key Vault lookup plugin! (You use Hashicorp, I use AKV, but same idea!) I'd been wrestling for a while with how to use the inventory file securely...but avoiding it completely is entirely nicer.

bryanasdev000 commented 4 years ago

> Hi everyone. How about this: https://github.com/fitbeard/awx-ha-cluster I borrowed ideas from this repo a long time ago, and I've been running an 11+ HA cluster in prod for some time now. I'm still using a generated static UUID instead of the FQDN. Everything is tested and working very well.

Thanks for sharing it! Gonna test ASAP.

chris93111 commented 4 years ago

Hi @ryanpetrello

Do you know why the web node tries to contact every worker node in the cluster on the same port as the web node?

The worker is not exposed by Docker.

AWX version 11.2

2020-06-04 15:36:04,890 DEBUG awx.main.wsbroadcast Connection from vldvaawx02-web to vldvaawx02-worker attempt number 10.
2020-06-04 15:36:04,892 WARNING awx.main.wsbroadcast Connection from vldvaawx02-web to vldvaawx01-worker failed: 'Cannot connect to host vldvaawx01-worker:443 ssl:False [Connect call failed ('xxxxxxxx', 443)]'.
2020-06-04 15:36:04,893 DEBUG awx.main.wsbroadcast Connection from vldvaawx02-web to vldvaawx01-worker attempt number 10.
2020-06-04 15:36:09,897 WARNING awx.main.wsbroadcast Connection from vldvaawx02-web to vldvaawx02-worker failed: 'Cannot connect to host vldvaawx02-worker:443 ssl:False [Connect call failed ('xxxxxxxx', 443)]'.
2020-06-04 15:36:09,898 DEBUG awx.main.wsbroadcast Connection from vldvaawx02-web to vldvaawx02-worker attempt number 11.
2020-06-04 15:36:09,900 WARNING awx.main.wsbroadcast Connection from vldvaawx02-web to vldvaawx01-worker failed: 'Cannot connect to host vldvaawx01-worker:443 ssl:False [Connect call failed ('xxxxxxxxx', 443)]'.
2020-06-04 15:36:09,901 DEBUG awx.main.wsbroadcast Connection from vldvaawx02-web to vldvaawx01-worker attempt number 11.

thanks

ryanpetrello commented 4 years ago

@chris93111,

Yep. Starting with the removal of RabbitMQ, when a playbook runs on a certain node, the stdout events are broadcast to all other cluster nodes via websockets/ASGI over port 443.

This is how you can run a playbook on Node A but view the streaming stdout results on Node B. Previously, our RabbitMQ clustering had a similar model that required shared network activity among the nodes, and this new ASGI traffic in the Redis implementation is the analog of that behavior. So a requirement for any clustered AWX install is that each node/instance can reach every other instance via some address on port 443.

from https://github.com/ansible/awx/issues/5443:

When an event is persisted to the database by the callback receiver, it also is broadcasted to all cluster peers via ASGI. In this way, if a playbook runs on Node A, users connected to Daphne on Nodes B, C, and D will receive a broadcast of these events and see the output in their browser tabs.
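To check the routability requirement described above, a small script like the following can be run on each node. This is only a sketch: the helper names `can_connect` and `unreachable_peers` are made up for illustration, and it tests plain TCP reachability on port 443, not TLS or the websocket handshake that AWX itself performs.

```python
import socket


def can_connect(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def unreachable_peers(peers, port=443, timeout=3.0):
    """Return the hosts in `peers` that do not accept TCP connections on `port`."""
    return [host for host in peers if not can_connect(host, port, timeout)]
```

For example, on Node A you might call `unreachable_peers(["awx-node-b", "awx-node-c"])` (hypothetical hostnames); an empty list means every peer accepted a connection on port 443. Containers that do not publish that port would show up in the result, much like the failed connections in the log above.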

chris93111 commented 4 years ago

@ryanpetrello Thanks for your response. OK, I understand now! But with this configuration, a task can't run on a node without a web component? Every node must contain both the worker and the web components, right?

ryanpetrello commented 4 years ago

@chris93111 With the way AWX works today, that's correct.

ryanpetrello commented 4 years ago

@sujiar37 @chris93111 since we're here chatting, another thing you may care to know about this - though I don't think it strictly affects clustering - is that in the near future, we're considering removing memcached entirely (because redis sort of serves the same purpose). I don't anticipate any notable changes for you all downstream aside from "point Django's caching at redis instead of memcached"

Details here:

https://github.com/ansible/awx/issues/6932
https://github.com/ansible/awx/pull/7240/

This is likely coming in the next major version of AWX in the coming weeks (12.0.0).
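As a rough illustration of what "point Django's caching at redis instead of memcached" can look like in a settings file, here is a sketch. It is not taken from AWX's settings: the addresses and the `CACHES_MEMCACHED` / `CACHES_REDIS` names are placeholders, and the built-in Redis backend shown here only exists in Django 4.0 and later (older versions need a third-party package such as django-redis).

```python
# Settings fragment (sketch): swap Django's cache backend from Memcached to Redis.

# Before: Memcached-backed cache (address is a placeholder).
CACHES_MEMCACHED = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        "LOCATION": "127.0.0.1:11211",
    }
}

# After: Redis-backed cache (address and database number are placeholders).
CACHES_REDIS = {
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": "redis://127.0.0.1:6379/1",
    }
}

# Django reads the setting named CACHES.
CACHES = CACHES_REDIS
```

Code that goes through `django.core.cache.cache` should not need to change, since only the backend behind the `default` alias is swapped.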

chris93111 commented 4 years ago

@ryanpetrello Yes, I read about this on the Ansible Project Google group :)

Thanks for this information ;)

sujiar37 commented 4 years ago

@fitbeard This is great, thank you so much for your work and for helping the community with the recent updates. Unfortunately, due to other engagements, I couldn't look deeper or follow up after version 10.0 and its move to Redis.

@ryanpetrello Great to see you here. I appreciate your comments, and thank you for the information.