shinesolutions / aem-aws-stack-builder

Adobe Experience Manager (AEM) infrastructure builder on AWS using CloudFormation stacks
Apache License 2.0

[AL2] Author-Standby promotion process failed to start AEM Author Service #427

Open mbloch1986 opened 3 years ago

mbloch1986 commented 3 years ago

Describe the bug

On Amazon Linux 2, promoting the author-standby instance to author-primary does not work properly. The process fails at the step that starts the AEM Author service after it was previously stopped.

To Reproduce

Steps to reproduce the behavior:

  1. Run the script /opt/shinesolutions/aem-tools/promote-author-standby-to-primary.sh on the author-standby instance.
  2. Wait until the script fails.

Expected behavior

The author-standby promotion should finish successfully.


Additional context

It seems that on Amazon Linux 2 we should manage all service interactions via the systemctl command instead of the service command.

https://github.com/shinesolutions/puppet-aem-curator/blob/master/manifests/action_promote_author_standby_to_primary.pp#L44

Following the process manually works.

As a workaround, while the promotion process waits for the login/welcome page to appear, you can start AEM by running the following commands in this order:

systemctl stop aem-author    # this fails, as the service is already stopped
systemctl start aem-author

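
The two workaround commands can be wrapped in a small shell function. This is a sketch that assumes it runs as root on the author-standby instance and that the systemd unit is named `aem-author`, as in the commands above. The function name `recover_aem_author` is hypothetical, and the final `systemctl is-active` check is an addition that is not part of the original steps.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply the manual workaround while the promotion
# script waits for the AEM login/welcome page.

recover_aem_author() {
  local unit="aem-author"

  # Expected to fail because the service is already stopped; ignore the
  # error so that the start below still runs.
  systemctl stop "$unit" || echo "stop of $unit failed (service already stopped)" >&2

  # Start AEM through systemd, the same tool used during provisioning.
  if ! systemctl start "$unit"; then
    echo "failed to start $unit" >&2
    return 1
  fi

  # Exit status 0 only if systemd now reports the unit as active.
  systemctl is-active --quiet "$unit"
}

# Usage (as root on the author-standby instance):
#   recover_aem_author
```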
mbloch1986 commented 1 year ago

The root cause of this failure is that we manage the state of AEM with both the old service command and the newer systemctl command.

During provisioning, AEM is started with the systemctl command, whereas the author-standby promotion uses the service command to stop and start AEM.

When AEM is stopped with service aem-author stop, the cq.pid PID file is deleted. When it is started again with service aem-author start, the service script first triggers a stop via systemctl to ensure that no other instance is running. That systemctl stop fails because its stop script tries to delete the cq.pid file, which no longer exists. Because the systemctl stop fails, the start via service fails as well.
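
To make the failure sequence concrete, here is a hypothetical simulation. The real AEM stop script is not shown in this issue, so `strict_stop`, `service_start` and `tolerant_stop` are illustrative stand-ins that model only the cq.pid handling described above, and they operate on a temporary directory rather than a real AEM installation.

```shell
#!/usr/bin/env bash
# Hypothetical model of the cq.pid failure. These functions are NOT the real
# AEM service scripts; they only reproduce the PID-file handling described
# in this issue.

# Stand-in for the stop step run by `systemctl stop aem-author`: a plain `rm`
# returns a non-zero exit status when cq.pid no longer exists.
strict_stop() {
  rm "$1" 2>/dev/null
}

# Stand-in for `service aem-author start`: it first stops AEM via the systemd
# stop step to make sure nothing else is running, then "starts" AEM by
# writing a fresh PID file.
service_start() {
  strict_stop "$1" || return 1   # a failed stop aborts the start
  echo "$$" > "$1"
}

# A tolerant variant treats a missing PID file as "already stopped".
tolerant_stop() {
  rm -f "$1"
}

# Replay the sequence from the issue in a throwaway directory.
demo_dir=$(mktemp -d)
pid_file="$demo_dir/cq.pid"
echo 1234 > "$pid_file"   # AEM was started via systemctl during provisioning
strict_stop "$pid_file"   # `service aem-author stop` deletes cq.pid
service_start "$pid_file" || echo "start failed: cq.pid was already removed"
rm -rf "$demo_dir"
```

Running it prints `start failed: cq.pid was already removed`, mirroring the reported behavior. `tolerant_stop` only illustrates how a stop step that treats a missing PID file as "already stopped" would avoid this particular error; the fix adopted in this issue is to use systemctl consistently instead.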

The solution is to replace all service commands with the systemd command systemctl.
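
One rough way to find and rewrite simple `service <name> <action>` calls in scripts is a `sed` filter like the hypothetical `to_systemctl` below. It is not taken from the provisioner's code, it assumes GNU sed (the default on Amazon Linux 2) for the `\b` word boundary, and it only handles start, stop, restart and status, so its output should be reviewed by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite `service <name> <action>` into the systemd
# form `systemctl <action> <name>`. Lines with other actions (or other
# shapes) are passed through unchanged. Requires GNU sed for `\b`.

to_systemctl() {
  sed -E 's/\bservice[[:space:]]+([A-Za-z0-9_.@-]+)[[:space:]]+(start|stop|restart|status)\b/systemctl \2 \1/g'
}

# Usage:
#   to_systemctl < old-script.sh > new-script.sh
#   diff old-script.sh new-script.sh
```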

mbloch1986 commented 1 year ago

AEM AWS Stack Provisioner 6.4.0 replaces all service commands with systemctl.

mbloch1986 commented 1 year ago

The next step is to replace the service command with systemctl in the manage service SSM command as well, so that the author-standby promotion also works after an offline snapshot has been taken.
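
The contents of the manage service SSM command are not shown in this issue. As a hypothetical sketch, the shell step it runs could validate the requested action and always delegate to systemctl; the function name `manage_service`, its argument order and the accepted actions are assumptions, not the real SSM document.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a "manage service" step that uses systemctl only.

manage_service() {
  local action="$1" unit="$2"

  # Accept only a small, explicit set of actions.
  case "$action" in
    start|stop|restart|status) ;;
    *)
      echo "unsupported action: $action" >&2
      return 2
      ;;
  esac

  if [ -z "$unit" ]; then
    echo "usage: manage_service <start|stop|restart|status> <unit>" >&2
    return 2
  fi

  # Always go through systemd, so the unit state stays consistent with how
  # AEM was started during provisioning.
  systemctl "$action" "$unit"
}

# Example:
#   manage_service stop aem-author
```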