Capture errors on platform environment build automation and do something useful with them #308

Open treasuretron opened 4 years ago

In GitLab by @thomasmurphy on Dec 20, 2018, 21:19

I thought we should have a (branchless!) gitlab issue for our work/research around this.

The support request text is below. The short answer is that there is no easy way to pipe errors anywhere, though it may be possible with their webhooks system.

https://docs.platform.sh/administration/integrations/webhooks.html

In GitLab by @thomasmurphy on Dec 20, 2018, 21:20

changed the description

In GitLab by @thomasmurphy on Dec 20, 2018, 21:20

Thomas Murphy Yesterday at 12:27

We've been having some trouble with platform deployment jobs defined in .platform.app.yaml (built by a different team) failing intermittently, but the platform admin UI saying the process had succeeded. Is it possible to pipe errors from automated drush tasks, etc. to the platform UI, or even back to a linked GitLab project? If we can't get notifications about failed automations we'd be better off not automating anything.

Ollie Murphy Today at 10:07

Hello Thomas,

The output (including errors) from the commands executed in the deploy hook of your .platform.app.yaml file can be found in your deploy.log, located at /var/log/deploy.log.

You can access that logfile by SSHing into your container or, alternatively, through the CLI by running the following:

platform logs deploy

Please let us know if that answers your question or if you need any more assistance.

Best,

Ollie

Thomas Murphy Today at 12:11

Hi Ollie,

Thanks, we did know how to access the logs manually. My question, which has a yes/no answer, was: can we automate what happens to the errors? Because if we can't, there is no point in automating the task.

Currently we run a GitLab merge, which causes a platform environment build, which fails 30% of the time. GitLab says the merge was successful and the platform UI says the build was successful, but the end result is that the production site is dead 30% of the time after a deployment which tested fine in our QA environment, and we have to go and look at the logs manually.

If the answer is "no" that is fine, we will just stop automating tasks on platform.

cheers,

Thomas

Thomas Murphy Today at 12:13

Also, great to hang out with another Murphy, by the way, where are you based?

Ollie

Hi fellow Murphy!

I’m based in Las Vegas and it looks like you’re in New Zealand. Looks like there are Murphys everywhere!

Unfortunately, the answer to your question is a general no.

One possible workaround would be to listen on a webhook for when the deployment has succeeded and then to scan your deployment logs for any errors with some type of service that you could integrate into your automation. You can read more about our webhooks here: https://docs.platform.sh/administration/integrations/webhooks.html

Please let us know if you have any more questions.

Cheers,

Ollie
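The log-scanning half of that workaround could be sketched roughly as follows. This is only an illustration, not something Platform provides: the function names are hypothetical, the "[error]" marker is an assumption based on how drush tags failures in its output, and the webhook receiver that would trigger the scan is left out entirely. The only CLI call used is the `platform logs deploy` command from the support reply.

```python
import re
import subprocess
from typing import List

# Assumption: failures are tagged with "[error]", as drush does in its output.
ERROR_MARKER = re.compile(r"\[error\]")


def find_error_lines(log_text: str) -> List[str]:
    """Return every line of the log text that carries an error marker."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if ERROR_MARKER.search(line)
    ]


def fetch_deploy_log() -> str:
    """Fetch the deploy log using the CLI command from the support reply.

    Assumes the platform CLI is installed, authenticated, and run from
    within the project so that it knows which environment to read.
    """
    result = subprocess.run(
        ["platform", "logs", "deploy"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def report_latest_deploy() -> List[str]:
    """Hypothetical entry point a webhook handler could call after a deploy."""
    return find_error_lines(fetch_deploy_log())
```

Whatever receives the webhook could call `report_latest_deploy()` and forward any returned lines to a chat channel or a GitLab issue; how that forwarding works is outside this sketch.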

In GitLab by @jayelless on Jan 13, 2019, 19:53

From https://docs.platform.sh/configuration/app/build.html

Each hook is executed as a single script, so they will be considered failed only if the final command in them fails. To cause them to fail on the first failed command, add set -e to the beginning of the hook. If a build hook fails for any reason then the build is aborted and the deploy will not happen.

If we update the deploy hook command to include an initial "set -e" this should cause the deployment to fail if any of the steps fail.

hooks:
    # The deploy hook runs after your application has been deployed and started.
    deploy: |
      set -e
      cd web
      drush -y cache-rebuild
      drush -y updatedb
      drush -y config-import
      drush -y entup
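The behaviour described in the quoted documentation is ordinary shell semantics and can be checked locally. The sketch below is illustrative only: `run_hook` is a hypothetical helper, `false` stands in for a failing drush command and `true` for a later command that succeeds. It shows what the shell reports as the exit status, not how Platform acts on that status.

```python
import subprocess


def run_hook(script: str) -> int:
    """Run a multi-line script in a single `sh` process and return its exit status."""
    return subprocess.run(["sh", "-c", script], capture_output=True).returncode


# Without `set -e` the failure of the first command is masked by the last one.
without_set_e = run_hook("false\ntrue")

# With `set -e` the script stops at the first failing command and reports it.
with_set_e = run_hook("set -e\nfalse\ntrue")

print(without_set_e, with_set_e)  # prints "0 1"
```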

In GitLab by @jayelless on Jan 13, 2019, 19:58

If a deployment fails, then the command platform environment:ssh tail /var/log/deploy.log will provide details of the error.

I also recommend that we use a post-deploy hook to move the contents of the deploy.log to an archive, so that the log file contains only the results from the most recent deployment and not the results from the last few months.
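A rough sketch of such an archiving step is shown below. It rests on assumptions that have not been verified: that the hook is allowed to write to the log file and to the archive directory, and that truncating the log in place is acceptable. The function name and the archive file naming are hypothetical, and exactly when to run the step relative to the deployment is left open.

```python
import shutil
import time
from pathlib import Path


def archive_deploy_log(log_path: Path, archive_dir: Path) -> Path:
    """Copy the current log into a timestamped archive file, then empty the log."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%S")
    target = archive_dir / f"deploy-{stamp}.log"

    # Copy first so the history is preserved before anything is cleared.
    shutil.copy2(log_path, target)

    # Truncate instead of deleting, so the log file itself keeps existing.
    log_path.write_text("")
    return target
```

It could be called as `archive_deploy_log(Path("/var/log/deploy.log"), Path("/some/writable/dir"))`, where the archive directory is a placeholder for any location the application can write to.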

In GitLab by @jayelless on Jan 13, 2019, 20:17

mentioned in merge request !255

In GitLab by @jayelless on Jan 13, 2019, 21:51

Well. That looked too easy :( The "set -e" command does not trigger a deploy abort if any of the subsequent commands contains a drush error. I will need to investigate further.

In GitLab by @thomasmurphy on Feb 12, 2019, 22:10

Update - platform are performing a performance and stability review and will get back to us in the next week.

In GitLab by @richbodo on Feb 21, 2019, 15:47

removed milestone

In GitLab by @jayelless on Feb 24, 2019, 21:19

I have run some additional tests on the Platform deployment process, and I find that the deployment process IS, in fact, detecting the drush error return (a separate test revealed that drush 8 will return an error code of "1" if the config fails to import). This is shown by the following output in the deploy log, which ceases when the drush config-import command returns with an error. Note that no result statement is printed for that command, and there is no output at all from the entity-updates command.

    [2019-02-25 02:05:48.669787] Launching command 'set -e
    cd web
    drush -y cache-rebuild; echo "Result=$?"
    drush -y updatedb; echo "Result=$?"
    drush -y config-import; echo "Result=$?"
    drush -y entity-updates; echo "Result=$?"
    '.

    Cache rebuild complete. [ok]
    Result=0
    No database updates required [success]
    Result=0
    Collection Config Operation
    core.extension update
    Import the listed configuration changes? (y/n): y
    Drupal\Core\Config\ConfigImporterException: There were errors [error] validating the config synchronization. Configuration coffee.configuration depends on the Coffee module that will not be installed after import. in Drupal\Core\Config\ConfigImporter->validate() (line 737 of /app/web/core/lib/Drupal/Core/Config/ConfigImporter.php).
    The import failed due for the following reasons: [error] Configuration coffee.configuration depends on the Coffee module that will not be installed after import.

It would therefore appear that there is a problem within the Platform pipeline that is not picking this up, and is marking the deployment process as completed successfully.
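Until the pipeline reports this correctly, the same observation could be turned into an automated check: count how many `echo "Result=$?"` statements the launched command contains and compare that with the `Result=` lines that actually appear in the log. The sketch below assumes the hook keeps that echo convention from the test above; `hook_completed` is a hypothetical helper name.

```python
import re


def hook_completed(launch_command: str, log_output: str) -> bool:
    """Return True only if every command reported a result and every result was 0."""
    # Each `echo "Result=$?"` in the hook should yield one "Result=<n>" line.
    expected = len(re.findall(r'echo "Result=\$\?"', launch_command))
    reported = re.findall(r"Result=(\d+)", log_output)

    # Fewer results than expected means `set -e` stopped the hook early.
    return len(reported) == expected and all(code == "0" for code in reported)
```

Applied to the log above, the command contains four result statements but the output contains only two, so the check returns `False` even though the deployment was marked as successful.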

In GitLab by @thomasmurphy on Mar 7, 2019, 16:30

We're deferring this to the next sprint, or at least until we hear back from Platform.

spacebase / spacebasenz
