Open jcnventura opened 4 years ago
I have never seen that failure. If anyone has any suggestions about a fix, I'd be interested.
From the symptom, it looks like it might have something to do with the headless chrome browser; I don't think that anything else is using X11. However, that is in conflict with your assertion that it happens during terminus drush calls, which are ssh connections, so I'm not sure what is going on.
We inherit our chrome configuration from our base image, circleci/php:7.3-node-browsers.
Is there a way to figure out whether this error originates in the terminus environment (i.e., the quay.io/pantheon-public/build-tools-ci:6.x machine) or in the drush environment that terminus is connecting to?
Also, this CI is not running under CircleCI, but under GitLab CI.
Try modifying the call to Terminus to:
terminus -n -vvv drush pantheon_project.env -- -y cr --debug
The placement of the error message vis-a-vis the debug output should make it more clear where the error is being generated.
Status Code: 200
[warning] This environment is in read-only Git mode. If you want to make changes to the codebase of this site (e.g. updating modules or plugins), you will need to toggle into read/write SFTP mode first.
Warning: Permanently added '[appserver.dev.3c84c450-5e4f-4820-9347-9d4f40d12991.drush.in]:2222,[34.90.80.87]:2222' (RSA) to the list of known hosts.
[preflight] Redispatch to site-local Drush: /code/vendor/drush/drush/drush.
[preflight] Config paths: /.drush/drush.yml,/code/vendor/drush/drush/drush.yml
[preflight] Alias paths: /code/web/drush/sites,/code/drush/sites
[preflight] Commandfile search paths: /code/vendor/drush/drush/src,/opt/pantheon/drupal-extensions
[bootstrap] Starting bootstrap to site [0.29 sec, 11.58 MB]
[bootstrap] Drush bootstrap phase 2 [0.29 sec, 11.58 MB]
[bootstrap] Try to validate bootstrap phase 2 [0.29 sec, 11.58 MB]
[bootstrap] Try to validate bootstrap phase 2 [0.3 sec, 11.59 MB]
[bootstrap] Try to bootstrap at phase 2 [0.3 sec, 11.59 MB]
[bootstrap] Drush bootstrap phase: bootstrapDrupalRoot() [0.3 sec, 11.59 MB]
[bootstrap] Change working directory to /code/web [0.3 sec, 11.59 MB]
[bootstrap] Initialized Drupal 8.9.3 root directory at /code/web [0.3 sec, 11.81 MB]
[bootstrap] Try to validate bootstrap phase 2 [0.3 sec, 11.81 MB]
[bootstrap] Try to bootstrap at phase 2 [0.31 sec, 12.1 MB]
[bootstrap] Drush bootstrap phase: bootstrapDrupalSite() [0.31 sec, 12.1 MB]
[bootstrap] Initialized Drupal site dev-pantheon_project.pantheonsite.io at sites/default [0.31 sec, 12.33 MB]
[success] Cache rebuild complete. [14.2 sec, 112.88 MB]
[notice] Command: pantheon_project.dev -- drush [Exit: 137]
[error]
(EE)
Fatal server error:
(EE) Server is already active for display 99
If this server is no longer running, remove /tmp/.X99-lock
and start again.
(EE)
Seems like the drush command is fine, but terminus decides to throw an error for some reason, and then a fatal error that aborts our CI.
Is everything from the [error] through the second (EE) in red? Can't imagine why Terminus would fire up X11. Maybe it's the Terminus update checker? Terminus calls curl to check for the latest available version of Terminus; maybe headless chrome causes this call to behave differently, e.g. in some way that requires X11? Seems unlikely, but it's the only thing I have to go on right now.
Unfortunately, setting the hide_update_message configuration setting to true only hides the update message; Terminus still checks its latest version when this is set.
Another way to subvert the update check is to reroute stdout or stdin from Terminus.
Try:
terminus -n -vvv drush pantheon_project.env -- -y cr --debug < /dev/null
If that doesn't work, try redirecting stdout instead, although then you won't be able to see your output, so you'll have to add a check for $? being nonzero. Maybe try piping to tee so that you redirect output and can still see it. That could work.
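One catch with the tee idea: in a plain pipeline the shell reports the exit status of tee, not of Terminus, so a failure could go unnoticed. A minimal bash sketch of one way around that, assuming bash is the CI shell; run_logged is a hypothetical helper name, not something Terminus provides:

```shell
#!/usr/bin/env bash
# run_logged is a hypothetical helper, not part of Terminus.
# It runs a command with stdin detached, mirrors stdout and stderr to a
# log file through tee, and returns the command's own exit status rather
# than the exit status of tee.
run_logged() {
  local log_file="$1"
  shift
  "$@" < /dev/null 2>&1 | tee "$log_file"
  return "${PIPESTATUS[0]}"
}
```

It could be called as run_logged terminus.log terminus -n -vvv drush pantheon_project.env -- -y cr --debug, followed by a check of $?. The log file name is arbitrary.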
If the workaround gets you past the terminus drush cr, we could fix Terminus so that it skips the version check if hide_update_message is set.
I'm not sure where that (EE) is coming from. It might be part of GitLab CI.
I'm worried about why drush or terminus exits with a non-zero error code (137 in this case). Drush has no reason to exit with an error code, as it had just finished with [success] Cache rebuild complete. We run several drush commands in sequence, so I'm guessing that in the case above it managed to run terminus drush cr but errored on terminus drush updb. 137 doesn't seem to be an acceptable error code from drush, so I'm guessing this is from terminus?
I’m also getting exit 137 in my CircleCI builds with image: cimg/php:8.1-browsers, but during a terminus drush deploy command:
terminus remote:drush --yes --no-interaction --progress -- $PANTHEON_SITE.$PANTHEON_ENV deploy
[notice] Database updates start.
[success] No pending updates.
[success] Cache rebuild start.
[success] Cache rebuild complete.
[success] Config import start.
[notice] There are no changes to import.
[success] Cache rebuild start.
[notice] Command: *******.dev -- drush deploy [Exit: 137]
[error]
I was getting a 137 here as well, which corresponded to an apparently unrelated mysql error.
Slightly different setup, it sounds like, but after rerunning the install command the 137 went away.
I'm hearing internal chatter that the platform restarts all of the site's services during code deployments, and that this can cause drush commands to error out. Have the folks experiencing problems here tried using build:workflow:wait between their code deployments and drush commands?
n.b. terminus build:env:push implicitly calls terminus build:workflow:wait, so those of you who did not modify the default script commands should already be using it.
Would you still use terminus build:workflow:wait if your pantheon.yml file includes build_step: false, and if so, where would it go?
terminus build:env:push and terminus build:workflow:wait are only for use in projects where build_step is false.
If your build scripts already use terminus build:env:push, then you don't need build:workflow:wait, as it is already in use. If you use git push, then you should run terminus build:workflow:wait after the git push to ensure that the platform has processed the code push before you try to use said code.
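For the git push case, the ordering described above might look like the sketch below. push_then_wait is a hypothetical helper, and the remote, branch and site.env values are placeholders to fill in. It assumes build:workflow:wait takes the site.env identifier as its first argument, which is worth confirming with terminus help build:workflow:wait on your plugin version.

```shell
#!/usr/bin/env bash
# push_then_wait is a hypothetical helper, not a Terminus command.
# Arguments (all placeholders): git remote, branch, and Pantheon site.env.
push_then_wait() {
  local remote="$1" branch="$2" site_env="$3"

  # 1. Push the code with plain git.
  git push "$remote" "$branch" || return

  # 2. Block until the platform reports the resulting workflow as finished.
  #    Assumption: the command accepts site.env as its first argument.
  terminus -n build:workflow:wait "$site_env" || return

  # 3. Only now run drush against the freshly deployed code.
  terminus -n drush "$site_env" -- -y cr
}
```

Each step returns early on failure, so a failed push or wait is never followed by a drush call against stale code.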
Also, I don't think that it was explicitly stated anywhere in the thread above, but an exit code 137 typically means "out of memory".
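For context, shells report a process killed by a signal as 128 plus the signal number, so 137 is 128 + 9, i.e. SIGKILL. The kernel's out-of-memory killer uses SIGKILL, which is why 137 is so often read as "out of memory", although anything else that sends SIGKILL produces the same code. A small sketch for decoding an exit status; describe_exit is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# describe_exit is a hypothetical helper for interpreting an exit status.
# Statuses above 128 conventionally mean "terminated by signal (status - 128)".
describe_exit() {
  local code="$1"
  if [ "$code" -eq 0 ]; then
    echo "success"
  elif [ "$code" -gt 128 ]; then
    # kill -l N prints the name of signal number N (without the SIG prefix).
    echo "killed by SIG$(kill -l "$((code - 128))")"
  else
    echo "exited with status $code"
  fi
}
```

Calling describe_exit 137 reports SIGKILL, which narrows the question to who sent the kill rather than what drush itself returned.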
There is an alternate theory that the problem is caused by services restarting at the wrong time; while possible, I think this is unlikely. Following Occam's razor, the simplest explanation is that the command is failing because drush deploy is using a lot of memory, and it needed more than was available to php-fpm.
Thanks for that @greg-1-anderson
Slightly unrelated but I am seeing this error code on a Pantheon multidev in my deploy automation:
Notice: ] Database updates start.
--------------  -----------------------------  -------------  ------------------------------------------------------------------------
Module          Update ID                      Type           Description
--------------  -----------------------------  -------------  ------------------------------------------------------------------------
system          10100                          hook_update_n  10100 - Remove the year 2038 date limitation.
block_content   10100                          hook_update_n  10100 - Update entity definition to handle revision routes.
block_content   10200                          hook_update_n  10200 - Remove the unique values constraint from block content info fields.
dblog           10100                          hook_update_n  10100 - Remove the year 2038 date limitation.
dblog           10101                          hook_update_n  10101 - Converts the 'wid' of the 'watchdog' table to a big integer.
locale          10100                          hook_update_n  10100 - Remove the year 2038 date limitation.
statistics      10100                          hook_update_n  10100 - Remove the year 2038 date limitation.
user            10000                          hook_update_n  10000 - Remove non-existent permissions created by migrations.
block_content   block_library_view_permission  post-update    Update block_content 'block library' view permission.
block_content   move_custom_block_library      post-update    Moves the custom block library to Content.
block_content   sort_permissions               post-update    Update permissions for users with "administer blocks" permission.
editor          image_lazy_load                post-update    Enable filter_image_lazy_load if editor_file_reference is enabled.
file            add_permissions_to_roles       post-update    Grant all non-anonymous roles the 'delete own files' permission.
layout_builder  timestamp_formatter            post-update    Update timestamp formatter settings for Layout Builder fields.
media           oembed_loading_attribute       post-update    Add the oEmbed loading attribute setting to field formatter instances.
system          enable_password_compatibility  post-update    Enable the password compatibility module.
system          linkset_settings               post-update    Add new menu linkset endpoint setting.
system          timestamp_formatter            post-update    Update timestamp formatter settings for entity view displays.
text            allowed_formats                post-update    Add allowed_formats setting to existing text fields.
views           boolean_custom_titles          post-update    Update Views config schema to make boolean custom titles translatable.
views           fix_revision_id_part           post-update    Fix '-revision_id' replacement token syntax.
views           oembed_eager_load              post-update    Add eager load option to all oembed type field configurations.
views           responsive_image_lazy_load     post-update    Add lazy load options to all responsive image type field configurations.
views           timestamp_formatter            post-update    Update timestamp formatter settings for views.
--------------  -----------------------------  -------------  ------------------------------------------------------------------------
Notice: > > [notice] Update started: block_content_update_10100
Notice: > > [notice] Update completed: views_post_update_timestamp_formatter
...
Notice: ] Command: d8hrw.sidebyside -- drush deploy [Exit: 137]
It may be relevant that this happens when upgrading a site from Drupal 9 to Drupal 10.
Calling terminus -- drush deploy repeatedly eventually gets through but it's not pretty.
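If re-running by hand is the stopgap, it can at least be automated. The following is a rough sketch of a bounded retry wrapper, assuming bash; retry is a hypothetical helper, and the attempt count and delay are arbitrary values to tune. It papers over the failure rather than explaining it, so it is only a workaround.

```shell
#!/usr/bin/env bash
# retry is a hypothetical helper: retry <max_attempts> <delay_seconds> <command...>
# It re-runs the command until it succeeds or the attempt budget is used up,
# and returns the exit status of the last failed attempt in that case.
retry() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1 status
  until "$@"; do
    status=$?
    if [ "$attempt" -ge "$max_attempts" ]; then
      return "$status"
    fi
    echo "attempt $attempt failed with exit $status; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}
```

For example, retry 3 30 terminus -n drush pantheon_project.env -- -y deploy would try up to three times with 30 seconds between attempts. This only makes sense for commands that are safe to run more than once.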
Hey @miiimooo, I just saw this comment pop up in my email, and I (or rather my company) have quite an extensive history with this “exit code 137” on Pantheon, so I thought I would weigh in with my experience here.
Working with Pantheon, we seem to have determined (not sure how well it’s documented) that this error code happens b/c after a code push, the containers are not fully “ready” to run a command like $ drush updb. It doesn’t completely make sense to me b/c the command does partially run, but we have almost completely got rid of this issue by “waiting” after the deploy. The terminus “workflow:wait” command after a deploy made this error code almost completely disappear, but we’ve been discussing lately that this command might not be 100% accurate, so we are currently trying an extra “sleep” after that command. It’s definitely not ideal, but that has solved this issue for my project… hope that helps?
A problem we encounter often during our CI processes for deploying Drupal is the following:
(EE)
Fatal server error:
(EE) Server is already active for display 99
If this server is no longer running, remove /tmp/.X99-lock
and start again.
(EE)
or
_XSERVTransmkdir: Owner of /tmp/.X11-unix should be set to root
This usually occurs when running drush commands via terminus:
terminus -n drush pantheon_project.env -- -y cr
The CI command uses the following base image:
image: quay.io/pantheon-public/build-tools-ci:6.x
Re-running the deployment a few times eventually succeeds, but this can be quite a problem. While it was a nuisance before, it has recently become a major annoyance that can easily turn what was supposed to be a 2h job to deploy our multiple Pantheon-based sites into a multi-day waste of time.