plone / plone.recipe.zope2instance

zc.buildout recipe to setup and configure a Zope 2 instance.
https://pypi.org/project/plone.recipe.zope2instance

Behavior of ``restart`` after rebuilding the instance script #107

Closed. dataflake closed this issue 5 years ago.

dataflake commented 5 years ago

The restart verb behaves well except for one case.

If you re-run the buildout and the configuration section that uses plone.recipe.zope2instance goes through a full Uninstall/Install instead of just being Updated, then restart will always fail on an instance that was already running before the buildout was run.

I have dug into the internals of zdaemon for a few hours. This stuff is really hard to debug because its logging information doesn't reach the normal logs, and it uses os.fork and os.execv.

I'm currently at a dead end in the spawn method of zdaemon.zdrun.Subprocess. There, the daemon manager process (which has been running since before the new buildout run) uses os.fork to fork off a process for the new child, and then os.execv to load the new Zope process into it. It passes the whole command line needed to invoke the child.

No error is raised here, but the OS immediately kills the child; the exit status is 1. I can only guess that the issue is a mismatch between two views of the process environment and resources: the one held by the daemon manager (and by the child, which inherits it in the fork) and the one the OS has after the buildout run.
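To illustrate why nothing is raised in the daemon manager, here is a minimal sketch of the generic fork-then-exec pattern. It is not zdaemon's actual code: the names `spawn` and `wait_for_exit` and the exit code 127 for a failed exec are assumptions made for this example. The point is that anything going wrong after the fork happens in the child, so the parent only ever sees an exit status.

```python
import os
import sys


def spawn(args):
    """Fork, then exec ``args`` in the child. Returns the child's pid.

    Hypothetical helper for illustration; not the zdaemon implementation.
    """
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the target program.
        try:
            os.execv(args[0], args)
        except OSError as err:
            # Only reached if exec itself fails (e.g. the path is gone).
            sys.stderr.write("can't exec %r: %s\n" % (args[0], err))
            sys.stderr.flush()
        finally:
            # Never fall back into the parent's code path from the child.
            os._exit(127)
    # Parent: fork succeeded, so no exception is raised here, whatever
    # happens to the child afterwards.
    return pid


def wait_for_exit(pid):
    """Block until the child ends; return its exit code, or the negative
    signal number if it was terminated by a signal."""
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)
```

With this pattern, a child that starts and then fails on its own, and a child whose exec never succeeded, both look the same to the parent: `spawn` returns normally and the only evidence is the status collected later by `wait_for_exit`.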

I'm brainstorming here; this is probably not a bug, but it should be documented somehow. "When you re-run your buildout you must use stop and start" or "Stop the instance before running the buildout" or something like that.

mauritsvanrees commented 5 years ago

I have seen these kinds of problems too. Not too surprising: a process is running, and you remove its working directory, running script, and config files and replace them with something else, though very similar. I indeed stop the instance (and zeoserver and varnish or whatever the buildout installs) before running the buildout. When I forget this, it sometimes works, and sometimes not.
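The stop-first workflow could be scripted roughly as follows. This is only a sketch: the paths `bin/instance` and `bin/buildout` assume a default buildout layout with a part named `instance`, and the `rebuild` helper and its `run` parameter are hypothetical names made up for this example.

```python
import subprocess


def rebuild(run=subprocess.call,
            instance="bin/instance",   # assumed script path
            buildout="bin/buildout"):  # assumed script path
    """Stop the instance, re-run the buildout, then start the instance.

    ``run`` takes a command list and returns its exit status; it is
    injectable so the sequence can be tried out without touching a
    real instance.
    """
    # Stop first, so no running process depends on files that the
    # buildout is about to remove and recreate. The exit status is
    # ignored because the instance may already be stopped.
    run([instance, "stop"])

    if run([buildout]) != 0:
        raise RuntimeError("buildout failed; instance left stopped")

    # A plain start (not restart): no old daemon manager is left over.
    return run([instance, "start"])
```

Called as `rebuild()` from the buildout directory, this runs the three commands in order. Other services the buildout installs (zeoserver, varnish, and so on) would need the same treatment.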

dataflake commented 5 years ago

This is probably a documentation issue. There should be a big note somewhere saying that if you re-run the buildout, you should use stop and start instead of restart. Maybe a "Known Issues" section at the bottom of the README (which could also benefit from a table of contents at the top).

dataflake commented 5 years ago

This will be fixed by #109