haroldyong / modwsgi

Automatically exported from code.google.com/p/modwsgi

Do not load the Python interpreter into the Apache process when the embedded mode is not used. #50

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
If the Python interpreter is not loaded into the Apache process, then the
conflicts between mod_python, other Apache modules, and other shared
libraries will not occur. These conflicts have been a common source of
problems and are very difficult for end-users to diagnose.

Further, with such a change, mod_wsgi would be easier to extend to work
with multiple versions of Python, which has been a common request.

Maybe it is possible for mod_wsgi to work in the following manner: (1) The
Apache module is not linked to any code that requires Python at all. Thus,
all conflicts within Apache are avoided. (2) A "template" daemon process is
forked. That template daemon process dynamically loads a shared library that
is linked to a particular version of Python. (3) The template process then
forks more processes to handle requests, as needed.

In between steps (2) and (3), a special initialization callable can be
executed. That initialization callable could pre-load all Python modules
used by the application, and/or initialize other expensive data structures
that can be shared between processes. This should significantly reduce the
unshared writable memory used by each daemon process on any OS that uses
copy-on-write for fork().
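
A minimal pure-Python sketch of the preload-then-fork idea, assuming a POSIX
system with os.fork(). This is not mod_wsgi code: the names preload() and
fork_workers() are hypothetical, and the "template" process is simply the
process that runs the script. It only shows that modules imported before the
fork are already present in every child without being imported again.

```python
import importlib
import os
import sys


def preload(module_names):
    """Import modules once in the template process so children inherit them."""
    for name in module_names:
        importlib.import_module(name)


def fork_workers(num_workers, work):
    """Fork num_workers children; each runs work() and sends back a string."""
    readers = []
    for _ in range(num_workers):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:  # child: report the result, then exit without cleanup
            os.close(read_fd)
            status = 0
            try:
                os.write(write_fd, work().encode())
            except Exception:
                status = 1
            finally:
                os._exit(status)
        os.close(write_fd)  # parent keeps only the read end
        readers.append((pid, read_fd))

    results = []
    for pid, read_fd in readers:
        with os.fdopen(read_fd) as pipe:
            results.append(pipe.read())  # returns at EOF, when the child exits
        os.waitpid(pid, 0)
    return results


if __name__ == "__main__":
    preload(["json", "decimal"])
    # Each child checks sys.modules without importing anything itself.
    print(fork_workers(2, lambda: str("decimal" in sys.modules)))
```

How much memory actually stays shared depends on how often the children write
to the inherited pages, so the sketch demonstrates the mechanism only, not the
size of the savings.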

Original issue reported on code.google.com by brianlsm...@gmail.com on 14 Jan 2008 at 1:25

GoogleCodeExporter commented 8 years ago
See the following:
http://izumi.plan99.net/blog/?p=19
http://izumi.plan99.net/blog/index.php/2007/10/15/making-ruby%e2%80%99s-garbage-collector-copy-on-write-friendly-part-6-final/
http://izumi.plan99.net/blog/?p=34
http://www.bitwiese.de/2007/09/on-processes-and-threads.html

The posts linked to above refer to Ruby, but the same applies to Python,
except for some details about GC optimizations. They illustrate the savings
that I expect to see by forking after the initialization callable has been
executed and all shared libraries and modules have been (pre)loaded.
mod_wsgi's current use of fork() is already providing some of these savings,
but I think there is room for a lot more.

Original comment by brianlsm...@gmail.com on 16 Jan 2008 at 1:04

GoogleCodeExporter commented 8 years ago
Preloading into a parent process means that you must have a
monitor/management process for every distinct application, which runs as the
user that the final application will run as. You can't just have one, running
as root, which is used for all applications regardless of what user an
application runs as.

As a consequence you end up with lots more processes for a start. This sort
of scheme, although it may work for people running a system which is
dedicated for a specific set of applications, is no good in a shared web
hosting environment.

Original comment by Graham.Dumpleton@gmail.com on 16 Jan 2008 at 5:26

GoogleCodeExporter commented 8 years ago
I agree. My idea is for the preloading script to be per-process-group, not
per-application. Further, the fork would be optimized away for the case where
processes=1.
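
A rough sketch of that per-process-group behaviour, assuming a POSIX system.
The function start_process_group() and its preload/serve callables are
hypothetical names used for illustration only; nothing here is taken from
mod_wsgi.

```python
import os


def start_process_group(processes, preload, serve):
    """Run preload() once for the group, then serve in `processes` processes.

    Returns the child pids, or an empty list when no fork was needed.
    """
    preload()
    if processes == 1:
        serve()  # single process: serve directly, no fork
        return []

    pids = []
    for _ in range(processes):
        pid = os.fork()
        if pid == 0:  # child inherits everything preload() set up
            try:
                serve()
            finally:
                os._exit(0)
        pids.append(pid)
    return pids
```

The caller is expected to os.waitpid() on the returned pids.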

Original comment by brian@briansmith.org on 16 Jan 2008 at 1:56

GoogleCodeExporter commented 8 years ago
Please disregard my suggestions for preloading before the forking. That is a
totally separate issue. (FWIW, I am doing a pure-Python prototype of the
preload-then-fork mechanism as WSGI middleware to test how much private RSS
is actually reduced. I am also looking at sending patches for Python itself,
to switch it from read()ing module files to mmap()ing them.)

Original comment by brianlsm...@gmail.com on 17 Jan 2008 at 3:08

GoogleCodeExporter commented 8 years ago
Keeping this here as a prompt to look at all these sorts of issues when doing
future restructuring of code and reviewing what functionality is provided.
Need to track down the discussions on the mailing list and link them here.

Original comment by Graham.Dumpleton@gmail.com on 18 Feb 2008 at 9:31

GoogleCodeExporter commented 8 years ago
Going to close this issue down for now, as I am not going to pursue the
overall intent of what was being suggested. Ideas will not be forgotten
though.

FWIW, in mod_wsgi 3.0 there is an experimental directive
WSGILazyInitialization which allows one to defer when Python is initialised.
The Python libraries are still linked into the Apache mod_wsgi.so module, but
if WSGIRestrictEmbedded is enabled, and so Python isn't required in Apache
worker processes, Python will not be initialised in the Apache parent. The
only time it might be is if the aaa access hooks in mod_wsgi are used, in
which case everything goes back to the way it was.
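
A configuration fragment along these lines is what the paragraph above
describes. Treat it as a sketch: the directive that disables embedded mode is
spelled WSGIRestrictEmbedded in the mod_wsgi documentation, and the process
group name and script path below are made-up placeholders.

```apache
# Defer Python initialisation until after the Apache parent has forked.
WSGILazyInitialization On
# Disable embedded mode so Apache worker processes never need Python.
WSGIRestrictEmbedded On

# Hypothetical daemon process group and application path.
WSGIDaemonProcess example-group processes=2 threads=15
WSGIProcessGroup example-group
WSGIScriptAlias / /srv/example/app.wsgi
```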

Because Python is not initialised in the Apache parent, the worker processes
are smaller, but the Python interpreter has to be initialised in every daemon
mode process. If there is only one such process then that is okay, but if
there is more than one then overall memory usage is greater, as they don't
benefit from the sharing that comes from initialising Python in the Apache
parent.

When transient daemon processes are supported, a separate monitor process
will be needed to handle them. At that point, if that monitor is used for all
daemon processes, Python can be initialised in it instead, which gets the
sharing benefits back for daemon processes without needing to initialise
Python in the Apache parent.

Original comment by Graham.Dumpleton@gmail.com on 6 Mar 2009 at 4:42