Open GoogleCodeExporter opened 8 years ago
I am changing the title to "Scale crawler to a client/server design aiming for full distributed system support". The client/server split is more important than a full-fledged distributed design using Pyro, since it already allows the crawler to scale to 2 machines.
I will start working on the client/server design soon, splitting the application classes into client and server code.
Original comment by abpil...@gmail.com
on 6 Oct 2008 at 11:21
How would this look?
Would the server parse config.xml, split the crawling job into subpages, and distribute the sets of subpages to its slave computers?
Or something else?
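To make the question concrete, here is a minimal sketch of the "split and distribute" step only. It assumes the server already has a flat list of start URLs (however config.xml ends up being parsed); the function name split_job and the example URLs are hypothetical and not part of the crawler.

```python
def split_job(urls, num_workers):
    """Divide a list of URLs into num_workers chunks, round-robin.

    Hypothetical helper: the real crawler may partition work differently
    (for example per domain or per directory).
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    chunks = [[] for _ in range(num_workers)]
    for index, url in enumerate(urls):
        # URL 0 goes to worker 0, URL 1 to worker 1, and so on, wrapping around.
        chunks[index % num_workers].append(url)
    return chunks


# Placeholder start URLs, standing in for whatever config.xml specifies.
start_urls = [
    "http://example.com/a",
    "http://example.com/b",
    "http://example.com/c",
    "http://example.com/d",
    "http://example.com/e",
]
assignments = split_job(start_urls, 2)
```

Each entry in assignments would then be sent to one slave machine. How it is sent (Pyro, sockets, something else) is left open here.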
Original comment by szybal...@gmail.com
on 12 Oct 2008 at 5:25
Why not use the multiprocessing module from Python 2.6's standard library?
http://docs.python.org/library/multiprocessing.html#module-multiprocessing
I quote:
"multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine. It runs on both Unix and Windows."
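As a rough illustration of the local case, a pool of worker processes could map a crawl function over a list of URLs. This is only a sketch: crawl and crawl_all are hypothetical names, and crawl is a stand-in that does no network access, so the real fetch-and-parse code would replace its body.

```python
import multiprocessing


def crawl(url):
    # Stand-in for the real fetch-and-parse step (an assumption for this
    # sketch). It returns the URL and its length so it runs offline.
    return url, len(url)


def crawl_all(urls, processes=2):
    """Run crawl() over urls using a pool of worker processes."""
    pool = multiprocessing.Pool(processes=processes)
    try:
        # map() blocks until every URL is processed and keeps input order.
        return pool.map(crawl, urls)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    # The guard keeps child processes from re-running this block on Windows.
    for url, size in crawl_all(["http://example.com/a", "http://example.com/bb"]):
        print("%s -> %d" % (url, size))
```

This only spreads work across processors on one machine. The remote concurrency mentioned in the quote would need the managers part of the module, which is not shown here.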
Original comment by andrei.p...@gmail.com
on 12 Oct 2008 at 4:04
This is an interesting angle. Until now I had never thought of using something that ships with standard Python.
Btw, the module is only available from Python 2.6 onwards, so this feature won't work with 2.4 <= Python < 2.6. Still, I think it is a great suggestion. I will read the docs and update this issue.
Original comment by abpil...@gmail.com
on 13 Oct 2008 at 5:23
The library is also available for older Python versions, under the name "processing". For Python 2.6 it was renamed and had some bugs fixed:
http://pyinsci.blogspot.com/2008/09/python-processing.html
http://pypi.python.org/pypi/processing
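If both old and new Python versions need to be supported, one common pattern is a fallback import. This is a sketch, assuming the processing package from PyPI is installed on interpreters older than 2.6; the alias mp is arbitrary. Some names differ between the two packages, so code written against one is not guaranteed to run unchanged on the other.

```python
try:
    # Python 2.6+: the module ships with the standard library.
    import multiprocessing as mp
except ImportError:
    # Older Pythons: assumes the "processing" backport from PyPI is installed.
    import processing as mp

# Both packages provide a Pool class for running work in child processes.
Pool = mp.Pool
```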
Original comment by andrei.p...@gmail.com
on 13 Oct 2008 at 7:43
Original issue reported on code.google.com by
andrei.p...@gmail.com
on 17 Jul 2008 at 9:33