ioam / topographica

A general-purpose neural simulator focusing on topographic maps.
topographica.org
BSD 3-Clause "New" or "Revised" License

Parallel version of Topographica #368

Closed sf-issues closed 10 years ago

sf-issues commented 11 years ago

Converted from SourceForge issue 2218586, submitted by jbednar. Submit Date: 2008-11-03 09:43 GMT

Due to their weakly interconnected graph structure, Topographica models lend themselves naturally to coarse-grained parallelization. Each event processor (e.g. a Projection, or in some cases a Sheet) can run on a different physical processor, exchanging information with other event processors using either shared memory or message passing protocols such as MPI.

Implementing this type of parallelization is not likely to require significant changes to the simulator, though it does require some thought to avoid some tricky issues. Our current plan is to do it using proxies for each Projection, so that the actual computation is done on a remote processor but the same interface is maintained at a master node. At first this would only be able to use as many processors as there are Projections in the simulation, but it could then be generalized to handle more fine-grained parallelism without much additional work. We would probably want to have a parallel Simulator class that would set up the Projection proxies automatically, perhaps chosen at startup using a command-line option.

For a shorter-term payoff, we could consider using some automatic parallel libraries, such as a BLAS implementation (like ATLAS or MKL or GOTO) that parallelizes numpy.dot. For more info on this and other numpy-specific ideas, see http://www.scipy.org/ParallelProgramming .
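As a quick illustration of the BLAS route, a multithreaded BLAS (MKL, OpenBLAS, ATLAS) parallelizes `numpy.dot` transparently, with no changes to the simulator. A minimal sketch (the environment variable name depends on the BLAS build, and must be set before NumPy is imported):

```python
import os
# Thread count for a multithreaded BLAS is typically controlled by an
# environment variable set before NumPy is imported (OMP_NUM_THREADS for
# OpenBLAS/ATLAS builds, MKL_NUM_THREADS for MKL builds).
os.environ.setdefault("OMP_NUM_THREADS", "4")

import time
import numpy as np

n = 1000
a = np.random.rand(n, n)
b = np.random.rand(n, n)

start = time.time()
c = np.dot(a, b)  # dispatched to the BLAS; runs in parallel if the BLAS is threaded
elapsed = time.time() - start

print("dot of two %dx%d matrices took %.3f s" % (n, n, elapsed))
```

Whether any speedup actually appears depends entirely on which BLAS NumPy was linked against, which is why the scipy.org page above is worth checking first.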

Below is an email discussion that might be useful when getting started, though some of it may be out of date.

Jim

From: Jefferson Provost Date: Apr 24 11:16:01 2008 -0400
On Thu, Apr 24, 2008 at 4:51 AM, jbednar@inf.ed.ac.uk wrote:
> From: Jefferson Provost
> Date: Apr 23 14:40:24 2008 -0400
>
> On Fri, Apr 18, 2008 at 10:36 AM, jbednar@inf.ed.ac.uk wrote:
> > From: Jefferson Provost
> > Date: Apr 9 10:47:53 2008 -0400
> >
> > > On Wed, Apr 9, 2008 at 10:16 AM, jbednar@inf.ed.ac.uk wrote:
> > > From: Jefferson Provost
> > > Date: Apr 9 09:16:38 2008 -0400
> > >
> > > On Apr 9, 2008, at 1:55 AM, James A. Bednar wrote:
> > > > From: Jefferson Provost
> > > > Date: Apr 8 20:03:58 2008 -0400
> > > >
> > > > Right now I have a Ubuntu machine for
> > > > running the robotics stuff. I run Player on
> > > > the linux box and run the model on the mac.
> > > > The hope was to be able to use all 4 cores
> > > > on the mac, but so far I haven't gotten
> > > > there. The processing library has some nice
> > > > features, like a processing pool that
> > > > supports parallel map(), as well as
> > > > process-safe queues, etc. Unfortunately,
> > > > the synchronization primitives seem to be
> > > > broken on the Mac, raising an OSError.
> > > >
> > > > BTW, my plan was to do a kind of Bulk
> > > > Synchronous Parallel thing using the
> > > > processing library, but now I'm starting to
> > > > wonder how parallelizable most of the
> > > > Topographica code is anymore. In principle,
> > > > topo simulations should be highly
> > > > parallelizable, but I'm wondering how true
> > > > that is given the way the code is written.
> > > > In particular, for message-passing-based
> > > > parallelism, where there's no shared memory,
> > > > it's not clear to me that we've really made
> > > > all the EPs independent of one another.
> > > > E.g. we pass the whole Connection object
> > > > into Sheet.input_event, and there's nothing
> > > > stopping Sheet.input_event from calling
> > > > methods on conn.src, which could live in
> > > > another process. I haven't looked into the
> > > > code to see whether that really happens, but
> > > > the point is that the interfaces are totally
> > > > open, and any part of the simulation can
> > > > call methods on any other part at any time.
> > > >
> > > > Well, the EPs don't actually do anything
> > > > time-consuming in most of our simulations, so
> > > > having the EPs independent isn't very
> > > > important. Having the projections be
> > > > independent is, and I think they are, right?
> > > > Maybe for your networks the EPs also need to
> > > > be? The EPs definitely call methods on the
> > > > projections; e.g. ProjectionSheet.activate()
> > > > simply calls Projection.activate() on each
> > > > projection (which is what takes all the
> > > > computation time), then collates the results
> > > > (which is fairly quick).
> > >
> > > Yeah, but is it possible for the projections to
> > > "live" in a different process, given that they're
> > > used both for computation and as a key part of
> > > the data structure that holds everything
> > > together?
> > >
> > > Maybe it's not a problem. I haven't really tried
> > > to do anything yet anyway.
> > >
> > > I'm not sure; I haven't really tried to do anything
> > > either. When you put it that way, maybe there is
> > > indeed a problem. We used to think of the EPs as
> > > doing the computation and then passing messages
> > > using MPI to some other processor, which indeed
> > > doesn't really make sense any more, because the EP
> > > needs to communicate with the Projection, and it
> > > doesn't do it by means of a message over a delayed
> > > channel, but by a method call that would require
> > > blocking. On the other hand, it's no worse than
> > > before -- if you think of a Sheet plus all of its
> > > incoming projections as one unit, as long as all of
> > > those things are on the same processor, this should
> > > all still work ok. So by splitting things up into
> > > Projections but not really making Projections be
> > > first-class objects in the simulator's graph, we've
> > > limited our ability to cheaply parallelize
> > > Projections on different processors. We could still
> > > imagine making Projections be first class now,
> > > i.e. communicating via messages, but that's a fairly
> > > big change to make at this stage, and won't even
> > > really buy much parallelization (only helping for
> > > complicated network diagrams, not for big individual
> > > sheets). But maybe that's enough for the quad-core
> > > and perhaps 8-core machines we're likely to have on
> > > our desktops over the next few years. Hmm...
> >
> > It seems to me that the core of the problem is the use
> > of a Projection to represent both a set of weights and
> > the associated processing and to represent an edge in
> > the simulation graph, used to get information about
> > upstream and downstream nodes.
> >
> > Maybe the right solution is to factor Projections away
> > from Connections, so that a Connection contains a
> > projection, rather than being one. In principle, a
> > Projection should only need to know the
> > SheetCoordinateFrames of its src and dest. It
> > shouldn't need access to the entire EP (and hence the
> > entire simulation). In fact, a Projection shouldn't even
> > need a backlink to the Connection that owns it.
> >
> > If Projections are factored out like that, then it
> > should be easier to put them in separate processes and
> > farm out processing to them via message queues.
> >
> > I've been thinking about this, and I haven't ever really
> > come to any conclusion. I think it would be fine to
> > separate Projection from Connection and indeed it might
> > help us with parallel processing by having a
> > ProxyProjection class that respects the Projection
> > interface but actually talks to some remote Projection
> > instantiation on another processor to do the actual
> > computation. It seems like we could do that either way,
> > though -- the connection could either be a
> > ProxyProjection, or it could own one. But sure, having a
> > more restricted interface to a Projection could help us do
> > this more safely; I haven't been able to think it out all
> > the way through.
> >
> > In any case, isn't what we really need to do simply to
> > make the Projection run activate() as soon as the message
> > is sent out on that channel, long before it is actually
> > delivered to the target Sheet? That way the processing
> > can be done asynchronously, and then the target Sheet's
> > activate() method would simply collect the results from
> > the Projections, waiting until all are ready. To be
> > specific, we would change Simulation.send_output so that
> > it doesn't simply encode an event to be delivered after a
> > delay, but actually starts some computation whose
> > results will be delivered after the delay.
>
> Well, this would break the code for my modulatory projection
> model for FEF shifting-RF neurons. In that model the FEF has
> both modulated and modulatory input projections and it must
> wait to activate the modulated projection(s) until after any
> modulatory input arrives. The code does that by simply
> deferring the .activate() call on the modulated projections
> until .process_current_time(). I'm not sure how else it
> would work unless each connection also got a call to
> .process_current_time() as well as a call to .activate().
> That plan has a nice symmetry to it, actually. Effectively
> it makes connections into a new class of EP. Then you'd have
> "vertex" EPs and "edge" EPs -- for lack of better terms --
> that make up the model graph.
>
> I'm not sure how this all interacts with learning functions,
> though, especially considering our other discussion about
> making "supervised" or "error-driven" learning functions for
> projections. The projections still need input from
> the downstream sheet in order to do learning.
>
> > The main reason we haven't done it that way before is to
> > provide the target sheet as much flexibility as possible,
> > for complete generality. In this case I don't think that
> > generality would actually be lost, though, because the
> > target sheet can still ignore the Projection's Activity if
> > it wants to, and we can always provide special Projection
> > classes that don't compute immediately if we want to.
>
> The modulated projections would have to be special cases like
> this, I guess. Though I'm afraid that allowing those kinds
> of rare special cases will cause trouble down the line, if
> people start coding under the assumption that all
> simulations follow the typical basic pattern. (e.g. like the
> assumption that all inputs to ProjectionSheet are
> Projections).
>
> Plus, if we still allow EPs to call .activate (and other
> methods) directly on their input connections, then we haven't
> really fixed the independence problem have we?
>
> > The more serious problem is enforcing causality, so that
> > events don't occur out of order or too early. The current
> > approach ensures that Projection only updates its activity
> > matrix in a single atomic operation, i.e. when the target
> > sheet processes the event after the delay.
>
> This is also what will allow Bulk Synchronous Parallel
> processing. BSP is just a fancy term for dividing a
> computation up into phases consisting of periods of
> independent parallel computation w/o any communication, each
> punctuated by a barrier and followed by a communication phase
> in which information is exchanged among the processes. The
> current event scheme fits this model nicely.
>
> > If we do computation early, we have to make sure that
> > anyone who asks still gets the old Projection activity, not
> > the new one. To make it work cleanly, do we need a queue
> > of Activity matrices on each Projection, with new ones
> > pushed in as they are ready, and old ones popped out at the
> > right times? Maybe there is a simpler way.
>
> I think the Simulation object could still possibly handle it
> all, as long as it allows the connections to see and process
> their respective events before the events are passed to the
> destination.
>
> >
> > Anyway, to summarize, I think we could separate Projection
> > from Connection, but can't see the concrete benefits yet,
> > and I'd like to support parallel processing by having
> > Projections activate early, but can't see exactly how to
> > keep everything working properly.
>
> My main interest is in finding some way to enforce
> independence, preventing the EP from calling arbitrary code
> on its input connection, or worse, on conn.src. Lately I've
> come to believe that programmers will do basically anything
> that the interfaces will allow them to if it helps them get
> the job done more easily. The last thing I'd want to see is,
> e.g., someone implementing multilayer backprop by allowing
> the output layer sheet to modify the weights on the input
> projections to conn.src. That would be a nice easy way to do
> things that would totally violate encapsulation.
>
> Hmm. Well, what should we do then? Should we make activate()
> and learn() do two phases, one where they tell each of their
> projections to start computing, and one where they collect the
> results? The initial phase would allow each of the Projections
> to start computing, but the collection phase would block
> automatically on the first Projection that doesn't quite have
> the results ready yet, and would only complete once all of them
> have completed.
>
> Then we could separate Projection from Connection, make sure it
> has a limited interface, and then have a special ProxyProjection
> or a special Connection that handles the proxying, so that
> everything computes on the main processor except for the
> specifically farmed-out Projections? Then eventually we could
> get finer-grained parallelism by farming out a Projection to
> multiple processors, not just one.
I think that sounds reasonable. Maybe the right thing to do is
create a ParallelProjection proxy/wrapper that enforces the
interfaces, and then parallelize a bunch of the examples and see if
anything breaks. That might give a better sense of whether my
concerns about the interfaces are justified or overblown.
sf-issues commented 11 years ago

Submitted by jbednar Date: 2008-11-03 09:53 GMT

Oops; cut and paste error --

> Due to their weakly interconnected graph structure, Topographica models lend themselves naturally to coarse-grained parallelization. Each event processor (e.g. a Projection, or in some cases a Sheet) can run on a different physical processor,

Projections are not currently event processors; this should have said 'each time-consuming component'. Making Projections be EventProcessors is something we have considered, as noted in the initial message, but they are not currently implemented that way.

sf-issues commented 11 years ago

Submitted by jbednar Date: 2011-10-03 15:59 GMT

Assigned to dobromir, as he'll make sure that it's all integrated properly, but of course Konstantin and Marco have done the actual implementation.

sf-issues commented 11 years ago

Submitted by ceball Date: 2011-10-25 10:51 GMT

-- OpenMP

"openmp does not work with gcal" https://sourceforge.net/tracker/?func=detail&aid=3427986&group_id=53602&atid=470929

"openmp: support optimized learning and output functions" https://sourceforge.net/tracker/?func=detail&aid=3427990&group_id=53602&atid=470932

-- MPI

Need to integrate Konstantin's changes. I'm working with him at the moment to clean them up. Once the changes are small enough to be able to understand what to do with them, I'll post back here.

sf-issues commented 11 years ago

Submitted by ceball Date: 2011-10-25 23:00 GMT

Task missed for OpenMP: documentation in the user manual! Currently, the only documentation of openmp is in the comments of topo/misc/inlinec.py.

(For MPI, I hope to get documentation from Konstantin...)

sf-issues commented 11 years ago

Submitted by ceball Date: 2012-04-02 23:35 GMT

I'm dealing with MPI support.

jlstevens commented 10 years ago

Any news about the MPI version?

jlstevens commented 10 years ago

I've created a wiki page which references this issue (and some other links ceball kindly sent me by e-mail). Any developments regarding the MPI version should go there.