ioam / topographica

A general-purpose neural simulator focusing on topographic maps.
topographica.org
BSD 3-Clause "New" or "Revised" License
53 stars 32 forks source link

Controlled (i.e independent) weight initialization #567

Closed jlstevens closed 10 years ago

jlstevens commented 10 years ago

Summary

This commit allows time_dependent=True to be enabled throughout Topographica, allowing reproducible, well controlled numbergenNumberGenerator, imagen.PatternGenerator and TransferFn objects to be used everywhere.

This is particularly important for reproducible random number streams and will enable parallel weight generation when initializing a model (as random patterns are no longer affected by their order of instantiation).

This PR is only for Topographica with the other necessary changes already pushed to master of the relevant submodules (namely numbergen and imagen).

The two main changes in this PR are:

Design Overview

Objects with state that should depend on some notion of time (simulation time, real time whatever) should inherit from numbergen.TimeAware or numbergen.TimeDependent. The essential difference between these classes is that the latter is always time_dependent whereas the former can be toggled (see their respective docstrings).

Objects that inherit from these classes will have access to time_fn which reports the time value. By default, the time_fn is a global time object that is an instance of param.Time which provides facilities for navigating the timeline. If you want a global notion of time, it is necessary that all objects that use time reference the same object.

Controlling random state

This depends on time-dependent hashing using numbergen.Hash whereby the hash is taken of the object name (which sets the random 'identity' of the object), the value returned by the time_fn and the global param.seed. This hash is then used to seed the random state per call.

I found this functionality was being repeated in numbergen.RandomDistribution, imagen.random.RandomGenerator and topo.transferfn.TransferFnWithRandomState which is why I've introduced a new abstract class to numbergen,numbergen.TimeAwareRandomness which is used by all three of these classes.

Weight generation

Assuming an time-dependent imagen pattern is passed into a connect call, this PR simply uses the param.Time context manager to temporarily set the global time to zero and then sets a coordinate-dependent name for each pattern used to generate the weights of each CF. For a given value of param.random_seed and CF positions, each weight pattern is therefore completely independent and constrained (which should eventually allow for parallel initialization).

Outstanding Issues

Notes

Relevant commits

These commits have already been pushed and tested. All tests pass for each respective repository except for the scriptrepr problem (described above) in Topographica.

Param - ioam/param@45963dad1 Imagen - ioam/imagen@ad72c309 Topographica - ioam/topographica@47463e971c9

Tobias-Fischer commented 10 years ago

This is great, thank you!

jlstevens commented 10 years ago

Two more comments:

jlstevens commented 10 years ago

I've updated my fork so that sullivan_neurocomputing04.ty works as before (weight generation is back off to the uncontrolled style if param.Dynamic.time_fn is not an instance of param.Time). You can also disable controlled weight generation by setting the attribute `controlled_weight_generationon theConnectionField`` class to False.

In addition, I've made changes to som_retinotopy so that it passes the existing training tests. Once this fork is merged, we will have a lot of test data to update if we want to switch to time-dependent models throughout!

jlstevens commented 10 years ago

Tobias pointed out to me that the CF name does not contain enough information. In particular, it doesn't include the projection name. This is important to get different weights for different projections whenever the CF locations (in SheetCoordinates) happen to match (given a single weight generator is shared).

Unfortunately, ConnectionField objects don't seem to have access to the projection (and therefore the projection name) which would allow the cleanest, simplest solution:

`name = "%s CF (%.5f, %.5f)" % (projection.name, x,y)``

The next best solution that can be immediately implemented is to use the weight generator name (passed into connect by the user) as follows:

name = "CF %s (%.5f, %.5f)" % (weights_generator.name, x,y)

Then, if the user wanted to make the weights match for two projections (with the same weight_generator and matching CF positions), the pattern generators could be given the same name. The problem with this approach is that having to name all the weight generators used is a pain.

I am reluctant to change the constructor signature of ConnectionField in any way to avoid legacy issues. Unfortunately, ConnectionField is not a parameterized object (it inherits from object) which means that CFs don't have their own names.

jbednar commented 10 years ago

Why not just add an optional argument to the ConnectionField constructor that's a label or tag or description, and then have the Projection fill that in with its own name (and optionally the target's name)? Projection and ConnectionField are always supplied together, and thus will never have compatibility issues with each other, and if the argument is optional then there should be no issues with user code either.

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

jlstevens commented 10 years ago

Good point. My latest commit adds a hash_format parameter to control the string used for hashing. This might be a useful parameter to control if you want to have two projections with matching weights.

Do you think hash_format should also use the destination sheet name (by default) to ensure weights are different even if projections with the same name are connected to the same sheet?

jbednar commented 10 years ago

EventProcessors enforce a rule that all incoming (but not outgoing) connections have unique names, so the destination sheet name is not necessary to ensure that incoming projections will have different weights. On the other hand, using the destination sheet name is important to ensure that the weights will be unique across the entire simulation, since sheet names are required to be unique. Otherwise, Afferent->LGNOn and Afferent->LGNOff will be identical by default. So I think we should include the destination sheet name by default. I think that means that this pull request can be merged as soon as the script_repr issue is addressed.

jlstevens commented 10 years ago

Ok. I've replied to your commit comments and will make all the suggested changes shortly..

Any thoughts on how best to address the script_repr problem?

jlstevens commented 10 years ago

I've made two of the changes you suggested.

One thing worth mentioning: the script_repr issue is something that was introduced in commits already introduced to the master branch of param and imagen (see the current buildbot errors)! It is a related issue (and does need fixing ASAP) but is otherwise it is not a problem that will be introduced by merging this PR.

jbednar commented 10 years ago

Ok, then please go ahead and merge it.

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

jlstevens commented 10 years ago

Ok great!

The next thing to do is to parallelize the weight generation to give everyone a big incentive to move over to the new system and never look back. Earlier today, I looked at an old prototype script using multiprocessing and improved on it, this time documenting everything on this wiki page.

It would be good if you could have a look at that page to see if you like my plan.... in short, the only way to gain a speed-up is to use multiprocessing.Array to share memory between processes, otherwise pickling arrays makes the multicore implementation slower than the single-threaded one! Using the buffer interface, I think we can have a memory efficient implementation with a decent speed-up...

jbednar commented 10 years ago

I've looked at the script_repr issue, and it looks to me like what's happening is that the old script_repr did not include any representation for the random_generator, presumably because the parameter value was left at the default (and thus suppressed as redundant for the script_repr). Now that the value is non-default, the solution is to make the representation into valid Python. Luckily, as you said, the specific representation doesn't matter much, as it's now tightly controlled for time_dependent=True. So, please just use the simplest accurate representation that would run, especially if that already provides reproducibility. If the simplest one doesn't capture the random state, consider whether there is a practical way to include that, but if not, don't worry about it -- our main intention with script_reprs is simply to get a clean description of the model back out; it doesn't need to capture everything to be useful.

jbednar commented 10 years ago

As for parallel weight generation, your plan sounds reasonable to me; I don't see any obvious flaws or any clear way to do it more simply or faster.

jlstevens commented 10 years ago

Implementing a __repr__ for the random_state parameter in one place is something I am happy to do (and would have done already, if I knew how!). The issue is that param normally handles repr generation. Having to implement __repr__ for every class with random state seems excessive (TimeAwareRandomStatenow has many subclasses!)

I'm hoping there is some correct way of hooking into script_repr that I don't know about - otherwise, all I have are _bad_ ideas:

  1. Implement __repr__ everywhere - lots of repetitive, brittle code!
  2. Implement __repr__ everywhere, calling repr of super and using regular expression search and replace (slow! bad!)
  3. Wrap Python and Numpy random state objects into our own object with a good repr. Would add extra indirection for very little value and would need to mirror Python and Numpy's APIs.
  4. Monkey patch Python and Numpy's random state objects. Bad and surprising!
  5. Add a special case in param to check for random state objects to print them nicely.

So I have lots of bad ideas but no good ones! Maybe to patch the actual random_state parameter in TimeAwareRandomState somehow?

I'm hoping you'll have a clean, simple and correct approach that I can immediately implement! :-)

jbednar commented 10 years ago

Ah, I see the problem now -- that the item(s) in question that need better reprs aren't in our own code, and thus we can't simply add proper reprs to them.

I'm not sure what options 1 and 2 would entail, but they don't sound promising.

Option 3 makes sense but would be a pain.

Option 4 would be very bad if we monkeypatched __repr__, but it isn't necessarily that bad if we monkeypatch scriptrepr onto those, since there's no existing method by that name to screw up.

For 5, instead of checking just for random state objects, maybe Parameterized.scriptrepr could do a regex replace of "<([a-zA-Z0-9.]+) object at 0x[0-9a-e]+>" with "\1()", to replace any such object with the corresponding instance specification. Hacky, but the real hack is in Python's representation of the object being so useless, so we could be excused.

In any case, luckily there is an option 6, which is the script_repr_reg implemented in param.parameterized, which allows code to register a user-written script_repr for a system type that has a useless default repr. We currently use that for lists, tuples, and functions, but it can be extended to cover at least this specific type, and maybe any example of such an object. See parameterized.py for details (search for script_repr_reg).

jlstevens commented 10 years ago

Fixed the scriptrepr issue in this commit: ioam/param@96fa7174734cfc7

The scriptrepr test now passes and this fix will certainly be safe when using time-dependent random state.

I would like to note that random state was not being controlled properly, even before my latest changes. Here is part of the script repr output from hierarchical.ty used in the scriptrepr test from an older checkout of Topographica:

topo.sim['LeftRetina']=topo.base.generatorsheet.GeneratorSheet(input_generator=imagen.Line(bounds=BoundingBox(points=((-0.75,-0.5),(0.75,0.5))),
        orientation=numbergen.UniformRandom(lbound=-3.141592653589793,
            ubound=3.141592653589793),
        position=[numbergen.UniformRandom(lbound=-0.5,
            ubound=0.5),numbergen.UniformRandom(lbound=-0.5,
            ubound=0.5)],
        thickness=0.02,
        x=numbergen.UniformRandom(lbound=-0.5,
            ubound=0.5),
        xdensity=20.0,
        y=numbergen.UniformRandom(lbound=-0.5,
            ubound=0.5),
        ydensity=20.0),
    layout_location=(151,520),
    name='LeftRetina',
    nominal_bounds=BoundingBox(points=((-0.75,-0.5),(0.75,0.5))))

Note that there the seed parameter is missing! The seeds are defined in the ty file though:

left_input_pattern = pattern.Line(
    thickness=0.02,
    x=numbergen.UniformRandom(lbound=-0.5,ubound=0.5,seed=12),
    y=numbergen.UniformRandom(lbound=-0.5,ubound=0.5,seed=34),
    orientation=numbergen.UniformRandom(lbound=-pi,ubound=pi,seed=56))

...

topo.sim['LeftRetina'] = sheet.GeneratorSheet(
    input_generator=left_input_pattern,
    nominal_bounds = sheet.BoundingBox(points=((-0.75,-0.5),(0.75,0.5))))

Looking at the code, this is because seed has not been set as a parameter of numbergen.RandomDistribution. As a result script reprs never reproduced random streams correctly. This implies that new the addition of random_generator=random.Random() is equivalent (just as uncontrolled!) but also that the new repr adds nothing useful!

For this reason, I would have preferred to suppress the repr of the random_generator parameter entirely but it looks like the script_repr function doesn't support this right now (although this functionality could probably be added fairly easily).

Imagen patterns already had a random_generator parameter so the situation there is unchanged. Again, I should note that the only reason the scriptrepr test passes for those objects is that the random_generator is generally left at the default value (and therefore not shown). If you set param.parameterized.script_repr_suppress_defaults=False you will get random_generator=<mtrand.RandomState object at 0x7f5b03ea91b0> (or similar) which obviously cannot be evaluated.

In addition, the full script repr also includes time_type=<built-in function mpq> which can't be evaluated either (part of the param.Time object when used in Topographica)!

jlstevens commented 10 years ago

Ok.

Looking at the code in more detail, it seems that returning None from the hook does suppress the repr (as desired). The change has been made in commit ioam/param@ef74a81b1. I am now happy with this solution - just don't bother trying to have a repr for random state as it simply isn't worth it! :-)

This trick to suppress the repr of specific parameters was poorly documented so I've now mentioned it explicitly in the docstring of the script_repr function.

Hopefully that wraps everything up for this particular PR...

jbednar commented 10 years ago

Looks good, thanks!