numenta / nupic-legacy

Numenta Platform for Intelligent Computing is an implementation of Hierarchical Temporal Memory (HTM), a theory of intelligence based strictly on the neuroscience of the neocortex.
http://numenta.org/
GNU Affero General Public License v3.0
6.34k stars · 1.55k forks

NuPIC could benefit from separate management of algorithm parameters. #3212

Open cogmission opened 8 years ago

cogmission commented 8 years ago

Creating a Parameters object that can be applied to the initialization of the algorithms as one step in separating the algorithm state from the algorithms proper, should be considered. Benefits are:

  1. This can be done whether or not state is pulled out of the algorithms.
  2. One central place to see all the algorithm parameters.
  3. Parameters can be applied to more than one setup and then "applied" in cookie stamp fashion without having to manually configure each algorithm instantiation.
  4. More intelligent use of defaults and more flexible application of parameters can be innovated (i.e. getting parameters from distributed locations).
  5. Makes it easier to eventually develop "sanity protocols" for handling parameter expectations and methods that deal with this.
  6. etc. etc. :-P (perusal of HTM.Java's handling of this may be helpful)
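A minimal sketch of benefits 2 and 3, the "cookie stamp" idea. All names here are hypothetical illustrations, not part of the NuPIC or HTM.Java API:

```python
# Hypothetical sketch of a central Parameters object; names are illustrative.

class Parameters(object):
    """One central place holding algorithm parameters and their defaults."""

    DEFAULTS = {"columnCount": 2048, "cellsPerColumn": 32}

    def __init__(self, **overrides):
        self._params = dict(self.DEFAULTS)
        self._params.update(overrides)

    def apply(self, algorithm):
        """Stamp every parameter onto the given algorithm instance."""
        for name, value in self._params.items():
            setattr(algorithm, name, value)
        return algorithm


class FakeTM(object):
    """Stand-in for a real TemporalMemory; it just receives attributes."""
    pass


params = Parameters(cellsPerColumn=16)
tm1 = params.apply(FakeTM())
tm2 = params.apply(FakeTM())  # same configuration reused, no manual setup
```

One `Parameters` instance configures any number of algorithm instances identically, which is the reuse being proposed.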
rhyolight commented 8 years ago

Does this mean one parameters object at the OPF level for the entire "model"? Or a Parameters object for each thing that demands parameters?

cogmission commented 8 years ago

Good question. The way it's done in HTM.Java is one Parameters object contains the parameters for all algorithms (a full complement of whatever could occupy a Layer or Region); but more than one could be instantiated to initialize a single or subset of algos as needed.

The main point is that the algorithms still contain their variables but get initialized from a central source that can be reused, stored and/or distributed. Storing them in a central place makes them easier to reason about; invoke sanity checking against; establish defaults (type, value, and bounds checking); and store, retrieve and reuse (share).

rhyolight commented 8 years ago

Honestly, I've always disliked this pattern. It is more of an accepted pattern in JavaLand, but not in PythonWorld.

I do agree, however, that the algorithms that require parameters (like SP, TM, encoders, etc) should have some kind of validate() function on them that's expected to be called immediately after construction, which should raise an error if there is a problem.
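A sketch of that validate()-after-construction pattern, with made-up parameter names (real NuPIC algorithms have many more parameters):

```python
# Hypothetical sketch: each algorithm validates its own parameters right
# after construction, raising on any problem.

class SpatialPoolerLike(object):
    def __init__(self, inputWidth, columnCount, potentialPct):
        self.inputWidth = inputWidth
        self.columnCount = columnCount
        self.potentialPct = potentialPct

    def validate(self):
        """Raise ValueError immediately if any parameter is out of range."""
        if self.inputWidth <= 0:
            raise ValueError("inputWidth must be positive")
        if self.columnCount <= 0:
            raise ValueError("columnCount must be positive")
        if not (0.0 < self.potentialPct <= 1.0):
            raise ValueError("potentialPct must be in (0, 1]")


sp = SpatialPoolerLike(inputWidth=1024, columnCount=2048, potentialPct=0.8)
sp.validate()  # a sane configuration passes silently
```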

cogmission commented 8 years ago

Can you tell me why the advantages I listed are not available to or applicable in the python world? How does the difference in languages invalidate the advantages?

I'm just making a suggestion. If it's not something you see as desirable - then that's fine. Although I don't see how manually entering every parameter for every instantiation of an algorithm, and storing state in each, could ever be seen as advantageous? Once you get up to the number of parameters we deal with in these modules, there comes a point where their management becomes a thing unto itself. imho

cogmission commented 8 years ago

I do agree, however, that the algorithms that require parameters (like SP, TM, encoders, etc) should have some kind of validate() function on them that's expected to be called immediately after construction, which should raise an error if there is a problem.

Validation is good for containing meltdowns, and then there's sanity checking which is another level above that. ;-) I think we'll need both eventually.

rhyolight commented 8 years ago

It is not that the advantages you listed are untrue in Python, it's that the Python culture is different, that's all. Wrapping a group of parameters into a self-validating object is a very Java thing to do, not so much in Python. But I will address your advantages because I did state that I dislike this pattern.

I dislike it because it adds new Classes, and I'm not sure it's worth it. The one big win for this issue is if we can create a global Parameters object to contain all the model params (for sp, tm, encoders, etc) that can validate itself against all the algorithm configurations. But I'm not sure this can be accomplished without first instantiating the algorithms themselves and running some of their code. This would be inappropriate intimacy of the Parameters class because it must know about the algorithms.

  1. This can be done whether or not state is pulled out of the algorithms.

I don't see this as an advantage.

  2. One central place to see all the algorithm parameters.

Only if we create one global "model params" class, which is only applicable in the OPF. There are so many ways to configure a Network, I don't think I'd like to do this at the Network API level. We want to keep the Network API flexible and powerful.

  3. Parameters can be applied to more than one setup and then "applied" in cookie stamp fashion without having to manually configure each algorithm instantiation.

I can do this easily enough today with a little bit of code.
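The "little bit of code" version presumably looks something like a plain dict splatted into each constructor with `**kwargs` (class and parameter names here are illustrative):

```python
# Hypothetical sketch: a plain dict reused across constructor calls,
# with no new Parameters class.

SP_PARAMS = {"inputWidth": 1024, "columnCount": 2048}

class SP(object):
    def __init__(self, inputWidth, columnCount):
        self.inputWidth = inputWidth
        self.columnCount = columnCount

sp_a = SP(**SP_PARAMS)
sp_b = SP(**SP_PARAMS)  # reused across instantiations with no extra machinery
```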

  4. More intelligent use of defaults and more flexible application of parameters can be innovated (i.e. getting parameters from distributed locations).

I think the algorithm classes should decide what the defaults are. But then I think the algorithm classes should also do the validation, not a params object. And your 2nd point IMO is over-engineering for a problem we don't have.

  5. Makes it easier to eventually develop "sanity protocols" for handling parameter expectations and methods that deal with this.

This would be part of the validation, which IMO should be done in the algorithm classes.


So that's my $0.02, but I'm interested to hear other @numenta/nupic-committers speak up about it.

oxtopus commented 8 years ago

A single parameters object as described comes across (to me) as a leaky abstraction, especially problematic when it comes to defining and handling defaults.

Classes have constructors and well-defined signatures. I don't see the need to centralize parameters, or the benefit, given the added complexity.

cogmission commented 8 years ago

Wow, I have so many responses I'm not sure really where to start... :-)

For organization's sake, I guess I'll start at the top with your response @rhyolight , and then conclude with my response to @oxtopus.

The one big win for this issue is if we can create a global Parameters object to contain all the model params (for sp, tm, encoders, etc) that can validate itself against all the algorithm configurations.

I think you are exaggerating the suggested design here. If you look at Parameters.java what you see on each line is a neat list of parameters; their defaults and (room for) ranges in the case of numeric values. An additional advantage is that one could store a brief explanation or reason statement which would allow immediate feedback to the user as to expected values and sane ranges.

This isn't a proposal for some massive sanity checking engine or framework - it simply has default values, ranges and possibly a description - giving immediate feedback to the user. This data gets applied when the user attempts to set a variable and could possibly invoke a method on the algorithm itself if you wanted to - but what I'm suggesting here is much simpler.
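A sketch of that lighter-weight idea: each parameter carries a default, an optional numeric range, and a short description, checked when the user sets a value. The spec contents below are illustrative, not NuPIC's actual defaults or documentation:

```python
# Hypothetical parameter spec: default, range, and description per parameter.

SPEC = {
    "stimulusThreshold": {
        "default": 0,
        "range": (0, 100),
        "doc": "illustrative description of the parameter's meaning",
    },
}

class Parameters(object):
    def __init__(self, spec):
        self._spec = spec
        self._values = {name: entry["default"] for name, entry in spec.items()}

    def set(self, name, value):
        lo, hi = self._spec[name]["range"]
        if not (lo <= value <= hi):
            # immediate feedback, including the parameter's description
            raise ValueError("%s=%r outside [%s, %s] (%s)"
                             % (name, value, lo, hi, self._spec[name]["doc"]))
        self._values[name] = value

    def get(self, name):
        return self._values[name]


params = Parameters(SPEC)
params.set("stimulusThreshold", 5)  # in range, accepted
```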

But I'm not sure this can be accomplished without first instantiating the algorithms themselves and running some of their code.

Well you have to instantiate the algorithm before you "apply()" the parameters. This much is expected because the algorithm instance is the thing being configured.

This would be inappropriate intimacy of the Parameters class because it must know about the algorithms.

HTM.Java's Parameters object uses reflection and has no knowledge of the "thing" it's being applied to. This is one of the very cool things about it. When I moved the state out of the algorithms and put it in the Connections object, I did so without any alterations to the Connections object (other than adding the needed fields - but no special handling for runtime assertion of the parameters).
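A Python analogue of that reflection-based apply(): the function only sets fields the target already declares, so it needs no special knowledge of (or changes to) the target class. This is a hypothetical sketch, not HTM.Java's actual implementation:

```python
# Hypothetical sketch of reflection-style parameter application.

def apply_params(params, target):
    """Copy each known parameter onto target; skip names it doesn't declare."""
    for name, value in params.items():
        if hasattr(target, name):
            setattr(target, name, value)
    return target


class ConnectionsLike(object):
    """Stand-in for Connections: plain fields, no special parameter logic."""
    columnCount = None
    cellsPerColumn = None


conn = apply_params(
    {"columnCount": 2048, "cellsPerColumn": 32, "unknownParam": 1},
    ConnectionsLike())
```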

This can be done whether or not state is pulled out of the algorithms.

I don't see this as an advantage.

How not? Maybe I'm not communicating clearly - but ideally I believe the state should be removed from the algorithms, and what I'm saying here is that enhancing the parameter handling doesn't require the state to be removed first. We could enjoy the advantages of intentional parameter handling without needing to go all the way and remove state. That's all I'm saying.

One central place to see all the algorithm parameters.

Only if we create one global "model params" class, which is only applicable in the OPF. There are so many ways to configure a Network, I don't think I'd like to do this at the Network API level. We want to keep the Network API flexible and powerful.

In my mind this in no way diminishes flexibility, it adds it; that is the whole reason for this proposal. Having a reusable Parameters object adds to the Network's power, making things more flexible without writing custom code to handle parameters or cutting and pasting across code. It builds in a central place to validate parameters and encourages the unification of redundant parameters (of which there are a few between the algorithms, the OPF, and the Network API) - this would help maintain a discipline where parameter names stay coherent across projects and code modules instead of repeating themselves with slight variations.

Parameters can be applied to more than one setup and then "applied" in cookie stamp fashion without having to manually configure each algorithm instantiation.

I can do this easily enough today with a little bit of code.

Yep. The point is not to have to :-P And besides, this would help everybody at all different levels and not just those with better familiarity like you and I.

More intelligent use of defaults and more flexible application of parameters can be innovated (i.e. getting parameters from distributed locations).

I think the algorithm classes should decide what the defaults are. But then I think the algorithm classes should also do the validation, not a params object. And your 2nd point IMO is over-engineering for a problem we don't have.

Defaults are application-specific, and I doubt that a single algorithm can know what is "best" for all occasions. In order to begin to create functionality which allows you to explore your reasoning about these, you need a way to work with them in a discrete manner; and once you've done that, you can reapply and reuse them (and maybe reify them from serialized state) across many instantiations of the same algo or across different projects.

And your 2nd point IMO is over-engineering for a problem we don't have.

Over-engineering is if you build something that isn't needed or doesn't increase utility. I'm not saying we have to (or should) distribute anything, but having this makes for one less step if you happen to be inclined in that direction. Frameworks like Flink and Spark maybe could benefit from this, I'm not sure; but I only mentioned it to say that you'd get a certain readiness for this (and any other related functionality) for free.

A single parameters object as described comes across (to me) as a leaky abstraction, especially problematic when it comes to defining and handling defaults.

Classes have constructors and well-defined signatures. I don't see the need to centralize parameters, or the benefit, given the added complexity.

**From wikipedia:** ...a leaky abstraction is an abstraction that exposes details and limitations of its underlying implementation to its users that should ideally be hidden away. Leaky abstractions are considered problematic, since the purpose of abstractions is to manage complexity by concealing unnecessary details from the user.

Right now, users are forced to deal with tons of parameters. That much is fact. Questions abound all over the place about what each parameter means; how they affect the functionality; and how to best configure them. It's in their faces! :-) In my opinion the management of parameters should move in the direction (as much as possible) toward automation and tinker free management. Having a Parameters object is a healthy step in that direction because it centralizes the place where developers can reason about defaults and their validation - apply it and later maintain and search for parameters. In addition users can reapply a configuration more easily by simply applying a saved parameters object. Moreover, the Parameters object is then a discrete object that can be harnessed in any more widely scoped configuration management utilities one might create in the future.
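The "reapply a saved configuration" step could be as simple as serializing the parameter values and loading them back later. A sketch with illustrative parameter names:

```python
# Hypothetical sketch: persist a parameter set as JSON and restore it later
# to stamp out an identical setup.

import json

params = {"columnCount": 2048, "stimulusThreshold": 0}

blob = json.dumps(params)      # store next to a model checkpoint
restored = json.loads(blob)    # later, or on another machine
```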

If you asked an HTM.Java user where to find the default value and its documentation for say STIMULUS_THRESHOLD, they wouldn't have to think, "hmmm.... Is that a SpatialPooler, TemporalMemory, CLAClassifier variable? hmmm... Is the Classifier in the algorithms directory or the research directory so that I can check on what this means?"

No...

They would simply look where the Parameters and their documentation is kept.

I'll give you one guess where? (because one guess is all you need). ;-)

P.S. I'm not going to respond any further on this unless asked so that others can weigh in...

oxtopus commented 8 years ago

At least wrt Python, it would be nice if we could annotate class attrs and constructor arguments as high-level parameters, perhaps even through the use of metaclasses. That would address some of your concerns, falling short of centralizing disparate, algorithm- and implementation-specific parameters into a new obj.
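A sketch of that metaclass idea: annotate class attributes as parameters and let a metaclass collect them into a per-class registry, rather than introducing a separate centralized object. This is entirely hypothetical; nothing like it exists in NuPIC:

```python
# Hypothetical sketch of annotating class attrs as high-level parameters.

class Param(object):
    """Marks a class attribute as a tunable parameter."""
    def __init__(self, default, doc=""):
        self.default = default
        self.doc = doc


class ParamMeta(type):
    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        # Gather every Param annotation into a per-class registry so tools
        # can discover, document, and validate parameters in one place.
        cls._params = {k: v for k, v in namespace.items()
                       if isinstance(v, Param)}
        return cls


class TemporalMemoryLike(metaclass=ParamMeta):
    cellsPerColumn = Param(32, "illustrative: cells per mini-column")
    activationThreshold = Param(13, "illustrative: segment activation cutoff")
```

Each algorithm class keeps owning its parameters and defaults, while the registry still gives tooling a single place to enumerate them.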