numenta / nupic-legacy

Numenta Platform for Intelligent Computing is an implementation of Hierarchical Temporal Memory (HTM), a theory of intelligence based strictly on the neuroscience of the neocortex.
http://numenta.org/
GNU Affero General Public License v3.0
6.34k stars, 1.56k forks

Strip down, separate NuPIC in submodules #430

Closed breznak closed 10 years ago

breznak commented 10 years ago

This touches on the "API" and "C++ core NuPIC" ideas, among other things. To keep the codebase small and tidy, should we consider using git submodules (i.e. linked separate repos) for the non-essential parts?

Ideal candidates are examples, benchmarks and encoders! I'm fine with keeping the current ones in core, but we should, and probably will, add more as nupic spreads to more applications: encoders for image, video, sound, ...

Here I'd like to propose creating a numenta@nupic-encoders repo that would be checked out to $NUPIC/py/nupic/encoders/custom. This new repo would then have a structure like: "vision", "NLP", "audio", "robots", "net", ...

Last but not least, separating out these non-critical parts will not only trim down the codebase but also speed up development and ease the pressure on maintainers, because more people could have "review" rights. It should still use the same mailing list, issue tracker, Travis, ... (although, e.g., the CLA (= agreement) might not be needed there if it were a problem for somebody, but I guess it isn't now).

breznak commented 10 years ago

Note: I've never used git submodules, so we'll need someone experienced to weigh in. I've seen @chetan51 use them in his repo.

breznak commented 10 years ago

Some criticism: http://codingkilledthecat.wordpress.com/2012/04/28/why-your-company-shouldnt-use-git-submodules/

breznak commented 10 years ago

Alternative: git subtree : https://blogs.atlassian.com/2013/05/alternatives-to-git-submodule-git-subtree/

I skimmed it, and what I think I got is: submodule = always included, subtree = optionally merged in.

So this could take the separation (and its usefulness) to a new level:

...I think these are good ideas, but maybe too extreme to do all at once. We should start with encoders/custom as a proof of concept.

sjmackenzie commented 10 years ago

One or two submodules is fine. Using submodules too much makes development a bit of a pain, especially when commits span many submodules. But the nupic-core proposal is more than fine.

subutai commented 10 years ago

-1 on git submodules. We've tried using them before, and they are a pain to get working correctly. They cause many more problems than they solve. Similarly, I'm not sure we need subtrees either. I prefer to keep any nupic-core repo simple, clean, and pure.

sjmackenzie commented 10 years ago

Interesting. We made the mistake of exploding repos on mozart2, then contracted back to one repo with one submodule containing the stdlibs. So far nothing to complain about... but the stdlibs are far detached from the core of mozart2, whereas nupic-core is the centre. So commits could very easily span nupic and nupic-core.

If it was a technical issue then I can help:

You need a file called .gitmodules at the nupic root, and in there you point it to nupic-core. That's pretty much it.
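For readers who have not seen one, here is a minimal sketch of what such a .gitmodules entry looks like, rendered from Python so the format is explicit. The submodule name, checkout path and URL are hypothetical placeholders; the thread never specifies them. In practice `git submodule add <url> <path>` writes this file and also records the pinned commit in the index, so the file on its own is only part of the picture.

```python
def render_gitmodules(name, path, url):
    """Return the text of a .gitmodules entry for a single submodule."""
    # Git config syntax: a [submodule "<name>"] section with tab-indented keys.
    return (
        f'[submodule "{name}"]\n'
        f"\tpath = {path}\n"
        f"\turl = {url}\n"
    )


if __name__ == "__main__":
    # Hypothetical values, chosen only for illustration.
    print(render_gitmodules(
        name="nupic-core",
        path="nupic-core",
        url="https://example.com/numenta/nupic-core.git",
    ), end="")
```

The file only says where the submodule lives; which commit of nupic-core is checked out is stored separately in the parent repository, which is why commits spanning both repos need extra care.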

oxtopus commented 10 years ago

I've used both submodule and subtree and would recommend neither.

I think the approach of establishing semantic versioning is sufficient. Coupling one repository (and commit history) to another defeats the purpose of maintaining API compatibility between projects. If we're to use submodule or subtree, we might as well keep everything in one repository (regardless of semantic versioning).
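To make the semantic-versioning argument concrete, here is a rough sketch of the kind of check a dependent project could run against an installed nupic-core release. The function names are hypothetical, and the rule is simplified: it handles plain MAJOR.MINOR.PATCH strings only and ignores pre-release tags and the special status of 0.x versions.

```python
def parse_version(text):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of three ints."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def is_compatible(installed, required):
    """Return True if `installed` can stand in for `required`.

    Same MAJOR means no breaking API change; within that MAJOR, the
    installed (MINOR, PATCH) must be at least the required one.
    """
    have = parse_version(installed)
    need = parse_version(required)
    return have[0] == need[0] and have[1:] >= need[1:]


if __name__ == "__main__":
    print(is_compatible("1.4.2", "1.3.0"))  # newer minor, same major
    print(is_compatible("2.0.0", "1.3.0"))  # major bump, API may have broken
```

With a check like this, the two repositories only have to agree on a version number, not on each other's commit history.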

subutai commented 10 years ago

I agree with @oxtopus. The whole point of separating these out is to keep things semantically and physically distinct. Once we have a pure C++ repo, why should C++ developers have to worry about Python, or Java, or .NET?

sjmackenzie commented 10 years ago

So what is the solution?

One repo multiple build and install paths... plus make it multi-platform?

Wouldn't the work to achieve this outweigh a single submodule?

subutai commented 10 years ago

What if we had separate repos (per your previous email) but with very clear distinctions? Maybe something like:

nupic-core - pure C++, with a build.sh script for all supported platforms (maybe with cmake?). Small and lightweight. The output of this is one libcla library and test executables. Everyone is dependent on this library, sort of like libstd.

nupic-py - Python bindings and generic Python modules. A build.sh script to build it. Dependent on nupic-core. Maybe the build script even assumes nupic-core is in a parallel directory(?). A C++ or Java developer wouldn't have to worry about this repo at all.

nupic-py-opf - OPF, pure Python. No build script.

nupic-java - Java bindings and generic Java modules. A standard Java build script to build it. Depends on nupic-core. A Python developer wouldn't need to build it, worry about this repo or even see it.

This will be some work but maybe it's cleaner and more extensible??
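The proposed layout can be sketched as a small dependency graph, which shows both a valid build order and which repos a given developer actually needs. This is only an illustration: the nupic-py-opf to nupic-py edge is an assumption (the list above says only that OPF is pure Python), and the helper names are made up. It uses `graphlib`, which needs Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Each repo maps to the set of repos it depends on.
# The nupic-py-opf -> nupic-py edge is an assumption, not stated in the thread.
DEPENDS_ON = {
    "nupic-core": set(),
    "nupic-py": {"nupic-core"},
    "nupic-py-opf": {"nupic-py"},
    "nupic-java": {"nupic-core"},
}


def build_order(depends_on):
    """Return the repos in an order where dependencies are built first."""
    return list(TopologicalSorter(depends_on).static_order())


def needed_for(target, depends_on):
    """Return `target` plus everything it depends on, directly or indirectly."""
    needed = set()
    stack = [target]
    while stack:
        repo = stack.pop()
        if repo not in needed:
            needed.add(repo)
            stack.extend(depends_on[repo])
    return needed


if __name__ == "__main__":
    print(build_order(DEPENDS_ON))
    print(sorted(needed_for("nupic-java", DEPENDS_ON)))
```

Under these assumptions a Java developer only ever touches nupic-java and nupic-core, and a Python developer never sees nupic-java, which is the separation the list is aiming for.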

breznak commented 10 years ago

Hi,

On Wed, Nov 20, 2013 at 6:42 PM, Subutai Ahmad notifications@github.com wrote:

> What if we had separate repos (per your previous email) but with very clear distinctions? Maybe something like:

> nupic-core - pure C++, with a build.sh script for all supported platforms (maybe with cmake?). Small and lightweight. The output of this is one libcla library and test executables. Everyone is dependent on this library, sort of like libstd.

+1000, do it now, even if it's not standalone-usable yet. It will help separate the concepts and test how we can handle that. It's almost done already, I think. Steward claims to have nta/ separated with its git history (important!). And @david-ragazzi has code sitting in a PR to build with cmake. (The cmake build is failing on something swig-python related, so it should be fine here.)

> nupic-py - Python bindings and generic Python modules. A build.sh script to build it. Dependent on nupic-core. Maybe the build script even assumes nupic-core is in a parallel directory(?). A C++ or Java developer wouldn't have to worry about this repo at all.

> nupic-py-opf - OPF, pure Python. No build script.

> nupic-java - Java bindings and generic Java modules. A standard Java build script to build it. Depends on nupic-core. A Python developer wouldn't need to build it, worry about this repo or even see it.

> This will be some work but maybe it's cleaner and more extensible??

I like this. I see one possible problem: git checkout wouldn't give you a buildable snapshot from history (think git bisect, though who does that with nupic?).

Marek Otahal :o)

rhyolight commented 10 years ago

@breznak Closing this issue now that we're actively working on the core extraction. If there are more specific "modules" within either repo that you think should be created, please bring the topic up on the nupic-hackers mailing list.