htm-community / htm.core

Actively developed Hierarchical Temporal Memory (HTM) community fork (continuation) of NuPIC. Implementation for C++ and Python
http://numenta.org
GNU Affero General Public License v3.0
150 stars 74 forks source link

Merge nupic.py into nupic.cpp #216

Closed ctrl-z-9000-times closed 4 years ago

ctrl-z-9000-times commented 5 years ago

We should take the best from both repo's and discard the duplicate code which we don't want. For python compatibility we can somehow link/redirect the pure python module nupic to the C++ bindings module nupic.bindings, that way the python API should be mostly unchanged.

Nupic.py has a lot of unit tests, there should be enough to verify the API compatibility.

The python code which does not have a C++ equivalent can live in this repo, under nupic.cpp/bindings/py.

Related:

Is this a good strategy for nupic.cpp? An acceptable fate for nupic.py?
@dkeeney & @breznak ?

dkeeney commented 5 years ago

Conceptually that might make since but consider our audience.

For many, having a cleaner faster implementation is not the goal. So to me, I think we must keep both the entire Python base and the C++ base. Removing the duplicate Python in favor of C++ is not going to help advance Nupic.

Should they both reside in the same repository? I don't know. I would think not. I think it would be better for nupic.cpp to contain only the Python it needs in order to test the API interface.

ctrl-z-9000-times commented 5 years ago

Removing the duplicate Python in favor of C++ is not going to help advance Nupic.

I intend to make all of my advancements to nupic.cpp and not to port them to nupic.py

These are mostly Python people. There will be some that prefer a pure Python environment.

This should be less of an issue once nupic.cpp is crossplatform, has fewer dependencies, and uses a modern c++ version.

ctrl-z-9000-times commented 5 years ago

For many, having a cleaner implementation is not the goal.

But it would make maintaining & improving it a lot easier.

ctrl-z-9000-times commented 5 years ago

Another advantage of having a single repo is continuous integration for the bindings. Currently many of the tests for the bindings are in nupic.py. Some of these should probably be moved into nupic.cpp regardless what we do with the rest of the stuff. Especially if we keep both implementations of the duplicate algorithms: then we should get CI for the tests which verify that the duplicates are same.

ctrl-z-9000-times commented 5 years ago
  • These are mostly Python people. There will be some that prefer a pure Python environment.

If we keep all of the python code and just move it to this repo, then they can have a complete python only environment. Maybe we could add a flag to the setup.py script to only build & include the C++ if requested? This would be very similar to what we have now with the C++ side conditionally including the Python bindings.

dkeeney commented 5 years ago

Unfortunately, not all of the .py code is in the nupic.py registry. There is the htmresearch repo and many people have their own private repositories that rely on nupic.core. So if the objective of putting the repositories together is to allow us to modify the API then it will not work. We will not be able to control all of the code. That is why the API is so critical.

Also, as @breznak pointed out, one of the advantages of having a somewhat similar structure to that of numenta's repositories is that when they change there code it is not too hard to apply those changes to our code.

ctrl-z-9000-times commented 5 years ago

Reopening now that we have the buy-in of the community.

ctrl-z-9000-times commented 5 years ago

Tasks: 1) Decide on a file location for the python code.

2) Merge the unit tests for the bindings, which are currently all in python repo. 3) Merge the python code.

breznak commented 5 years ago

Merge the unit tests for the bindings, which are currently all in python repo.

I'd start with bringing over the tests that'll go to nupic.cpp/bindings/py/src/tests/ (the tests will be removed here completely, or they test also py-only fctionality?)

Decide on a file location for the python code.

Will the Py code make sense only with the bindings? Because the bindings are optional, so maybe

ctrl-z-9000-times commented 5 years ago

Will the Py code make sense only with the bindings?

Currently the python code works stand-alone. If we remove any of the duplicate algorithms then it will become dependant on the C++ code.

dkeeney commented 5 years ago

I would vote for py/nupic as a location for the python code. This is definitely not 'bindings'. At some point I still want to add C# code into the mix with its own bindings. That can go in as cs/nupic for apps written in C# and bindings/cs for its bindings into the C++ library.

breznak commented 5 years ago

Currently the python code works stand-alone.

what should happen with the nupic.py repo? If I understand, cpp+py code will be merged here. Maybe I'd want nupic.py to become opposite fork: pure python nupic, any bindings, etc would be removed. (not for speed, but ease of use, learning, prototyping)

dkeeney commented 5 years ago

If we combine them then nupic.py should go away. Maybe nupic.cpp is no longer an appropriate name? maybe nupic.community or something with a broader scope.

breznak commented 5 years ago

nupic.bindings?

then nupic.py should go away.

there won't be pure python option in our py/nupic, would it? (just py, no dependencies, no backends...)

dkeeney commented 5 years ago

I would think that merging the repositories means just that.... Both pure python and python using the backend would be there. It would be nice to separate them but we don't want to spend a lot of time re-designing the python. Right now both are there.

dkeeney commented 5 years ago

I have to take a break and go do my taxes...should not take long since I have no income :)

ctrl-z-9000-times commented 5 years ago

I would think that merging the repositories means just that.... Both pure python and python using the backend would be there.

Yes. The purpose of merging the repo's is not to reduce the amount of code we have, but rather to make it easier test & use all of it.

breznak commented 5 years ago

is not to reduce the amount of code we have,

I thought you want python to supplement missinge parts from cpp. Full new language would need a lot of manpower.

Another important thing, how should differences in implementations be treated? (that has always been problem). Like when you've developed extra Winners&predictive cells for TM. would we allow that only if both versions are developed (that is a strong limitation). That's why I'd prefer only 1 language in this repo, be it cpp for core, meet at NetworkAPI and supplementary code only in Py (and if gets rewritten to c++, py can be removed). PYthon version of course could live somewhere else, but I think it should be independently developed.

dkeeney commented 5 years ago

I am with @ctrl-z-9000-times , create a py folder under the nupic.cpp repository and drop all of nupic.py into it. Maybe clean out anything that is obviously junk. Then port that .py code to Python3. We might want to think about getting it directly from numenta's nupic repository since that might be more up-to-date than our nupic.py repository.

I think that is a lot of work, but it is doable.

ctrl-z-9000-times commented 5 years ago

In the user survey, people responded:

And Jose_Cueto writes:

Even without python versions of these algorithms (no c++ just py bindings) I’m ok with it as long as we don’t lose the ease of introspecting python objects in python code.

We can almost do this with the python bindings. What we can't do is allow the user to override C++ class methods.

breznak commented 5 years ago

ok, I'm fine with that.

This is what I'm worried about

Another important thing, how should differences in implementations be treated? (that has always been problem). Like when you've developed extra Winners&predictive cells for TM. would we allow that only if both versions are developed (that is a strong limitation). That's why I'd prefer only 1 language in this repo, be it cpp for core, meet at NetworkAPI and supplementary code only in Py (and if gets rewritten to c++, py can be removed). PYthon version of course could live somewhere else, but I think it should be independently developed.

I'm :+1: for the tighter integration the merge brings, I'll be in a way easier to develop, knowing it's properly tested and once your changes pass, it'll work with py/cpp well.

Duplicit implementations is what bothers me how we'd handle. Maybe we can say c++ is mainly for the heavy lifting and extras stay in c++? But there was a number of people who prefered cpp environment.

ctrl-z-9000-times commented 5 years ago

I see two strategies for merging:

OR

ctrl-z-9000-times commented 5 years ago

PS: I think we should merge the second way, one piece at a time.

dkeeney commented 5 years ago

I think so as well; one piece at a time. Although it is tempting to just dump in all in and say we are done :)

One thing I noticed in the .py code for the TM. We have their one .py TMRegion (or maybe its RegionTM, don't remember) which then branches to 6 flavors of TM; with and without the Cpp backend, BacktrackingTM vs TM via the Shim, with and without monitoring. When I ported it to C++ I did two separate region modules BacktrackingTMRegion and TMRegion which only handles the C++ version of esch. So there is a structural difference. Now that we have TMRegion in C++, we can remove the part of the .py region that calls the Cpp versions. We don't need two ways to execute the same algorithm in C++.

It is this type of thing that needs to be cleaned up as we migrate .py code to our repository.

ctrl-z-9000-times commented 5 years ago

@dkeeney,

I was looking through the python repository and I came up with a list of all of the NetworkAPI related tests. These tests should ensure that the NetworkAPI is compatible with users preexisting experiments which use Numenta/nupic.

These tests will need fixing before they're useful:

nupic.py/tests/unit/nupic/engine/network_test.py
nupic.py/tests/unit/nupic/engine/unified_py_parameter_test.py
nupic.py/tests/unit/nupic/engine/syntactic_sugar_test.py
nupic.py/tests/unit/nupic/regions/sdr_classifier_region_test.py
nupic.py/tests/unit/nupic/regions/knn_anomaly_classifier_region_test.py
nupic.py/tests/unit/nupic/regions/anomaly_region_test.py
nupic.py/tests/unit/nupic/regions/knn_classifier_region_test.py
nupic.py/tests/unit/nupic/regions/record_sensor_region_test.py
nupic.py/tests/unit/nupic/regions/anomaly_likelihood_region_test.py
nupic.py/tests/unit/nupic/regions/regions_spec_test.py
nupic.py/tests/unit/nupic/regions/tm_region_test.py
nupic.py/tests/integration/nupic/engine/network_checkpoint_test.py
nupic.py/tests/integration/nupic/engine/network_twonode_test.py
nupic.py/tests/integration/nupic/engine/temporal_memory_compatibility_test.py
nupic.py/tests/integration/nupic/engine/network_testnode_interchangeability.py
nupic.py/tests/integration/nupic/engine/network_creation_common.py
nupic.py/tests/integration/nupic/engine/vector_file_sensor_test.py
nupic.py/tests/integration/nupic/regions/multiclass_knn_test.py
nupic.py/tests/integration/nupic/regions/single_step_sdr_classifier_test.py
dkeeney commented 5 years ago

Wonderful! This will make a good starting point. Thanks.

ctrl-z-9000-times commented 5 years ago

Python SpatialPooler Issues

PR #169 removes the Sparse Matrix libraries which the python implementation of the spatial pooler depends on. If / when the python SP is made to work with nupic, it will need to be converted to use the connections class.

dkeeney commented 5 years ago

So. Where is the .py code in nupic.cpp?

Is it still all in the nupic.py branch?

ctrl-z-9000-times commented 5 years ago

The nupic.py branch is still there, but the nupic.py repository has progressed further than that branch.

Most of the currently working python code is in the bindings.

ctrl-z-9000-times commented 5 years ago

Notes:

I'm hoping to fix up and merge the following areas.

breznak commented 4 years ago

Closing as the further merge likely won't happen, as we have extended python bindings that provide same (or more) functionality than the nupic.py. Closing, please reopen if you feel some more work is to be done here.