numenta / nupic.core-legacy

Implementation of core NuPIC algorithms in C++ (under construction)
http://numenta.org
GNU Affero General Public License v3.0
272 stars 276 forks source link

Fix encoder implementation to standards #961

Open breznak opened 8 years ago

breznak commented 8 years ago

the encoders implementation in src/nupic/encoders/ is quite a mess, I guess the PR process was quite relaxed :tongue:


breznak commented 8 years ago

@rhyolight can you eval/assign this?

mrcslws commented 8 years ago

Questions:

So far, I agree with:

breznak commented 8 years ago

Thanks for a quick response, Marcus @mrcslws

Questions: It sounds like you have a strong position against putting encoder Sensors next to their corresponding Encoders. Tentatively, I support the current approach. Otherwise we're polluting the codebase.

Not that strong :smiley: (as there's just 1 encoder now), but I don't see a reason for breaking the common pattern from python-nupic (which I take as standard) and even from current separation in nupic.core: encoders/, regions/, algorithms/ - in my view the ScalarSensor is polluting the code-base; from the point of view of build-system: encoders/ are stand-alone; regions/ depend on other Regions and one on an encoder, now this made encoders depend on regions. All in all, it's 1:1 so unless there are better arguments I'd vote for moving in Reg.

Why should it implement encode? That's not a very C-like function.

I don't know, is it C++-like? :wink: Seriously, for these reasons:

vector<Real> oo(e.getOutputWidth());
e.encodeIntoArray(42, oo.data());

What's wrong with Real64? It's like saying "double".

I think we have a switch whether Real should be interpreted as Real32/Real64 (float/double), so if the code needs double, OK, otherwise why not leaving the more flexible?

The network API uses floats (Real32). The C++ encoders are just the internals of network API Sensors, refactored out of the Sensor. So the encoders should use floats, right?

This is one I feel strongly about. And I think we found the cause of different views: "C++ encoders are just the internals of network API" - No. In my POV the hierarchy from more-core to less-core nupic.core is: {{ {Encoders {SP/TP/TM}} Anomaly/Classifier} Regions/Network} Swarming...}

See #948 , maybe the The network API uses floats (Real32) could use UInts/or some flexible SDR representation object?

In either case, SP+TM use UInt[] as I/O, so you can't avoid the Real->UInt change inside those regions, and having Encoder (Real) only helps you in 1 region but forces others to convert when they want to use it directly ( #889 ).

So far, I agree with: getOutputWidth could return a UInt. Templates might be useful. Though will it make SWIG bindings on the encoders complicated? I haven't really thought this through.

I wouldn't know about the SWIG problems(?), but: other algos in core are templated too, who would use C++-wrappers for encoders? (when there is 1 enc in C++, and ALL in python; encoders are not the performance bottleneck; maybe for other languages?-do we care? )

breznak commented 8 years ago

@mrcslws ping?

rhyolight commented 8 years ago

@breznak @mrcslws I've created subtasks for all these issues. Please continue conversations about them on those issues, not on this super issue. It makes them easier to digest and more topical. Thanks.

breznak commented 8 years ago

:+1: thank you for the work and separating the issues Matt!