Fix encoder implementation to standards

breznak commented 8 years ago

the encoders implementation in src/nupic/encoders/ is quite a mess, I guess the PR process was quite relaxed :tongue:

breznak commented 8 years ago

@rhyolight can you eval/assign this?

mrcslws commented 8 years ago

Questions:

It sounds like you have a strong position against putting encoder Sensors next to their corresponding Encoders. Tentatively, I support the current approach. Otherwise we're polluting the codebase.
Why should it implement encode? That's not a very C-like function.
What's wrong with Real64? It's like saying "double".
The network API uses floats (Real32). The C++ encoders are just the internals of network API Sensors, refactored out of the Sensor. So the encoders should use floats, right?

So far, I agree with:

getOutputWidth could return a UInt.
Templates might be useful.
- Though will it make SWIG bindings on the encoders complicated? I haven't really thought this through.

breznak commented 8 years ago

Thanks for a quick response, Marcus @mrcslws

Questions: It sounds like you have a strong position against putting encoder Sensors next to their corresponding Encoders. Tentatively, I support the current approach. Otherwise we're polluting the codebase.

Not that strong :smiley: (as there's just 1 encoder now), but I don't see a reason for breaking the common pattern from python-nupic (which I take as standard) and even from current separation in nupic.core: encoders/, regions/, algorithms/ - in my view the ScalarSensor is polluting the code-base; from the point of view of build-system: encoders/ are stand-alone; regions/ depend on other Regions and one on an encoder, now this made encoders depend on regions. All in all, it's 1:1 so unless there are better arguments I'd vote for moving in Reg.

Why should it implement encode? That's not a very C-like function.

I don't know, is it C++-like? :wink: Seriously, for these reasons:

python-Encoder defines encode() , not encodeIntoArray() which is just implementation detail (but used everywhere) - so API
convenience and safety: as a user I'd really expect to feed an encoder a value and get bit-vector output.
- why would I have to allocate an array, know its type and size (I know it's common in C, but that does not mean the best), when I can get the correct data by return-value? Compare the knowledge required (and space for bugs) in: auto result = e.encode(42); vs

vector<Real> oo(e.getOutputWidth());
e.encodeIntoArray(42, oo.data());

you might argue the speed, as encode() allocates a new array each call; but that's questionalble, as you have to clear your array somehow too in encodeIntoArray, just have to remember it. And it could be worked on, eg. EncoderBase could hold a member private UInt output[size] and we'd reuse it,... And still, encodeIntoArray() is there for those who need it.

What's wrong with Real64? It's like saying "double".

I think we have a switch whether Real should be interpreted as Real32/Real64 (float/double), so if the code needs double, OK, otherwise why not leaving the more flexible?

The network API uses floats (Real32). The C++ encoders are just the internals of network API Sensors, refactored out of the Sensor. So the encoders should use floats, right?

This is one I feel strongly about. And I think we found the cause of different views: "C++ encoders are just the internals of network API" - No. In my POV the hierarchy from more-core to less-core nupic.core is: {{ {Encoders {SP/TP/TM}} Anomaly/Classifier} Regions/Network} Swarming...}

See #948 , maybe the The network API uses ~~floats (Real32)~~ could use UInts/or some flexible SDR representation object?

In either case, SP+TM use UInt[] as I/O, so you can't avoid the Real->UInt change inside those regions, and having Encoder (Real) only helps you in 1 region but forces others to convert when they want to use it directly ( #889 ).

So far, I agree with: getOutputWidth could return a UInt. Templates might be useful. Though will it make SWIG bindings on the encoders complicated? I haven't really thought this through.

I wouldn't know about the SWIG problems(?), but: other algos in core are templated too, who would use C++-wrappers for encoders? (when there is 1 enc in C++, and ALL in python; encoders are not the performance bottleneck; maybe for other languages?-do we care? )

breznak commented 8 years ago

@mrcslws ping?

rhyolight commented 8 years ago

@breznak @mrcslws I've created subtasks for all these issues. Please continue conversations about them on those issues, not on this super issue. It makes them easier to digest and more topical. Thanks.

breznak commented 8 years ago

:+1: thank you for the work and separating the issues Matt!

numenta / nupic.core-legacy

Fix encoder implementation to standards #961