numenta / nupic-legacy

Numenta Platform for Intelligent Computing is an implementation of Hierarchical Temporal Memory (HTM), a theory of intelligence based strictly on the neuroscience of the neocortex.
http://numenta.org/
GNU Affero General Public License v3.0
6.34k stars 1.56k forks source link

GeospatialCoordinateEncoder "altitude" seems optional, but not really #3077

Closed rhyolight closed 8 years ago

rhyolight commented 8 years ago

I ran into difficulties generating anomaly likelihood values with a GeospatialCoordinateEncoder (GCE), and I believe the root cause is that the altitude portion of the raw input vector.

In all current examples of GCE usage, the input vector is created like this:

modelInput = {
    "vector": (speed, longitude, latitude, altitude)
}

... where altitude may be None. This starts breaking when you use the anomaly likelihood post-process, because it introduces arrays with None values, and numpy complains when trying to mean the data:

Traceback (most recent call last):
  File "/Users/mtaylor/Library/Python/2.7/lib/python/site-packages/nupic/algorithms/anomaly_likelihood.py", line 208, in anomalyProbability
    skipRecords=numSkipRecords)
  File "/Users/mtaylor/Library/Python/2.7/lib/python/site-packages/nupic/algorithms/anomaly_likelihood.py", line 364, in estimateAnomalyLikelihoods
    performLowerBoundCheck=False)
  File "/Users/mtaylor/Library/Python/2.7/lib/python/site-packages/nupic/algorithms/anomaly_likelihood.py", line 578, in estimateNormal
    "mean": numpy.mean(sampleData),
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/numpy/core/fromnumeric.py", line 2716, in mean
    out=out, keepdims=keepdims)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/numpy/core/_methods.py", line 62, in _mean
    ret = um.add.reduce(arr, axis=axis, dtype=dtype, out=out, keepdims=keepdims)
TypeError: unsupported operand type(s) for +: 'float' and 'NoneType'

I tried using numpy.nanmean but that didn't help. Another workaround is to use 0 for altitude, but this drastically increases processing time by adding another dimension to the input that doesn't really exist.

If you look at how the GCE encodes input data:

altitude = None
if len(inputData) == 4:
  (speed, longitude, latitude, altitude) = inputData
else:
  (speed, longitude, latitude) = inputData
coordinate = self.coordinateForPosition(longitude, latitude, altitude)
radius = self.radiusForSpeed(speed)
super(GeospatialCoordinateEncoder, self).encodeIntoArray(
 (coordinate, radius), output)

You can see that altitude is optionally included in the coordinate. However, the description always returns the altitude:

  def getDescription(self):
    """See `nupic.encoders.base.Encoder` for more information."""
    return [('speed', 0), ('longitude', 1), ('latitude', 2), ('altitude', 3)]

I believe this is the root problem, but I'm not sure how to fix it. To get things running for my current project, I have just manually removed the ('altitude', 3) from the description array, and I can now run NuPIC without an altitude and post-process the anomaly likelihood properly.

rhyolight commented 8 years ago

@subutai @scottpurdy Any chance you guys can take a look at this?

rhyolight commented 8 years ago

BTW, this temporary fix has things working for me locally. So I can proceed with my experiments, but no one else will be able to run them.

scottpurdy commented 8 years ago

@rhyolight - all of the documentation on getDescription (e.g. this) suggests that the list indicates the bit offsets in the encoding. So it looks like the geospatial encoder incorrectly specifies the encoder inputs and rather than removing altitude, you could just have it return [("coordinate", 0)]. Can you try that and see if it works?

rhyolight commented 8 years ago

@scottpurdy I don't understand where the coordinate is coming from. Does it represent the entire vector with speed, lat/lon, and altitude?

When I replace return [('speed', 0), ('longitude', 1), ('latitude', 2), ('altitude', 3)] with return [("coordinate", 0)] I get the following error:

Traceback (most recent call last):
  File "./htm/nupic/process-gpx-nupic.py", line 222, in <module>
    run(inputPath, options.useTimeEncoders, options.scale)
  File "./htm/nupic/process-gpx-nupic.py", line 82, in run
    runOnePoint(point, model, useTimeEncoders, anomalyLikelihood)
  File "./htm/nupic/process-gpx-nupic.py", line 142, in runOnePoint
    result = model.run(modelInput)
  File "/Users/mtaylor/nta/nupic/src/nupic/frameworks/opf/clamodel.py", line 395, in run
    self._sensorCompute(inputRecord)
  File "/Users/mtaylor/nta/nupic/src/nupic/frameworks/opf/clamodel.py", line 480, in _sensorCompute
    sensor.compute()
  File "/Users/mtaylor/nta/nupic/src/nupic/engine/__init__.py", line 458, in compute
    return self._region.compute()
  File "/Users/mtaylor/nta/nupic.core/bindings/py/nupic/bindings/engine_internal.py", line 1393, in compute
    return _engine_internal.Region_compute(self)
  File "/Users/mtaylor/nta/nupic/src/nupic/regions/RecordSensor.py", line 334, in compute
    outputs["sourceOut"][:] = self.encoder.getScalars(data)
ValueError: could not broadcast input array from shape (5) into shape (3)

This is a familiar error, because I got it when I removed the "altitude" component from the input row vector completely:

  modelInput = {
    "vector": (speed, longitude, latitude, altitude)
  }

This is how the input row is created in the geospatial example. When I simply removed the altitude form the tuple, I got this same ValueError from the RecordSensor, but with different shape values. If I implement the change as you suggested, it is going to obviously change the API of the encoder and we'd need to update our example code.

So I guess my question is, how do I change the input vector to adhere to this new input model?

scottpurdy commented 8 years ago

@rhyolight - Ok, so it assumes 1:1 input scalars to output bit arrays. The geospatial encoder is explicitly not intended to run within the OPF (CLAModel). For this use case there are a couple options:

  1. Convert None to 0 for the altitude inside the getScalars function. This is what I would recommend.
  2. Add a constructor argument for whether altitude is used or not. Store this as an instance variable that you can use to determine whether to include altitude in the output of getDescription, getScalars, and anywhere else.
rhyolight commented 8 years ago

I cannot do (1) because it increases the processing time insanely, I've already tried it.

I'll try (2).

rhyolight commented 8 years ago

I cannot do (1) because it increases the processing time insanely, I've already tried it.

I take that back, that's not exactly what I tried. I tried sending 0 as the altitude input in the input vector. I'll try your (1) above and see if it has the same effect on processing time.

rhyolight commented 8 years ago

Thanks @scottpurdy for leading me in the right direction via chat. See https://github.com/numenta/nupic/pull/3082 for fix that doesn't remove any altitude processing capabilities.