mzagorskirs / geowave

GeoWave provides geospatial and temporal indexing on top of Accumulo, HBase, BigTable, Cassandra, Kudu, Redis, RocksDB, and DynamoDB.
Apache License 2.0

Task 5: Develop GeoWave heatmap process using aggregation spatial binning #9

Closed mzagorskirs closed 2 years ago

mzagorskirs commented 2 years ago

Epic: https://github.com/mzagorskirs/geowave/issues/3

Note: stick to either aggregations or statistics, and do not tackle both in the initial work:

  1. (not part of the heatmap process, but a necessary prior step) Data is ingested into a GeoWave DataStore (and, if using statistics, the appropriate statistics are associated with the data).
  2. in the heatmap process, use invertQuery to add hints that will allow the GeoWave query to use the required statistic or aggregation (hints are a way to "hint" the query code to do something other than return the typical SimpleFeatureCollection)
    • see Subsample or DistributedRender processes for examples of using hints to trigger "special" geowave queries
  3. similar to what InternalDistributedRenderProcess does, the SimpleFeatureCollection can be an "internal" class/object type which describes the result sufficiently for a GridCoverage to be rendered
  4. render to a grid coverage and return a GridCoverage2D. For background, here's an example of creating a grid coverage from a given BufferedImage and spatial envelope: final GridCoverageFactory gcf = CoverageFactoryFinder.getGridCoverageFactory(GeoTools.getDefaultHints()); final GridCoverage2D gridCov = gcf.create("Process Results", , );
    • to minimize complexity for now, a few things are out of scope: we'll stick to a single GeoHash precision (let's say 4), although in the future we could optimize based on the pixel/degrees ratio of the map request. Also, we will just use a constant weight of 1 instead of applying a weight based on attribute values (so basically just a count)
    • we do use a kernel, which spreads that weight of 1 over the surrounding pixels. It will be best to re-use that code for generating the heatmap in the end, but to minimize complexity this iteration, just get to the point where you can count centroids per pixel (the basic mechanics, without getting deep into the heatmap generation)
    • `GaussianFilter.incrementPt(centroidLatitude, centroidLongitude, , widthInPixels, heightInPixels, weight, new ValueRange[] {new ValueRange(-180, 180), new ValueRange(-90, 90)})` is a way to get the cell counts with the "kernel" I mentioned. Again, I think that potentially adds a little more complexity than simply counting without the kernel (the basic approach: which grid cell the centroid intersects), but maybe you're comfortable just using that Gaussian filter for the CellCounter instead of worrying about any other logic
    • lastly, what we will ultimately need is a Histogram of the counts, to get estimates for applying a quantile color ramp. That is out of scope for now; just applying a color ramp to the raw weights of the output will be sufficient for this initial implementation
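
For reference, the basic counting approach described in the bullets above (constant weight of 1, no kernel, just which grid cell each centroid falls in) might look roughly like the sketch below. This is plain Java and not GeoWave or GeoTools code: the class name `CentroidCellCounter`, its methods, and the sample coordinates are all hypothetical and only illustrate the mechanics.

```java
/** Hypothetical helper (not a GeoWave class): counts centroids per pixel of a lon/lat grid. */
final class CentroidCellCounter {
  private final double minLon, maxLon, minLat, maxLat;
  private final int width, height;
  private final int[][] counts; // indexed [row][col]; row 0 is the northernmost row

  CentroidCellCounter(double minLon, double maxLon, double minLat, double maxLat,
      int widthInPixels, int heightInPixels) {
    this.minLon = minLon;
    this.maxLon = maxLon;
    this.minLat = minLat;
    this.maxLat = maxLat;
    this.width = widthInPixels;
    this.height = heightInPixels;
    this.counts = new int[heightInPixels][widthInPixels];
  }

  /** Adds a weight of 1 to the pixel containing the point; returns false if it lies outside the envelope. */
  boolean increment(double lat, double lon) {
    if (lon < minLon || lon > maxLon || lat < minLat || lat > maxLat) {
      return false;
    }
    int col = (int) ((lon - minLon) * width / (maxLon - minLon));
    int row = (int) ((maxLat - lat) * height / (maxLat - minLat));
    // Points exactly on the east or south edge are clamped into the last pixel.
    col = Math.min(col, width - 1);
    row = Math.min(row, height - 1);
    counts[row][col]++;
    return true;
  }

  int count(int row, int col) {
    return counts[row][col];
  }

  /** Largest count in the grid, e.g. for scaling a simple color ramp over the raw counts. */
  int max() {
    int max = 0;
    for (int[] row : counts) {
      for (int c : row) {
        max = Math.max(max, c);
      }
    }
    return max;
  }

  public static void main(String[] args) {
    // Whole-world envelope at 1 degree per pixel; the points are made-up sample values.
    CentroidCellCounter counter = new CentroidCellCounter(-180, 180, -90, 90, 360, 180);
    counter.increment(35.2, -97.4);
    counter.increment(35.7, -97.9);
    counter.increment(40.5, -100.5);
    System.out.println("count at row 54, col 82: " + counter.count(54, 82));
    System.out.println("max count: " + counter.max());
  }
}
```

The resulting counts grid is the kind of raw-weight raster that a color ramp could be applied to before wrapping it in a grid coverage.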
mzagorskirs commented 2 years ago

For this task, I developed an aggregation query to sum the hail sizes per GeoHash precision level 4 cell. As expected, the resulting heatmap looks the same as the original (non-aggregated) heatmap. However, only 1,556 features were read by GeoWaveFeatureReader instead of all 13,742 features in the hail dataset. So the aggregation requires fewer calls, but there is a bit more processing involved to build the aggregation query and convert the results to SimpleFeatures. The overall runtime is not necessarily shorter for the aggregation heatmap, but the network load would be reduced:
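
To illustrate the idea of summing a field per GeoHash precision level 4 cell, here is a minimal stand-alone sketch. It is not the GeoWave aggregation query itself: it uses the standard public GeoHash encoding written out in plain Java, and the class name `GeoHashCellSum`, the method names, and the sample points are hypothetical. The point is only that many input features collapse to one result per occupied cell.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration (not GeoWave code): sums a value per GeoHash cell. */
final class GeoHashCellSum {
  private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

  /** Standard GeoHash encoding: interleaved lon/lat bisection bits, 5 bits per character. */
  static String encode(double lat, double lon, int precision) {
    double minLat = -90, maxLat = 90, minLon = -180, maxLon = 180;
    StringBuilder hash = new StringBuilder(precision);
    boolean lonBit = true; // the first bit always refines longitude
    int bits = 0;
    int value = 0;
    while (hash.length() < precision) {
      if (lonBit) {
        double mid = (minLon + maxLon) / 2;
        if (lon >= mid) {
          value = (value << 1) | 1;
          minLon = mid;
        } else {
          value <<= 1;
          maxLon = mid;
        }
      } else {
        double mid = (minLat + maxLat) / 2;
        if (lat >= mid) {
          value = (value << 1) | 1;
          minLat = mid;
        } else {
          value <<= 1;
          maxLat = mid;
        }
      }
      lonBit = !lonBit;
      if (++bits == 5) {
        hash.append(BASE32.charAt(value));
        bits = 0;
        value = 0;
      }
    }
    return hash.toString();
  }

  /** Each input row is {lat, lon, value}; returns the sum of values keyed by GeoHash cell. */
  static Map<String, Double> sumPerCell(double[][] points, int precision) {
    Map<String, Double> sums = new TreeMap<>();
    for (double[] p : points) {
      sums.merge(encode(p[0], p[1], precision), p[2], Double::sum);
    }
    return sums;
  }

  public static void main(String[] args) {
    // Made-up sample points; the third column stands in for a field such as hail size.
    double[][] points = {
      {57.64911, 10.40744, 1.0},
      {57.60, 10.30, 1.75},
      {35.2, -97.4, 2.0}
    };
    // Three input points reduce to two cells, because the first two share a cell.
    System.out.println(sumPerCell(points, 4));
  }
}
```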

image

mzagorskirs commented 2 years ago

Also, for this task, I developed a count metric and created a heatmap for it. As expected, the heatmap looks similar to the one above. This is the benefit of downscaling: a similar resulting heatmap with fewer points and reduced load on the network.

image

mzagorskirs commented 2 years ago

Zooming in on the aggregate heatmaps reveals the regular grid pattern of the GeoHash centroids:

image

mzagorskirs commented 2 years ago

And zooming in a bit more really reveals the regular grid pattern of the GeoHash centroids (where the data has been aggregated). This is GeoHash precision level 4:

image
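
The regular spacing follows from how GeoHash cells are laid out: every aggregated point sits at the center of its cell, and all cells at one precision have the same size in degrees. A small stand-alone sketch of that geometry is below. It uses the standard public GeoHash decoding in plain Java, not any GeoWave API; the class name `GeoHashCentroid` and its methods are hypothetical.

```java
/** Hypothetical illustration (not GeoWave code): centroid and cell size of a GeoHash. */
final class GeoHashCentroid {
  private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

  /** Returns {lat, lon} of the center of the cell identified by the given GeoHash. */
  static double[] centroid(String hash) {
    double minLat = -90, maxLat = 90, minLon = -180, maxLon = 180;
    boolean lonBit = true; // the first bit always refines longitude
    for (char ch : hash.toCharArray()) {
      int value = BASE32.indexOf(ch);
      if (value < 0) {
        throw new IllegalArgumentException("Invalid GeoHash character: " + ch);
      }
      // Walk the 5 bits of this character from most to least significant.
      for (int mask = 16; mask > 0; mask >>= 1) {
        boolean upperHalf = (value & mask) != 0;
        if (lonBit) {
          double mid = (minLon + maxLon) / 2;
          if (upperHalf) {
            minLon = mid;
          } else {
            maxLon = mid;
          }
        } else {
          double mid = (minLat + maxLat) / 2;
          if (upperHalf) {
            minLat = mid;
          } else {
            maxLat = mid;
          }
        }
        lonBit = !lonBit;
      }
    }
    return new double[] {(minLat + maxLat) / 2, (minLon + maxLon) / 2};
  }

  /** Returns {height, width} of a cell in degrees of latitude and longitude. */
  static double[] cellSizeDegrees(int precision) {
    int totalBits = 5 * precision;
    int lonBits = (totalBits + 1) / 2; // longitude gets the extra bit when the total is odd
    int latBits = totalBits / 2;
    return new double[] {180.0 / (1L << latBits), 360.0 / (1L << lonBits)};
  }

  public static void main(String[] args) {
    double[] center = centroid("u4pr");
    double[] size = cellSizeDegrees(4);
    System.out.println("centroid lat/lon: " + center[0] + ", " + center[1]);
    System.out.println("cell height/width (deg): " + size[0] + ", " + size[1]);
  }
}
```

At precision 4 this gives cells of 0.17578125 degrees of latitude by 0.3515625 degrees of longitude, which is the spacing of the grid visible when zoomed in.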

mzagorskirs commented 2 years ago

Whereas, for comparison, the original heatmap (non-aggregated data) zoomed-in:

image

mzagorskirs commented 2 years ago

I verified the accuracy of both the field sum and field count aggregation results. For example, the count of points within this GeoHash precision level 4 cell is 2, as shown by the attribute on the cell's centroid point:

image

mzagorskirs commented 2 years ago

Task demo'd and approved.

rfecher commented 2 years ago

approved