uber / h3-java

Java bindings for H3, a hierarchical hexagonal geospatial indexing system
https://uber.github.io/h3/
Apache License 2.0
282 stars 54 forks source link

H3Core.nonZeroLongArrayToList takes too much memory #68

Open REASY opened 4 years ago

REASY commented 4 years ago

Trying to polyfill big area with resolution 11. I can see that internally algorithm prepared result https://github.com/uber/h3-java/blob/3a1e9bc9bcd1d62e9e8d4361230fe47bb5395242/src/main/java/com/uber/h3core/H3Core.java#L691 but when it tries to copy it to List it gets out of memory on this line: https://github.com/uber/h3-java/blob/a500880eef493dca44903bf7ad0e249aed5b85b1/src/main/java/com/uber/h3core/H3Core.java#L1242-L1254

As you can see in the stacktrace, result is ready, but it just tries to copy it to ArrayList: image

Memory, the arrow shows to the place when nonZeroLongArrayToList started : image

Scala code:

import scala.collection.JavaConverters._
import java.util.{Collections => JCollections}

import beam.utils.ProfilingUtils
import com.uber.h3core.AreaUnit
import com.uber.h3core.util.GeoCoord

def main(args: Array[String]): Unit = {

  val h3Core = com.uber.h3core.H3Core.newInstance

  val xMin = -106.645646
  val xMax = -93.508292
  val yMin = 25.837377
  val yMax = 36.500704

  val rectangle = java.util.Arrays.asList(
    new GeoCoord(yMin, xMin),
    new GeoCoord(yMax, xMin),
    new GeoCoord(yMax, xMax),
    new GeoCoord(yMin, xMax),
  )

  val holes = java.util.Collections.emptyList[java.util.List[GeoCoord]]()
  val resolution = 11
  val hexes = h3Core.polyfillAddress(rectangle, holes, resolution)
  println(s"Generated ${hexes.size()}")
}

The easiest fix is just to get the total number of non-zero elements and allocate an array with that size and copy elements over. Other solution can be moving all zero elements in the original array to the end of the array and wrap it by ArrayList via Arrays.asList(array).subList(index, IDX_OF_FIRST_ZERO);. I can create PR if this sounds good.

Thanks.

isaacbrodsky commented 4 years ago

Hi, sorry for the delayed response! Yes, I think counting the number of non-zero indexes is a reasonable change. Off hand I don't know the performance of the other option, but I assume you want to make sure it does not keep the entire result array alive. Alternately we could do something like allowing the ArrayList to dynamically resize, but I suspect that would always be slower.

I'm curious if the result array itself already takes up roughly 8GB memory - is that being shown in your memory profiler with the 8GB of memory already being used?

sebisteiner commented 2 years ago

Hi @isaacbrodsky

I just stumbled across this issue when trying to calculate all H3 indexes on a 2.5km long street. (red line in the image at the bottom)

maxPolyfillSize returns 222'668'225. But the actual number of non-zero elements is 4'889. So the memory overhead is huge 🥲

What's your preferred solution for the problem? I cannot really make a statement on which one's the best regarding performance and memory consumption.

One could also solve it using streams:

return Arrays.stream(out)
        .filter(idx -> idx > 0)
        .boxed()
        .collect(Collectors.toList());

In terms of readability I'd probably prefer this solution.

CleanShot 2022-10-04 at 15 19 29

sebisteiner commented 2 years ago

In my case the size of the ArrayList is only one small part of the problem. The actual problem is, that the number returned by maxPolyfillSize is way too big. I opened another issue here: https://github.com/uber/h3/issues/708