Closed jbouffard closed 6 years ago
This looks as expected. For me, the most difficult parameter to use is 'bits' and it's also not entirely clear to me how it relates to num_partitions. Of course, this parameter also exists in plain geotrellis, so it makes sense to also expose it here.
@jdries The bits
parameter is kind of confusing, and it's something we should explain better in the documentation.
In order to determine what partition an element belongs to, we first calculate its z-value
using a Z-order curve which in this case is the interleaving of the binary values of the col
, row
, and instant
of the Tile
. This z-value
represents an area that is unique to those values. However, this is a problem as we'd like to group values who are close to each other both spatially and temporally in the same partition, and it's difficult to achieve this when all of the z-value
s are unique. Thus, we perform a right shift on the resulting z-value
https://github.com/locationtech-labs/geopyspark/blob/master/geopyspark-backend/geotrellis/src/main/scala/geopyspark/geotrellis/SpatialPartitioner.scala#L21
By dropping N number of bits from each z-value
, we can create larger groupings of values since those elements that are near each other will have their remaining bits in common. Therefore, the larger bits
is, the larger the groupings of values will be. bits
controls how many elements will (ideally) be in each partition.
The shifted z-index
is used to assign its associated value to a partition based on num_partitions
by doing z-index % num_partitions
. Which will try to distribute groupings evenly between the partitions.
I hope I was able to answer your question, but please let me know if it wasn't clear. @echeipesh Do you have anything you'd like to add?
@jpolchlo I updated the docstring. How does it look now?
I feel as if there are more words there, but I'm only getting a slightly better understanding of what the parameter is there for. I'm a neophyte in regards to spatial partitioning, but not so dumb that I shouldn't be able figure it out with a few good sentences. Is there a way to describe why I would change the bits
parameter? What would the tradeoffs be? Can you impart any intuition about this? (Viz. the connection between the z-curve and the quadtree?)
the connection between the z-curve and the quadtree?
I am typing this without looking at the underlying code in question, but typically you would expect one bit per spatial dimension per division. In the 2D case, the number of bits divided by two is the number of divisions (levels of the quadtree) that are payed-attention-to.
I'm having some trouble explaining what bits
does in a concise manner, so I'm thinking about just merging this PR and then make an issue for it. Would that be okay with you @jpolchlo?
It might also be good to make the same issue in GT since I have to yet find any documentation there that explains what bits
does well.
I'm ok with that. Make an issue!
This PR adds the
SpaceTimePartitionStrategy
to to the list ofPartitionStrategies
. This will allow for better partitioning ofSPACETIME
layers. TheSpaceTimePartitioner
will try to sort each element of the layer based on its spatial location and its time. Temporal sorting can be controlled both by a unit of time (ie.DAYS
,MONTHS
, etc) and by a resolution (ie. 1 day, 3 weeks, etc).This PR resolves #642