isamplesorg / isamples_inabox

Provides functionality intermediate to a collection and central

Add Solr fields for different H3 resolutions #235

Closed: datadavev closed this issue 1 year ago

datadavev commented 1 year ago

It would be beneficial to rapidly compute the spatial distribution of records at different resolutions. This could be used, for example, to augment the spatial display in the UI, which is currently very slow when viewing broad regions (since it depends on streaming individual records) and is limited to a reasonable number of records (e.g. 50k) in a view.

A spatial distribution or heatmap can be computed very efficiently by faceting on the H3 values. However, this requires that the H3 strings be available at specific resolutions (i.e. lengths of the H3 string), since Solr cannot compute facets on the first n characters of a string. We currently store only the full-resolution H3 value in producedBy_samplingSite_location_h3, e.g. 8f893424431b471.
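To illustrate what "faceting on the H3 values" means in practice, here is a minimal Python sketch that builds a standard Solr facet request and turns the response into a cell-to-count map. It assumes per-resolution fields named with a hypothetical `_<resolution>` suffix on the full-resolution field; the helper names `h3_facet_params` and `parse_h3_facets` are illustrative only. The facet parameters themselves (`facet.field`, `facet.limit`, `facet.mincount`) are standard Solr.

```python
from urllib.parse import urlencode

# Hypothetical naming: the full-resolution field plus a "_<resolution>" suffix.
BASE_FIELD = "producedBy_samplingSite_location_h3"


def h3_facet_params(resolution: int, query: str = "*:*") -> dict:
    """Solr facet parameters for counting matching records per H3 cell."""
    return {
        "q": query,
        "rows": 0,              # no documents needed, only facet counts
        "facet": "true",
        "facet.field": f"{BASE_FIELD}_{resolution}",
        "facet.limit": -1,      # return every cell (the default limit is 100)
        "facet.mincount": 1,    # omit cells with no matching records
        "wt": "json",
    }


def parse_h3_facets(response: dict, field: str) -> dict:
    """Convert Solr's flat [value, count, value, count, ...] list to a dict."""
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))


if __name__ == "__main__":
    # Append this query string to a /select URL for the (unspecified) collection.
    print(urlencode(h3_facet_params(4)))
```

Each returned cell and its count maps directly onto one heatmap polygon, so the UI never has to stream individual records for a broad view.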

The task here is to add fields for H3 resolutions 1 through 10, which will enable rapid distribution display at resolutions from global down to a few km².

The requirements for these fields are that they support faceting and searching. The Solr docs suggest docValues=true, indexed=false, stored=false, multiValued=false, e.g.:

<field name="producedBy_samplingSite_location_h3_1" type="string" indexed="false" stored="false" docValues="true" multiValued="false" />

Fields can be named like the full-resolution field, but with a suffix _n, where n is 1, 2, ..., 10. The resulting field producedBy_samplingSite_location_h3_1 would contain two characters, e.g. 8f from the example above.
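Because the ten definitions differ only in the suffix, they could be generated rather than written by hand. This is a minimal Python sketch; `h3_field_definitions` is a hypothetical helper, and the attributes mirror the docValues-only string field described above (note Solr's camel-case `docValues` and `multiValued` attribute names).

```python
BASE_FIELD = "producedBy_samplingSite_location_h3"


def h3_field_definitions(resolutions=range(1, 11), base: str = BASE_FIELD) -> str:
    """Emit one docValues-only string <field> definition per H3 resolution."""
    template = (
        '<field name="{name}" type="string" indexed="false" '
        'stored="false" docValues="true" multiValued="false" />'
    )
    return "\n".join(template.format(name=f"{base}_{n}") for n in resolutions)


if __name__ == "__main__":
    print(h3_field_definitions())
```

Passing `range(0, 16)` instead would cover every H3 resolution without changing the template.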

These fields could be populated with copyField directives, e.g.:

<copyField source="producedBy_samplingSite_location_h3" dest="producedBy_samplingSite_location_h3_1" maxChars="2" />
<copyField source="producedBy_samplingSite_location_h3" dest="producedBy_samplingSite_location_h3_2" maxChars="3" />
datadavev commented 1 year ago

So... I got myself confused between geohashes and H3 indexes. With a geohash the shorter-string approach works, but that is not the case for H3 indexes. Instead, it is necessary to compute the H3 value at each desired resolution. It's a simple correction, but it means the copyField approach won't work. I'll take the changes so far and refactor them to compute the hash values.
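Why truncation fails for H3: per the H3 index bit layout, the resolution is a fixed 4-bit field near the top of the 64-bit index, and the per-resolution digits finer than the cell's resolution are padded with 7 (binary 111). In 8f893424431b471, the leading 8 is the index mode and the f is the resolution (15), so "8f" is not a location prefix at all. A parent cell is obtained by rewriting the resolution field and marking the finer digits as unused. The sketch below does this with plain integer bit operations so it runs with only the standard library; the function names are hypothetical, and in practice the h3-py library's `cell_to_parent` (h3 v4) would be the natural choice.

```python
# H3 cell index layout (64 bits):
#   bit 63 reserved, bits 62-59 mode, bits 58-56 reserved,
#   bits 55-52 resolution, bits 51-45 base cell,
#   bits 44-0: fifteen 3-bit digits for resolutions 1..15 (unused digits = 7).
RES_SHIFT = 52
RES_MASK = 0xF << RES_SHIFT
DIGIT_BITS = 3
DIGIT_UNUSED = 0b111
MAX_RES = 15


def get_resolution(cell: int) -> int:
    """Read the 4-bit resolution field."""
    return (cell >> RES_SHIFT) & 0xF


def cell_to_parent(cell: int, parent_res: int) -> int:
    """Return the ancestor of `cell` at `parent_res` (not a string prefix)."""
    child_res = get_resolution(cell)
    if not 0 <= parent_res <= child_res:
        raise ValueError(f"parent_res must be in 0..{child_res}")
    # Rewrite the resolution field.
    parent = (cell & ~RES_MASK) | (parent_res << RES_SHIFT)
    # Mark digits for resolutions finer than parent_res as unused.
    for r in range(parent_res + 1, child_res + 1):
        parent |= DIGIT_UNUSED << ((MAX_RES - r) * DIGIT_BITS)
    return parent


if __name__ == "__main__":
    cell = int("8f893424431b471", 16)
    print(format(cell_to_parent(cell, 1), "x"))  # 81893ffffffffff
```

The resolution-1 parent is 81893ffffffffff, whose first two characters ("81") differ from the "8f" a two-character truncation would give.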

datadavev commented 1 year ago

After experimenting with viewing and navigating the H3 cells at different resolutions, it appears it will be beneficial to compute cells at all resolutions (0 through 15).
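Populating every resolution at index time could look roughly like this: a minimal Python sketch that, given a full-resolution H3 cell as a hex string, walks from the finest resolution to the coarsest and emits one field per level using the `_n` naming proposed earlier. The helper `h3_all_resolutions` is hypothetical, it assumes the input is a valid H3 cell index, and it relies only on the documented H3 bit layout (resolution in bits 55-52, unused digits set to 7).

```python
BASE_FIELD = "producedBy_samplingSite_location_h3"
RES_SHIFT = 52  # bits 55-52 hold the resolution
MAX_RES = 15


def h3_all_resolutions(cell_hex: str) -> dict:
    """Map '<BASE_FIELD>_<r>' to the ancestor cell (hex) at every resolution r."""
    cell = int(cell_hex, 16)
    child_res = (cell >> RES_SHIFT) & 0xF
    fields = {}
    ancestor = cell
    # Coarsen one level at a time, from child_res down to 0.
    for r in range(child_res, -1, -1):
        ancestor = (ancestor & ~(0xF << RES_SHIFT)) | (r << RES_SHIFT)
        if r < child_res:
            # The digit for resolution r + 1 is now unused (0b111).
            ancestor |= 0b111 << ((MAX_RES - (r + 1)) * 3)
        fields[f"{BASE_FIELD}_{r}"] = format(ancestor, "x")
    return fields


if __name__ == "__main__":
    for name, value in h3_all_resolutions("8f893424431b471").items():
        print(name, value)
```

For a resolution-15 input this yields 16 fields; the `_15` field simply repeats the full-resolution value, which keeps every resolution addressable with one naming rule.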