manaakiwhenua / dggs-lu-tex

Latex files for paper submitted to Big Earth Data: "Using a DGGS for a scalable, interoperable, and reproducible system of land-use classification"

Overall comment #1

Closed alpha-beta-soup closed 2 months ago

alpha-beta-soup commented 2 months ago

This paper discusses a new workflow to demonstrate how to use DGGS for land-use classification and provides a benchmark to show its performance benefits. Although the idea of DGGS is not new in GIS, there is less focus on the transition from both raster and vector into DGGS and associated performance issues.

My primary concern is that this study didn't provide material about the method. The authors mainly provided results from the benchmark without giving the workflow details, especially the algorithms and code. I noticed that the authors provided the GitHub link in the open science section, but this repository is incomplete, contradicting reproducible science. The paper focuses on DGGS land-use classification, but none of the figures illustrate the DGGS-based land use either. Without seeing the actual data/figures, we cannot judge the performance of this workflow. For many geophysical sciences, we need to ensure data quality first; then we can focus on computational performance.

alpha-beta-soup commented 2 months ago

Thank you for your review.

My primary concern is that this study didn't provide material about the method. The authors mainly provided results from the benchmark without giving the workflow details, especially the algorithms and code. I noticed that the authors provided the GitHub link in the open science section, but this repository is incomplete, contradicting reproducible science.

Our intention is not to propose a novel or especially efficient method of converting data from vector or raster formats to any particular DGGS. Indeed, we use established methods (e.g. polyfill; see also #17) for conversion. Rather, we aimed to design a defensible but artificial benchmarking exercise that represents equivalent land-use classification workflows, and to measure the performance of each in order to conclude whether DGGS offers a computational advantage for land-use classification (hypothesis: that DGGS is more efficient), while also justifying this approach from a GIScience perspective (e.g. that DGGS is better suited to representing known spatial precision).

We're not entirely sure what is meant by "incomplete" in this context, but we accept that we could add better documentation and environments for running the benchmarks independently. As the submitted scripts are Jupyter Notebooks, all executed code and the results of that execution (e.g. timing) are contained within the notebooks themselves and do not require independent execution (though this is of course possible).

The paper focuses on DGGS land-use classification, but none of the figures illustrate the DGGS-based land use either. Without seeing the actual data/figures, we cannot judge the performance of this workflow. For many geophysical sciences, we need to ensure data quality first; then we can focus on computational performance.

Our benchmark experiments are deliberately artificial: the classification rules are purely symbolic. We note this in a footnote, but perhaps should elevate it into the main body of the text to avoid confusion:

Vector

We generated 500 random vector coverages, using a random distribution of points over a fixed extent, and calculated Voronoi polygons for each case. Each polygon in each coverage was randomly assigned a 0 or 1 value and then dissolved accordingly. These data were then spatially joined (union), and a (nonsense) map classification logic was applied to the unioned output. This classification required the Boolean value of all overlapping features to be summed, then that value was used as input for a series of functions that return a Boolean value. These functions are meaningless for land-use mapping, but are instructive in this case because they require at least some form of computation to mimic a meaningful map classification.\footnote{These functions determine: whether that sum is a prime number; a perfect number; a triangular, square, pentagonal, or hexagonal number; or a Fibonacci number.} Each unique combination of the eight resultant Boolean values was then considered a distinct "class" akin to a distinct land-use type.
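The number-theoretic predicates named in the footnote can be sketched in plain Python. This is a minimal illustration, not the paper's actual benchmark code: the function names, the bit-string encoding of classes, and the ordering of predicates are our assumptions here (the text mentions eight Boolean values, while the footnote names seven predicates; the sketch covers only the seven named).

```python
import math

def is_prime(n: int) -> bool:
    # True if n has no divisor in 2..floor(sqrt(n)).
    if n < 2:
        return False
    return all(n % d for d in range(2, math.isqrt(n) + 1))

def is_perfect(n: int) -> bool:
    # True if n equals the sum of its proper divisors (e.g. 6, 28).
    return n > 0 and sum(d for d in range(1, n) if n % d == 0) == n

def _is_polygonal(n: int, s: int) -> bool:
    # n is s-gonal iff x = (sqrt(8(s-2)n + (s-4)^2) + s - 4) / (2(s-2))
    # is a positive integer.
    if n < 1:
        return False
    disc = 8 * (s - 2) * n + (s - 4) ** 2
    root = math.isqrt(disc)
    if root * root != disc:
        return False
    return (root + s - 4) % (2 * (s - 2)) == 0

def is_triangular(n: int) -> bool: return _is_polygonal(n, 3)
def is_square(n: int) -> bool:     return _is_polygonal(n, 4)
def is_pentagonal(n: int) -> bool: return _is_polygonal(n, 5)
def is_hexagonal(n: int) -> bool:  return _is_polygonal(n, 6)

def is_fibonacci(n: int) -> bool:
    # n is Fibonacci iff 5n^2 + 4 or 5n^2 - 4 is a perfect square.
    for m in (5 * n * n + 4, 5 * n * n - 4):
        r = math.isqrt(m)
        if r * r == m:
            return True
    return False

# Hypothetical predicate ordering (an assumption for this sketch).
PREDICATES = [is_prime, is_perfect, is_triangular, is_square,
              is_pentagonal, is_hexagonal, is_fibonacci]

def classify(overlap_sum: int) -> str:
    # Encode the tuple of Boolean results as a bit string; each unique
    # string is then treated as a distinct "class" (land-use type).
    return "".join("1" if p(overlap_sum) else "0" for p in PREDICATES)
```

For example, an overlap sum of 6 is perfect, triangular, and hexagonal but fails the other predicates, so it maps to its own class; summing Booleans over many overlapping coverages produces many such distinct classes while requiring real (if meaningless) computation.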

The output of this looks a bit like an image of random noise, and is not meaningful. We wanted to generate examples using random processes for several reasons:

  1. No encumbrance in using or sharing the randomly-generated data vis-à-vis real input data for a land-use map.
  2. Ease of reproduction (using a particular random seed).
  3. The ability to easily vary the number of inputs when measuring performance.
  4. Each input is equivalent to each other input.


alpha-beta-soup commented 2 months ago

@ChocopieKewpie I'll assign you as a reminder for the additional documentation on the benchmarking repository.

alpha-beta-soup commented 2 months ago

Documentation for the benchmarks has been extended and organised in commits at https://github.com/ChocopieKewpie/dggsBenchmarks/.

ChocopieKewpie commented 2 months ago

Not quite done yet: currently only the vector benchmarks are documented. Raster benchmarks will follow soon.

ChocopieKewpie commented 2 months ago

Extended raster benchmarks: https://github.com/ChocopieKewpie/dggsBenchmarks/commit/9a9d359435ecac425b51dcc131af16b72cbdde55