`get_sample_networks` erroneously producing an empty network

r-barnes / faster-unmixer

A faster implementation of the sediment inverse/unmixing scheme proposed in Lipp et al (2021).

5 stars 1 forks source link

`get_sample_networks` erroneously producing an empty network #30

Closed AlexLipp closed 1 year ago

AlexLipp commented 1 year ago

When analysing the following synthetic D8 raster and sample-sites, get_sample_graphs erroneously produces an empty network: data.zip. This means that subsequent functions (e.g., initialising a SampleNetwork) cannot work correctly.

The following code visualises sample_network as created by get_sample_graphs and shows it to be empty. Additionally, this code outputs labels.tif and labels.csv which both have only one entry (the root) when they should contain outputs corresponding to 4 sample sites.

import geochem_inverse_optimize as gio
sample_network, sample_adjacency = gio.get_sample_graphs("data/synthetic_d8.asc", "data/synth_samples.dat")
gio.plot_network(sample_network)
plt.show()

r-barnes commented 1 year ago

Probably has to do with Line 153 of faster-unmixer.cpp reading:

sample_locs[flowdirs.xyToI(sample.x, 862-sample.y)] = sample;

Somewhere we need to document where the sample sites are offset from so that we can standardize around that.

AlexLipp commented 1 year ago

I think assuming the sample sites are offset from the lower left of the DEM is most intuitive.

We also may need to think about units right. It'll be more common for sample site locations to be in raw geographical units (i.e. projected metres) rather than number of raster cells. Converting between the two will just involve dividing through by the grid spacing of the ascii file.

r-barnes commented 1 year ago

If you look at GDAL's documentation you'll see The (GT(0),GT(3)) position is the top left corner of the top left pixel of the raster you can work around this by using negative cell sizes and such, but that gets messy.

Some discussion of top-left vs bottom-left is here: https://computergraphics.stackexchange.com/q/8178 https://softwareengineering.stackexchange.com/q/252386/35362

Note: it would be better not to use ASCII files. Or at least make no assumptions about that as an input format. GeoTIFF allows for lossless compression, &c.

r-barnes commented 1 year ago

(I've decreased the size of data/d8.asc in d4d8e866ff.)

AlexLipp commented 1 year ago

I'm happy to move away from ASCII files to any other standard geospatial raster format.

The actual data-set we have been working to is somewhat of a special case, as they have already been prepared to make it easier to work with (lower-left removed, divided through by step-size). In general, we should assume that the user simply provides x,y coordinates in projected units (e.g., UTM), and a raster projected using same projection.

To me it is easier if we don't expect user to substract the top-left of the raster from their sample-sites (predicting user error). If possible, in faster-unmixer.cpp should calculate the coordinates of the upper-left corner of the raster, and then subtract these from the sample site coordinates to get at the coordinates counting down. These can correctly be used for xyToI. Additionally, we'll need to divide through by the step-size of the DEM to convert from units to indices.

AlexLipp commented 1 year ago

e.g., in this example, coordinates are defined relative to a global origin (grey area), not necessarily the lower-left or upper-right of the raster (blue rectangle). I will get a more general dataset which we can work towards tomorrow, which accords to this expected use case.

r-barnes commented 1 year ago

@AlexLipp : The more general case is to read in the geotransform mentioned in GDAL's docs from the raster and then use that transform to obtain pixel indices for the sample sites, which are hopefully in the same projection as the raster. If the sample sites need adjustment we'd then take an inverse transform to report back the adjusted locations.

AlexLipp commented 1 year ago

@r-barnes, the approach you describe seems most sensible. I think assuming that the xy and raster data are same projection is reasonable.

I've attached a more general format of the data here: data_reformatted.zip. Both the x,y data and the raster are in projected coordinates (metres in this case) relative to an arbitrary origin that is not a corner of the d8 itself. I've also provided the d8 raster in a couple other geospatial raster formats. I have no strong preference for file-format, so happy to defer to your judgement on this.

These sample sites are still 'snapped' to drainage sites which is not likely to be the case in general, so we may still want to provide a 'snap to drainage' function as we discussed.

AlexLipp commented 1 year ago

Closed by PR #33