r-barnes / faster-unmixer

A faster implementation of the sediment inverse/unmixing scheme proposed in Lipp et al (2021).
5 stars 1 forks source link

Don't hardcode "labels.tif" #19

Open r-barnes opened 1 year ago

AlexLipp commented 1 year ago

One approach could be to translate the C++ code into a python implemenation. This would marginally increase runtime as the network is built, but as this only needs to be done once per sample network, and is linear w.r.t number of DEM nodes, it would not be a major handicap. It could even still use richDEM implementations for flow-routing, sinkfilling etc.

This would have the benefits of 1) all the code being in one language 2) nothing requiring compilation and 3) not hardcoding labels.tif and other files. In general it might make the repository more accessible.

r-barnes commented 1 year ago

@AlexLipp : I'm slightly nervous about translating the C++ to Python because my recollection from CSDMS is that their implementation of Priority-Flood was about 20,000x slower in Python versus C++. At the moment this code should scale pretty well even to much larger regions (various caveats about the optimization problem); I wouldn't want to lose that unnecessarily.

That said, do you foresee the possibility of people using this iteratively or in simulations? If so, keeping everything fast will be appreciated. If not, folks can probably tolerate longer processing times.

AlexLipp commented 1 year ago

@r-barnes I think it unlikely this would be used iteratively in situations where the drainage network is changing. Concentration observations are going to be made over timescales when the drainage is fixed. e.g., either the observations are a snap-shot from ~simultaneous sampling, or time-series over a period of weeks-years. In neither of those cases would it be needed to recalculate drainage.

Personally I think its a reasonable assumption (also see Issue #23) that the DEM is already hydrologically conditioned (sinks filled, drainage routing as expected) prior to this analysis. I think this is reasonable as 1) hydrologically conditioned DEM products are now widely available, and often have been manually validated e.g., HydroSHEDS and 2) sink-filling algorithms are widely available in various 'point-and-click' and other packages (e.g., QGIS, arc, pysheds, landlab, richDEM, t), which people are familiar with.

If we do have one load_topo function, this could optionally have a fill_sinks argument, defaulting to False, but available if needed, with the acknowledgement that faster implementations are available (e.g., richDEM).

From experience, people are willing to let ArcMap and QGIS take days to do certain operations... as long as its a one off processing step (whether this is reasonable is another issue). So I think we don't need to worry too much about this.

AlexLipp commented 1 year ago

Could this not use your richDEM python implementation of sinkfilling (or at least reference it), bypassing the need for the CSDMS implementation? I find it to be significantly faster than the LandLab one.

AlexLipp commented 1 year ago

I think I had forgotten how slow some drainage calculations can be in python. Having just tried to process a moderately large DEM in LandLab, I'm less concerned that the drainage calculations are not all performed in python, so perhaps lets keep the efficient/fast implementations for now.