Test output: duplicate, documentation, and suggestion

awickert commented 4 years ago

This should be a simple one. In my test run (#7: AWtest.zip), I labeled my output testOut. It returned:

testOut-diff.tif
testOut-filled.nc
testOut-flooded.nc
testOut-label.tif

These four returned files are pretty intuitive but not defined in the README.

The second and third items are identical, so these should be combined.

I am not sure of the value of returning the diff. It seems like it could be helpful -- or it could be unhelpful, if working with a very large DEM whose extra export could fill HDD space.

testOut-label.tif is obviously the set of leaf-depression labels.

Considering that you have so many outputs that are all named after this main name, I think it may make sense to do the same with the DH connectivity output, thereby changing the command-line way of operating the DH but having a more coherent set of outputs. This set of connections should be significantly smaller than the rasters, and the label.tif output will not be very useful without it.

r-barnes commented 4 years ago

I forget, is the "DH connectivity output" currently being saved out somewhere?

awickert commented 4 years ago

Yes, "DH connectivity output" is, as correctly stated on the README, the third argument (<OutGraph>). And the output looks good, with the ocean being 2**32-1 and everything else having an integer identifier.

r-barnes commented 4 years ago

Hmm, a related question. What outputs should there be? filled, flooded, and diff are really test/verification outputs. It might be that we should move that to a separate executable since they're not outputs that most users want.

To clarify: I see dephier.exe as being more of a demonstration executable than a serious tool that people would use.

awickert commented 4 years ago

That's a good point. And it seems that now that I've updated to your version, no outputs are appearing. Indeed. main.cpp in your version seems to still have no output code (vestige from speed tests?).

Related: Kerry posted an updated bug-fix PR for `main.cpp: https://github.com/r-barnes/Barnes2019-DepressionHierarchy/pull/4

If you see dephier.exe as more of a demo, perhaps the answer is to create some Python bindings that can allow users to more interactively.... um, interact with DH?

speleophysics commented 4 years ago

First off, this looks like a really cool and helpful project. I came across the ESurf papers yesterday and have spent a couple of hours trying to figure out how to use your code. I work mostly in karst and glacial systems, where we are often interested specifically in the surface depressions.

Usage has been a bit slow to figure out. I primarily work in Python (yes! to the Python bindings). While I have worked some in C++, most of that was more than a decade ago. I thought I would just list a few of the key things that were unclear or difficult for me when first approaching your code. Some of these things I have started figuring out by combing through the source code. However, I'd guess that if a user can't figure out how to use your tool without digging into the source, then that will eliminate the majority of the potential users (at least in the Geosciences), particularly if your source is C++.

Questions I had:

What are valid input formats for dems? (I'm guessing it should be ESRI ASCII, based on looking at the test dem files, but it would be nice to state this in the Readme).
What is output? This still isn't clear to me. In particular, is the Depression Hierarchy somehow output? What kind of data structure does it have? How would I read it in and do something with it?
What is ? Is this just the elevation that will be considered as ocean? Integer? Float? It also seems that I can specify a mask, but how do I do this (looks like maybe I just use no_data values)?

If dephier.exe is really just intended to be an example usage of your library rather than a serious tool, you should probably make this more explicit. In this case, a brief summary of the available functions in the library would also go a long way toward improving usability. Python bindings, with accompanying docs, would be even better.

I've also run into some compilation problems, but will create a separate issue to ask about that.

awickert commented 4 years ago

Hi Matt!

GDAL-compatible files: @r-barnes updated this in the README already.
It looks like @r-barnes got to this one too.
~Ocean can be integer or floating point, and it looks like this is addressed as well. I believe that it is treated as a float within the code. I have updated the README beyond what Richard wrote and created a PR (https://github.com/r-barnes/Barnes2019-DepressionHierarchy/pull/12).~

It looks like @r-barnes new updates create a DH that indeed has it more as a standalone usable tool, which is good. @KCallaghan and I are working on a hydrological model that incorporates both DH and FSM, and we plan to include Python bindings as part of this. We'll probably have something together in the next few months; go to https://github.com/umn-earth-surface/TWSM if you want to cheer us on.

~@r-barnes: I think we can close this after you accept my small README.md edits in the PR.~

CORRECTION: I misunderstood the use of "sea level" in the code. Please ignore me on this part.

speleophysics commented 4 years ago

Cool. Works for me now. I'm still a bit unclear on how exactly Ocean works, though. The elevation setting for Ocean makes sense. Does it automatically define edges as ocean too? Or do you also have to manually apply some No Data mask? Just for fun, here's the output from some cockpit karst in Puerto Rico, which ran in 1 sec! PR-ss

Clearly, there are plenty of DEM artifacts at the small scales, but you can certainly see the "cockpits." Currently, I'm primarily interested in the attributes of the depressions, so I'll see if I can figure out how to extract these. However, my group is also heading toward developing a coupled surface-subsurface landscape evolution model (in Landlab) that will explore landscape evolution processes in karst and on glacial surfaces (coupled to conduit development in the subsurface). For this, the routing models you guys are developing might also be useful.

speleophysics commented 4 years ago

I'll continue to comment here on a few things I'm finding confusing. You can feel free to ignore me, but I figure this at least provides a potentially useful record for me or others in the future.

After digging into the Depression class, I figured out how to save other depression stats in the DH csv file (volume, pit/outlet elevations, etc) by adding some additional output statements in main.cpp.

One thing that I found a bit frustrating was that the output label tiff was not georeferenced. This made it difficult to compare depressions directly against the DEM, which I was trying to do in QGIS. Eventually, after digging a bit in RichDEM, I figured out that I could add a line of code that would produce the correct georeferencing:

label.geotransform = topo.geotransform;//inserted in main.cpp label.saveGDAL(out_name+"-label.tif");

(I saw after the fact that this is perhaps what is referred to in #10, for which my code seems to be a quick fix.).

This at least made my life easier, though there might be good reasons to have a raw pixel-based coord output as well. I don't understand enough of how to use the outputs yet to have a feeling for that.

Strange behavior related to Ocean level. Originally, I was seeing little or no change if I set Ocean level to 0 or to somewhere in the middle of the elevation range of my DEM. Eventually, I switched to a DEM that I padded with NaNs on the edge. This seemed to produce somewhat different basins, and then the Ocean level command-line arg also seemed to start having some effect.

I found the code for filtering and for generating top-level labels (watersheds) in dephier_paper_tests.cpp. When I generated top-level labels for my DEM w/o it being padded by NaNs, it was all a single watershed (again, didn't seem to matter what I set Ocean level to). Once I padded with NaNs, I got several watersheds.

Diagonal artifacts. Whatever I do, I seem to end up with a lot of artifacts that align with the diagonal flow directions of the D8, as in the example image above (if I switch it to D4, these are replaced by horizontal and vertical artifacts and much of my domain does not have identified depressions). I've tried filtering by volume or cell number, but whatever I do I seem to still have a lot of small diagonal depressions, even as I start to lose some of the big, obvious depressions I can see in the DEM. This may be somehow related to the DEM I'm using.

KCallaghan commented 4 years ago

Hi Matt, The output you shared looks really cool! I'm glad you are finding our tool useful.

For the ocean level: the code does not automatically define edges as ocean (imagine a case where you have actual ocean on one side of your domain with the other side being inland - you would only want that side to count as ocean). For an entirely inland region like this, I would recommend creating a border yourself that is set to your ocean level. Please also make sure that at least one cell in your input data (or if you are doing the border, all of the border cells) have values exactly equal to the ocean level you select. If you had used some value which did not exist in your DEM, this could have led to some of the strange behaviour you were seeing with differing basins produced.

I have not seen the diagonal artifacts you are getting here on any of the regions I have used this code on, so my first guess would be that it is related to the DEM itself.

r-barnes / Barnes2019-DepressionHierarchy

Test output: duplicate, documentation, and suggestion #8