jblindsay / whitebox-tools

An advanced geospatial data analysis platform
https://www.whiteboxgeo.com/
MIT License
967 stars 161 forks source link

Add flow pointer raster as input or output for flow accumulation functions #63

Closed jfbourdon closed 4 years ago

jfbourdon commented 4 years ago

It would be nice to allow functions like d8_flow_accumulation, d_inf_flow_accumulation, rho8_flow_accumulation to either accept as input a flow pointer raster (like the ones generated by d8_pointer, d_inf_pointer, rho8_pointer) or give as output the flow pointer raster already computed internally. I don't want to use flow_accumulation_full_workflow (that produce both a flow pointer and a flow accumulation raster) as I prefer to do my own preprocesing of the DEM.

I need both flow pointer and a flow accumulation raster as I run raster_streams_to_vector and it leads to compute two times the flow pointer raster instead of once.

bkielstr commented 4 years ago

I'd like to second this.

I'm trying to generate a D8 flow accumulation raster using WhiteboxTools (v.1.0.2). For better or worse, I need to provide a tool with a pre-existing D8 flow pointer since the Ontario Integrated Hydrology layers provide an EnhancedFlowDirection raster.

The "D8 and Rho8 Flow Accumulation" in Whitebox GAT 3.4 allows for this input. Based on reading the WhiteboxTools user manual and source code, the flow accumulation tools seem to generate the flow direction internally. Are there plans to port "D8 and Rho8 Flow Accumulation" or is this a move away from the flow direction step altogether?

jblindsay commented 4 years ago

When I developed Whitebox GAT, users complained that they needed to generate the pointer file before running the flow accumulation operations. With WhiteboxTools, I decided to have the flow accumulation tools generate the flow pointers internally, to satisfy these users, and because especially with the ease of parallelization provided by Rust, it is extremely fast to do so. It would seem that either way that I make it work, there will always be users that are going to prefer the alternative approach. Anyhow, I'll see if I can think of a way of autodetecting whether an input raster is a DEM or a flow pointer and go from there.

bkielstr commented 4 years ago

Thanks for the information. I agree that you can't please all users. In all other cases, I would definitely use the WhiteboxTools approach. It really did take me all of 5 minutes to export and run the Whitebox GAT tool. As always, thanks for putting such great effort into building such good software.

jfbourdon commented 4 years ago

What I had in mind was not necessarily autodetecting the type of input file, but instead having an optional input like --d8_pntr. If present, the script don't check --dem and bypass the computation of the flow pointer. I could be a switch also.

As I explained initially, my issue is mainly because I run raster_streams_to_vector that needs both a flow pointer raster and a flow accumulation raster and that currently with Whitebox some efficiency is lost. Not a big deal in this case, just a little anoyance in such a high performance tool :)

jblindsay commented 4 years ago

That is an approach that could certainly work. I'll give it a try and see if I can get this change into v1.2.0, to be release later this week. Thanks for the suggestion.

jblindsay commented 4 years ago

I just deposited fixes to the D8 and D-infinity flow accumulation tools. I have changed the name of the input raster parameter from --dem to --input (although --dem is still accepted internally as a hidden feature). By default, the tool assumes this input is a DEM but if the --pntr flag is specified, it understands the raster to be a flow pointer, derived using either the D8Pointer or DinfPointer tool depending on the flow accumulation tool used. I believe this change satisfies the needs of this feature request and so I am now closing the issue. I am also planning on releasing v1.2.0 very shortly and would appreciate your feedback on this feature when you get a chance.

jfbourdon commented 4 years ago

Something is probably wrong because running D8FlowAccumulation with the --pntr flag is a bit slower than with only a DEM as input. If relevant, the verbose still shows Flow directions: xx% going up. Tested with two 150km² DEM at 1m resolution.

jblindsay commented 4 years ago

So I don't understand why you would think think that the tool should be faster with the D8 Pointer input instead of the DEM. I didn't make this change for performance. Providing the pointer should take about the same time as the DEM.

jfbourdon commented 4 years ago

My tought was that if a D8 Pointer is given to D8FlowAccumulation, the tool won't have to generate it internally and thus would save time. Overall, I fully understand that there should be no time saved if I use D8Pointer and then D8FlowAccumulation with --pntr compared to just D8FlowAccumulation with a DEM as input (in fact a bit slower due to I/O). The thing is that the former workflow is slower than the later when excluding I/O.

Tool Time excluding I/O
D8Pointer 6.23s
D8FlowAccumulation with DEM as input 32.93s
D8FlowAccumulation with D8Pointer as input 38.78s

*Times are averaged over three runs on a 380 millions valid cells DEM.

I think the issue is that when a D8Pointer is given as input, D8FlowAccumulation doesn't consider its NoData value and thus compute a flow accumulation on all cells (in my case example, 922 millions cells instead of only 380 millions).

jblindsay commented 4 years ago

Yes, but it doesn't work like that. As I said earlier, generating a pointer from an input DEM is basically as efficient as one could hope for because it is readily parallelized. It is so efficient that you can basically think of it as free from the perspective of the flow accumulation tool. That's why I implemented it that way. Reading in a D8 pointer file does not negate the need for work to be done. The D8 pointer that is output by the D8 pointer tool is in a base-2 format (1, 2, 4, 8, 16...), which when read in by the flow accumulation tool, needs to be converted into the 0,1,2,3,4... format that is needed in the flow accumulation step (which it is automatically generated as when the pointer is calculated as when a DEM is input). Also, I need to have special code in place depending on whether the pointer is in Whitebox format or ESRI format. Believe me when I say that I think a lot about the efficiency of every tool in WhiteboxTools. If I have implemented a tool in a certain way, there is very likely a good reason why I have done so. If you tell me that you want to have the flow accumulation tool input a pointer file instead of a DEM so that you have the flexibility to manipulate the pointer in a custom way, well that's something I can understand. But if you tell me that you want this feature because you're hoping it'll speed up the workflow, well that doesn't make any sense to me. It simply doesn't work like that on the implementation side of things.

jfbourdon commented 4 years ago

The conversion from (1, 2, 4, 8, 16...) to (0,1,2,3,4...) was a subtility (at least for me) in the implementation that I was missing. Be certain, I trust you when you say that you think a lot about the efficiency of every tool in WhiteboxTools. It's just normal that you just can't envision all possible use case. In any case, you do an amazing job at implementing algorithm in way that allow some workflow that was hardly imaginable before (that's the case for us with hydrological modeling). I also badly need to experiment with Rust myself to better understand your code and to post issues more on point.

But with all that being said, the NoData info is definitively lost somewhere in D8FlowAccumulation with D8Pointer as input as all cells that were NoData in the DEM now have 1 unit of flow accumulation.

jblindsay commented 4 years ago

@jfbourdon I'm sorry, I must have missed the NoData issue earlier. Would you mind opening a fresh issue for this and I'll see if I can fix this as soon as I get a free chance. I agree with you that I cannot envision all of the possible use cases that WBT users have and that there may be more changes that may make some use cases more efficient or suitable. I like it when users make such suggestions and there are cases, as you are clearly aware, of when this has led to wonderful improvements in the software. The one caveat is that often users make suggestions that are narrowly suited to their particular use-case (e.g. can you make this change to make the library integrate better with Python?). I always need to balance such requests with the wider user community (e.g. will this change that is useful for Python users make it more difficult for R users, or will this change to better integrate with the QGIS frontend make maintaining the ArcGIS frontend more difficult). Maintaining a project of this scale can be extremely challenging. I hope that I make the right decisions, although I know that I cannot always do so and that not everyone will be satisfied with the decisions that I make. It's a compromise. Anyhow, all of this said, I do appreciate all of your constructive feedback. None of it gets missed and if I haven't yet acted on something, it's because I need time to think about it more. Regards, John.