rphlo / karttapullautin

A fast and accurate map generator from classified LiDAR data.
GNU General Public License v3.0

Feature request / idea #113

Closed stefankinell closed 2 weeks ago

stefankinell commented 3 months ago

Since @rphlo started to translate Karttapullautin from Perl to Rust, the objective has always been to translate the basic features to gain performance, and then look at feature development. But some enhancements have been sneaked in along the way anyhow, which is nice.

I have some feature enhancements that I would like to look at, which border on boosting performance. Instead of only thinking about them myself, I will post them here. If someone else finds a smart way to solve them, or can point me in the right direction, that would be awesome. Otherwise I will see if and when I get around to looking at the code and understanding it better so that I can address the feature enhancements.

The first one is to enable modularity in what the script will output. One part is already achieved with running vegetation only.

Next wish from my side would be to:
- make it optional whether contours03.dxf is produced;
- make it optional whether contours and formlines are produced (dotknolls might be bundled into this as well for simplicity, making it a single switch for contour generation);
- make it optional to generate cliffs;

and to make this work in batch mode.

What I want to achieve is that if one wants to produce only cliffs for a certain set of laz-files, one should be able to skip the other outputs so it goes faster. The same applies if I would like to produce only new contours for a certain height: then the program can do that faster by skipping cliffs, vegetation and contours03.

My first real case is that I would like to generate 2.5 m contours for some parts of Sweden, and I already have all the other files from earlier output. That should go much faster if vegetation, cliffs and 03-contours are left out.

rphlo commented 3 months ago

I'll first try to come up with a flow chart of karttapullautin execution and the dependencies of each step. That might help with future development.

stefankinell commented 3 months ago

That would help me a lot.

upenn-hughmac commented 3 months ago

Great idea! I generally only need the merged.dxf from batch mode, and it would be a lot cleaner if I could do dxf-only runs.

stefankinell commented 2 months ago

So, I spent some time this morning looking at the structure of the code after the reorganization into multiple files done by @antbern in release v2.0.3, #118.

Do I understand correctly that the processing is done in process.rs, and that it is orchestrated in pub fn batch_process when running batch mode, but that most of the work is done in pub fn process_tile?

If so, and I want to start restricting some parts, is the smartest place to do so in "process_tile"?

What I want to look into is a framework where you can choose which part of the script to run based on the wanted output. My example was "I want contours only", so do not generate cliffs, vegetation etc.

As an example, I can find where cliffs::makecliffs(thread).unwrap(); is called. So I assume I can add if-statements that execute it, or skip it, based on parameters.

Are there one or several parts of the script that must always be executed, e.g. to produce the xyz-files that are then reused? Are these functions doing other things at the same time, like generating contours?

I do not want to mess up the code and logic if this does not make sense, so before I start playing around, give me a heads-up @rphlo and @antbern if this sounds like a poor way of solving the task. Perhaps it is better to restructure something else first?
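The if-statement idea could look roughly like the sketch below. Everything in it is hypothetical: `OutputSelection`, `process_tile_sketch` and the three stand-in step functions are invented for illustration and are not taken from the karttapullautin code base, whose real `process_tile` has a different signature and more steps.

```rust
// Hypothetical sketch: none of these names come from karttapullautin itself.

/// Which outputs the caller wants for a tile (illustrative only).
#[derive(Debug, Clone, Copy)]
struct OutputSelection {
    contours: bool,
    cliffs: bool,
    vegetation: bool,
}

impl OutputSelection {
    /// The "I want contours only" case from the comment above.
    fn contours_only() -> Self {
        Self { contours: true, cliffs: false, vegetation: false }
    }
}

// Stand-ins for the real generation steps; each one only reports its name.
fn make_contours() -> Result<&'static str, String> { Ok("contours") }
fn make_cliffs() -> Result<&'static str, String> { Ok("cliffs") }
fn make_vegetation() -> Result<&'static str, String> { Ok("vegetation") }

/// Runs only the selected steps, in a fixed order, and returns the
/// names of the steps that actually ran.
fn process_tile_sketch(sel: OutputSelection) -> Result<Vec<&'static str>, String> {
    let mut ran = Vec::new();
    if sel.contours {
        ran.push(make_contours()?);
    }
    if sel.cliffs {
        ran.push(make_cliffs()?);
    }
    if sel.vegetation {
        ran.push(make_vegetation()?);
    }
    Ok(ran)
}
```

Calling `process_tile_sketch(OutputSelection::contours_only())` runs only the contour stand-in. Whether a step can really be skipped depends on which intermediate files the later steps read, which is exactly the question asked next.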

antbern commented 2 months ago

> Do I understand correctly that the processing is done in process.rs, and that it is orchestrated in pub fn batch_process when running batch mode, but that most of the work is done in pub fn process_tile?

Yes, that is correct: process_tile goes through all the generation functions in order and is called either by batch_process or directly from the main function. Also note that there are commands in the main function that call some of the generation functions individually.

And I would say most of the functions do not depend on a lot of other functions to run, as long as the required input files exist, which is mostly the converted XYZ file. And if the input hasn't changed, we do not need to rerun that part. So afaik there are no parts which must always be rerun.
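One way to act on "if the input hasn't changed, we do not need to rerun that part" is to compare file modification times. The sketch below is an assumption about how such a check could be written, not a description of what karttapullautin does; `is_stale` and `needs_rerun` are hypothetical names.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::SystemTime;

/// Pure decision: a step is stale when its output is missing
/// or older than its input.
fn is_stale(input_mtime: SystemTime, output_mtime: Option<SystemTime>) -> bool {
    match output_mtime {
        None => true,
        Some(out) => out < input_mtime,
    }
}

/// Filesystem wrapper (hypothetical helper): reads both modification
/// times and treats a missing output file as "needs to run".
fn needs_rerun(input: &Path, output: &Path) -> io::Result<bool> {
    let input_mtime = fs::metadata(input)?.modified()?;
    let output_mtime = match fs::metadata(output) {
        Ok(meta) => Some(meta.modified()?),
        Err(e) if e.kind() == io::ErrorKind::NotFound => None,
        Err(e) => return Err(e),
    };
    Ok(is_stale(input_mtime, output_mtime))
}
```

Modification times are only a heuristic: a changed parameter (for example a new contour interval) would not be detected this way, so a real implementation would probably need an explicit switch as well.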

rphlo commented 1 month ago

@stefankinell I finally got around to writing the process flow, and dug up a few issues while doing it...

[Flow chart: Karttapullautin Batch Mode Flow, starting at "Enter Batch Process"]

stefankinell commented 1 month ago

Beautiful! That helps a lot! It directly helped me to understand e.g. why my small tests have crashed when I just omitted the whole 03-process.

I think this helps me a lot to start thinking about what can be done and how. I will pick it up again after next week's work outside the office.

rphlo commented 2 weeks ago

Please check out v2.2.0

stefankinell commented 2 weeks ago

Nice! Very nice! I think I have to do some testing this week and measure time differences. Thanks a lot!

stefankinell commented 2 weeks ago

So first test - I ran a quite hilly area with 273 laz-files. 10 parallel processes on my Mac M1 Max.

Full rendering - average 15 seconds per laz-file. Only contours - 9.5 seconds per laz-file.
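To put those two averages in perspective, the small calculation below scales them to the 273 files of the test. It assumes the per-file averages can simply be multiplied by the file count; the post does not say whether they are wall-clock time divided by the number of files or time per process, so the totals are only indicative. The function names are made up for this example.

```rust
/// Summed per-file time in seconds for a run over `files` tiles.
fn total_seconds(files: u32, secs_per_file: f64) -> f64 {
    f64::from(files) * secs_per_file
}

/// Fraction of time saved by the reduced run relative to the full run.
fn relative_saving(full: f64, reduced: f64) -> f64 {
    (full - reduced) / full
}

/// Figures from the test above: 273 files, 15 s vs 9.5 s per file.
fn summary() -> (f64, f64, f64) {
    let full = total_seconds(273, 15.0); // 4095 s
    let contours_only = total_seconds(273, 9.5); // 2593.5 s
    (full, contours_only, relative_saving(full, contours_only))
}
```

Under that assumption the contours-only run takes 2593.5 s instead of 4095 s, a saving of about 1500 s (roughly 25 minutes), or close to 37 %.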

I love this!

And what I noticed is that '_contours.dxf' is populated, and also '_contours03.dxf', but not '_formlines.dxf', '_dotknolls.dxf' etc.

When I get around to rerunning larger areas for new contours, or have some spare time, I will look at how you implemented this and perhaps add parameters to also steer _formlines and _dotknolls, and perhaps separate _contours03.dxf from normal contours.

Anyhow - well done, thanks for implementing this!

rphlo commented 2 weeks ago

Some improvements to look at in v2.2.1

rphlo commented 2 weeks ago

I am closing this issue now; we can discuss feature requests and ideas in the discussions instead.