aeiche01 closed this 2 months ago
The problem you describe applies to any other software on the market and is not specific to lidR.
Your idea is to create a kind of hierarchy, with a meta-catalog that references files you don't have on disk, download the files you need on the fly, and then delete those that are no longer needed. In that case, why do you need a meta-catalog? We could do the same with a regular catalog without adding a layer of complexity.
Also, lidR was never designed to process state-wide datasets. It was designed as an R&D toolbox, and it has limitations when it comes to processing very large datasets. You should have a look at lasR. It does not solve the problem you describe here, but it will at least handle large datasets better.
Last but not least, my software packages lidR and lasR are no longer supported by my university. While the software will remain free and open source, I am now self-employed to sustain its development. I offer my services independently for training courses, consulting, and development. If you are able to sponsor this feature, it is something I could develop for your custom needs. For more information, please visit my website: https://www.r-lidar.com/.
The LAS catalog approach is awesome, but seems to work only if I can store all the lidar data on my computer/cloud storage/etc. I'm running into an issue, however, where I want to analyze a large area (e.g., an entire U.S. state) and don't have the space to store all the laz files I'll need. I came up with a possible workaround, but I'm not sure if it's a good idea. If it is a good idea, though, it might be something to include as an option for use in lidR.
Basically, the idea is that we split the large ROI into smaller overlapping segments, like LAS catalog does with the las/laz files already. Then we use some API to download lidar files to disk for one of the smaller segments, analyze that as an LAS catalog, then delete the lidar files that aren't in the next smaller ROI (since the smaller ROIs overlap, there will be some lidar tiles that appear across multiple edge-sharing ROIs). Then we do that again, and again, and again, until we've covered the space of the entire large ROI. At which point, we can merge the outputs into what we need.
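To make the download/delete bookkeeping concrete, here is a minimal base R sketch of just that part. The tile names, the `segment_tiles` list, and the `plan_downloads()` helper are all hypothetical, and it assumes the segments are visited in a simple linear order, so a tile is kept only if the very next segment needs it. On a 2D grid, a tile needed by a later, non-adjacent segment would be downloaded again.

```r
# Hypothetical index: the lidar tiles each overlapping segment needs.
# In practice this would come from intersecting segment extents with a tile index.
segment_tiles <- list(
  seg1 = c("tile_A.laz", "tile_B.laz", "tile_C.laz"),
  seg2 = c("tile_C.laz", "tile_D.laz"),
  seg3 = c("tile_D.laz", "tile_E.laz")
)

# For each segment, work out which tiles to download before processing
# and which to delete afterwards (those the next segment does not need).
plan_downloads <- function(segment_tiles) {
  on_disk <- character(0)
  plan <- vector("list", length(segment_tiles))
  names(plan) <- names(segment_tiles)

  for (i in seq_along(segment_tiles)) {
    needed <- segment_tiles[[i]]
    next_needed <- if (i < length(segment_tiles)) segment_tiles[[i + 1]] else character(0)

    # Only fetch tiles that are not already on disk from the previous segment.
    to_download <- setdiff(needed, on_disk)
    on_disk <- union(on_disk, to_download)

    # (The segment would be processed as a catalog at this point.)

    # Drop everything the next segment will not reuse.
    to_delete <- setdiff(on_disk, next_needed)
    on_disk <- setdiff(on_disk, to_delete)

    plan[[i]] <- list(download = to_download, delete = to_delete)
  }
  plan
}

plan <- plan_downloads(segment_tiles)
str(plan)
```

With the example index above, the shared tiles (`tile_C.laz` and `tile_D.laz`) are each downloaded only once, and at most three tiles sit on disk at any time.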
Is this a good idea, or am I making some major mistake or bad assumption? I wrote up some code to demonstrate how this could work. Haven't run it yet so there are probably bugs, but it covers what I was thinking. It uses dsmSearch, which is a package that grabs lidar data from the US National Map: