r-lidar / lidR

Airborne LiDAR data manipulation and visualisation for forestry applications
https://CRAN.R-project.org/package=lidR
GNU General Public License v3.0

lidR v2.0.0 beta test #177

Closed. Jean-Romain closed this issue 5 years ago.

Jean-Romain commented 6 years ago

To: @bi0m3trics @floriandeboissieu @jfbourdon @spono @ptompalski @AndyPL22 @daveauty @tacormier and everyone else who reads this thread.

I merged the development branch of lidR 2.0.0 into the main development branch (devel). In short, that means the work on lidR v2.0.0 is almost done. The biggest part of what I wanted to do is done; what remains is mostly cosmetic :tada:

lidR v2.0.0 comes with many more capabilities, especially regarding LAScatalog processing. It comes with a more consistent set of functions and is more scalable for the future. It also comes with a massive changelog and, sadly, is deeply incompatible with previous releases (see the changelog to know why).

I've been using it myself for weeks to do things that were not possible in lidR versions 1.x.y. I now need beta testers to test the new version in a wider range of usage. I need a lot of feedback because I entirely redesigned lidR, and that represents several thousand modified lines of code...

So, if you want to test it, thank you in advance for your help.

To install lidR version 2.0.0:

devtools::install_github("Jean-Romain/lidR", dependencies=TRUE, ref="devel")

Or use a Windows binary.

The release date is expected to be between Dec 24th 2018 and Jan 1st 2019.

spono commented 6 years ago

I'm craving it, but the FINAL deadline won't allow me to put my hands on it before October. BTW: well done, once again! Read you soon

Jean-Romain commented 6 years ago

Fair enough :wink: I'm not going to release it for months anyway.

floriandeboissieu commented 6 years ago

Same as Nic, I will be able to try it for sure in October.

Cheers

Jean-Romain commented 6 years ago

Thanks to Jakub's help, the 3 latest commits were dedicated to fixing compilation on Windows. It now compiles on Windows.

bi0m3trics commented 6 years ago

Awesome! I'm almost caught up and I should be able to get back to beta testing starting Friday (I just saw the commit about Windows). I've also started working on a rewrite of my code (thanks JR!) and hope to be able to spend quite a bit of time on all of this next week. Bi0m3trics

ManuelSpinola commented 6 years ago

How can I install lidR v2.0.0?

Jean-Romain commented 6 years ago

I'm thinking about a deep internal change in lidR v2. I no longer want to update LAS objects by reference. It used to make sense in R < 3.1, but the gain is no longer relevant because R now makes shallow copies instead of deep copies.

In short, let's assume that we have a 1 GB data.frame that stores the point cloud.

las2 <- lasnormalize(las1)

This used to make a deep copy of las1 into las2, i.e. las1 + las2 = 2 GB. This is why I made functions that work by reference:

lasnormalize(las1)

This makes no copy at all. It is memory-optimized but also not R-user-friendly, especially for beginners, because this kind of behavior is not common in R.

But the question of memory optimization has not been relevant since R 3.1. In the previous example, las2 is no longer a deep copy of las1 but a shallow copy. It means that every column that was not modified shares the same memory address. There is a single memory allocation for the two objects.

pryr::object_size(las1)
1GB

pryr::object_size(las2)
1GB

pryr::object_size(las1, las2)
1.1GB # and not 2 GB: only the Z column that was updated takes new memory.
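
The same behaviour can be checked with a plain data.frame (a minimal sketch; object names and sizes are illustrative, using three columns of one million doubles each):

df1 <- data.frame(X = runif(1e6), Y = runif(1e6), Z = runif(1e6))
df2 <- df1
df2$Z <- df2$Z - 1     # only Z is reallocated; X and Y still share memory with df1

pryr::object_size(df1)       # ~24 MB
pryr::object_size(df2)       # ~24 MB
pryr::object_size(df1, df2)  # ~32 MB, not 48 MB: X and Y are counted once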

Changing this important design choice allows lidR to be used in a more standard way without loss of performance and/or risk of memory overflow (actually no worse than currently).

There is a single case where we can observe a loss of performance:

lasclassify(las1)
las2 <- lasclassify(las1)

In the first case the column Classification is updated in place without any copy. In the second case there is a copy of the column Classification. In my opinion this is a minor issue because this design change also implies simpler code internally (especially regarding the management of header updates).

I think @floriandeboissieu is the best placed to understand the consequences of such a modification. Do you believe it is suitable? Can you see any drawbacks?

In summary, in the current lidR version:

las = readLAS(...)       # Memory allocated
lasclassify(las)         # 0 extra memory allocation
lasnormalize(las)        # 1 column extra memory allocation because the former Z values are stored in addition to the new ones
lasclassify(las, sp)     # 1 column extra memory allocation
lastrees(las)            # 1 column extra memory allocation
las2 = lasfilter(las, Z < 50) # Full copy of the point cloud minus rows where Z >= 50

Maybe in version 2.0:

las = readLAS(...)            # Memory allocated
las2 = lasclassify(las)       # 1 column extra memory allocation
las3 = lasnormalize(las2)     # 1 column extra memory allocation because the former Z values are stored in addition to the new ones
las4 = lasclassify(las3, sf)  # 1 column extra memory allocation
las5 = lastrees(las4)         # 1 column extra memory allocation
las6 = lasfilter(las5, Z < 50) # Full copy of the point cloud minus rows where Z >= 50

floriandeboissieu commented 6 years ago

Hi Jean-Romain,

I think it is a good idea. This is a more traditional way from the user's point of view, so if you think it simplifies things from the development point of view, great! What I would study carefully is the parallelization of that kind of function (inside or around). If there is any impact, I suppose it would be better dealt with by functions using shallow copies. Unfortunately I will not have time to check that before the beginning of October. Tell me if you need help on that question. Cheers, Florian

Jean-Romain commented 6 years ago

Done!

Parallelization is not done at the algorithm level, so this change cannot affect parallelization in any way. And even if an algorithm were parallelized, the addition or assignment of a new attribute is done at the very end of the function, once the computation is already done. It definitively can't affect parallelization.

Now we can write traditional code, for example:

las <- readLAS(file)
las <- lasnormalize(las)

#or

las_norm <- lasnormalize(las)

Instead of

las <- readLAS(file)
lasnormalize(las)

Jean-Romain commented 6 years ago

Notice I can still revert this change. It has some drawbacks: the lasground case is one of them, but a more important one is a (future) function lastransform equivalent to spTransform. In that case, without references, we have to copy both X and Y without any possibility of recycling existing memory.

las2 = lastransform(las1) # Allocation of 2 new vectors of double
lastransform(las)         # 0 allocation
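
To make the allocation cost concrete, here is a sketch of what a lastransform could do internally with sp (lastransform does not exist yet; everything here is illustrative):

library(sp)
library(rgdal)

xy  <- SpatialPoints(cbind(las@data$X, las@data$Y), proj4string = las@proj4string)
xy2 <- spTransform(xy, CRS("+init=epsg:2154"))
# xy2 holds a brand new coordinate matrix: both X and Y are reallocated,
# a copy that cannot be avoided without reference semantics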

bi0m3trics commented 6 years ago

One thought could be that if no new data are added to the object (such as with the transformations or some classifications) the call could be by reference... To me, this is intuitive, it could be a convention that's easily explained in the documentation, and it seems like it would make things easier globally for the developer 😉. That is, unless we desire the same convention and behavior for all calls... then we're stuck with a compromise no matter which direction is taken.

Kind regards, Andrew

Jean-Romain commented 6 years ago

That is unless we desire the same convention and behavior for all calls

Ultimately yes, I'm trying to strongly improve package consistency, both within the package and across other spatial packages.

One thought could be that if no new data added to the object [...] the call could be by reference

This is an option, but what about these cases?

las = readLAS(file, select = "xyz")
lasground(las)

# and

las = readLAS(file, select = "*")
lasground(las)

The previous convention should be preferred: the same number of points means by reference, a different number of points means by copy (no choice anyway).

My biggest problem with the reference paradigm, consistency aside, is easily understandable with this user function. As long as side effects happen only within my own functions it is OK. But the goal of lidR is to enable users to build their own, and the following one is not straightforward to write for most users (two big traps in 4 lines of code).

lasfilternoise = function(las, sensitivity, res)
{
  p95 <- grid_metrics(las, quantile(Z, probs = 0.95), res)
  lasclassify(las, p95, "p95")
  las2 <- lasfilter(las, Z < p95*sensitivity)
  las@data[, p95 := NULL] # remove side-effect
  return(las2)
}

This is actually what drove me toward a change of paradigm, even at the cost of a loss of memory optimization. The worst case I can imagine so far is lastransform, which is not expected to be used often. Indeed, ALS data should not be transformed into the CRS of the user's Raster or Spatial data; we should do the opposite.

lasfilternoise = function(las, sensitivity, res)
{
  p95 <- grid_metrics(las, quantile(Z, probs = 0.95), res)
  las <- lasclassify(las, p95, "p95")
  las <- lasfilter(las, Z < p95*sensitivity)
  return(las)
}
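
Hypothetical usage of such a user-defined function (file name and parameter values are illustrative):

las <- readLAS("file.las")
las_denoised <- lasfilternoise(las, sensitivity = 1.2, res = 10)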

Jean-Romain commented 6 years ago

I changed the names of the functions used to access LAScatalog options. They were very annoying, especially regarding auto-completion in RStudio. Also, I definitively chose to adopt regular R syntax with shallow copies.

Jean-Romain commented 6 years ago

I just pushed a major modification. I removed several algorithms for tree detection and tree segmentation and moved them into a new package, lidRplugins.

@jfbourdon all the experimental stuff we are doing together is no longer in lidR but in lidRplugins (see also the README)

floriandeboissieu commented 6 years ago

Hi Jean-Romain, would there be a possibility to separate the extent (i.e. min/max x, y and maybe z) from the effective content of the point cloud? As an example, retiling a series of extracted plots on a larger grid (e.g. 500x500 m) won't make a grid but polygons squeezed to optimal size, thus I lose the grid information. Actually the same happens when you grid any type of point cloud. It would be nice if the grid could be kept as-is and if there were an option to optimize/squeeze the las extent. This is maybe more an rlas issue...

Jean-Romain commented 6 years ago

@floriandeboissieu what exactly is your issue? The extent is based on the content of the header by default. It was introduced for convenience when I made LAS objects inherit from Spatial objects.

Like Spatial* and unlike Raster*, you can't modify the bbox with a built-in function. But you can force it programmatically. This does not modify the header, only the bounding box, i.e. the slot @bbox.

las@bbox = as.matrix(extent(las) + 10)

Edit: I think I understood your issue. This seems to be the best option.
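
For the retiling use case, a hypothetical helper along these lines could snap the in-memory bounding box outward to a regular grid (snap_bbox is not a lidR function; this only touches the @bbox slot, not the LAS header):

snap_bbox <- function(las, res = 500)
{
  e <- raster::extent(las)
  e@xmin <- floor(e@xmin / res) * res
  e@ymin <- floor(e@ymin / res) * res
  e@xmax <- ceiling(e@xmax / res) * res
  e@ymax <- ceiling(e@ymax / res) * res
  las@bbox <- as.matrix(e)
  las
}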

floriandeboissieu commented 6 years ago

No, the issue is in 1.6.1 and I was hoping it could be solved in 2.0, which I haven't tried yet... sorry, too busy with the rest at the moment.

The issue is that if I manually change las@header@data$Max X, write it with writeLAS and reload it, las@header@data$Max X comes back to its initial value and not the one I set.

Thus I am losing the geographical information about the area covered, because it is based on the point cloud itself.

I don't remember exactly where it is happening (lidR, rlas, lastools?).

An example: tiling sparse LAS data will lead to non-joined tiles.

Cheers

Jean-Romain commented 6 years ago

It happens in rlas and it is driven by LASlib. This is not going to change. This is part of the LAS specification:

Max and Min X, Y, Z: The max and min data fields are the actual unscaled extents of the LAS point file data, specified in the coordinate system of the LAS data.

So you can't force it to write an extent that does not correspond to the point cloud. It must be based on the content of the point cloud.

floriandeboissieu commented 6 years ago

Ok, I read the doc myself before asking, but I was hoping it was not in LASlib...

dncosenza commented 6 years ago

Hi Jean-Romain! Congratulations on your work with lidR 1.6. Now I want to try the beta version, but I'm not managing to install it. I've tried to install 'RCSF' but I couldn't; only 'rlas' from CRAN worked. If possible, could you provide code to install 'RCSF' and lidR 2.0? Thank you very much.

Jean-Romain commented 6 years ago

devtools::install_github("Jean-Romain/RCSF")
devtools::install_github("Jean-Romain/lidR", dependencies=TRUE, ref="devel")

dncosenza commented 6 years ago

devtools::install_github("Jean-Romain/RCSF")
devtools::install_github("Jean-Romain/lidR", dependencies=TRUE, ref="devel")

I've tried this, and I ensured that the 'devtools' package was loaded, but this message showed up:

Downloading GitHub repo Jean-Romain/RCSF@master
Error: Could not find tools necessary to compile a package

Jean-Romain commented 6 years ago

You must install a C++ compiler on your machine. On Windows, I believe devtools does the job for you. Are you a Mac user :apple:? See also the README: https://github.com/Jean-Romain/lidR/tree/devel#install-lidr
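
A sketch for checking the toolchain on Windows (pkgbuild is a separate package and may need to be installed first):

# 1. Install Rtools from https://cran.r-project.org/bin/windows/Rtools/
# 2. Restart R, then verify that the compiler is found:
pkgbuild::check_build_tools()
# or, with devtools:
devtools::has_devel()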

SebastianKasan commented 5 years ago

There is a very, very minor bug in the progress bar of:

tree_detection(lidar, vwf(5))

The progress tracker stops at 78% or 79% even though it has completed its task...

Jean-Romain commented 5 years ago

@SebastianKasan fixed in https://github.com/Jean-Romain/lidR/issues/191. If you find another progress bar like this one, please tell me.

ptompalski commented 5 years ago

There is a potential bug when processing a catalog using more than one core and trying to save results to files with:

opt_output_files(ctg) <- "F:\\data\\out\\qc_{XLEFT}_{YBOTTOM}_metrics"
opt_cores(ctg) <- 2

This results in an error:

Error in cluster_write(x, current_processed_cluster@save, output_options) :
  could not find function "cluster_write"

Everything works fine when the number of cores is set to 1.
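
For reference, a minimal sketch of the kind of setup that triggers it (paths and the processing call are illustrative):

library(lidR)

ctg <- catalog("F:/data/las/")
opt_output_files(ctg) <- "F:/data/out/qc_{XLEFT}_{YBOTTOM}_metrics"
opt_cores(ctg) <- 2   # works when set to 1
metrics <- grid_metrics(ctg, mean(Z), 20)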

Jean-Romain commented 5 years ago

Fixed in #193

Jean-Romain commented 5 years ago

Release date: between Dec 24th 2018 and Jan 1st 2019

bi0m3trics commented 5 years ago

Hey @Jean-Romain and everyone else - I was thinking of contributing a vignette and thought it would be cool to truly develop something collaborative... I'd be willing to start a vignette that covers what I see as a series of tasks commonly asked about by new users... and I'd love to see others improve on it to make a really nice vignette (i.e., I know as soon as I push code someone will say "your code is so inefficient!"). At present, I've got a few scripts that I could easily combine, adding a narrative that:

  1. performs some simple QAQC queries/analyses to get people looking at their data and checking for outliers, flight angle issues, etc.
  2. builds a catalog and produces what we refer to here (western US) as first-order products (DEM, CHM, mean intensity, etc.) in parallel, writing them out to a RasterStack
  3. extracts inventory data that's been processed to produce per-unit-area BA, biomass, volume, etc., processes sample plot lasmetrics, fits a simple RF model to predict said BA, biomass, volume, etc., and then predicts 20 m resolution rasters and writes them out as a RasterStack

I know the targeted release date is fast approaching (Merry Christmas to me!) but would an endeavor like this be helpful or feasible? Would people be interested in altering/tweaking/improving my code/narrative to make a collaborative vignette?

daveauty commented 5 years ago

I think this is a great idea. I'd be happy to contribute.

Jean-Romain commented 5 years ago

@bi0m3trics I'm currently writing vignettes about "package design", i.e. an ill-defined set of topics that includes things like "LAS and LAScatalog classes", "how to build functions fully compatible with LAS and LAScatalog", and "how and why to speed up with lax files". Practical examples such as "extract and normalise ground inventories" or "build a predictive model of biomass" are not in my plan.

This kind of vignette is useful. I'm not writing them because I don't have enough time to write so much documentation. So if you are motivated to write vignettes, do not hesitate. The deadline does not matter; we can add the vignettes later. Some topics of interest imo:

However, vignettes are not R tutorials. Vignettes must focus exclusively on the lidR package. Vignettes must show relatively simple chunks of code, not complex scripts. More complex tutorials that involve more code and more packages can be written in the wiki.

Also, the vignettes are built by CRAN. If some chunks of code are run by knitr, they must be fast and based on data provided with the package. In practice this limits what we can write in a vignette.

Edit: I think that starting with wiki pages is a good option. Then we will see what can be put or not in vignettes.
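
For instance, a CRAN-friendly chunk might look like this (a sketch assuming the v2.0 API; Megaplot.laz ships with the package, which keeps the chunk fast):

library(lidR)

LASfile <- system.file("extdata", "Megaplot.laz", package = "lidR")
las <- readLAS(LASfile)
chm <- grid_canopy(las, res = 1, algorithm = p2r())
plot(chm)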

samherniman commented 5 years ago

I think this is a great idea. I'd be happy to contribute.

I would also be happy to contribute.

Jean-Romain commented 5 years ago

Almost ready to release:

Jean-Romain commented 5 years ago

Submission ready but delayed to Jan 2nd 2019

The package submission system will remain offline until Jan 2nd 2019. Reasons being CRAN team vacation and maintenance work.

Happy holidays and thank you for your understanding.

dncosenza commented 5 years ago

You must install a C++ compiler on your machine. On Windows, I believe devtools does the job for you. Are you a Mac user 🍎? See also the README: https://github.com/Jean-Romain/lidR/tree/devel#install-lidr

I've tried reinstalling 'devtools' and 'Rtools35', but the problem remains when I try to install 'rlas' and/or 'lidR' ("Error: Could not find tools necessary to compile a package"). I'm using Windows. Should I install any other C++ compiler? It seems like I'm missing some basic important step. Thanks for your patience.

Jean-Romain commented 5 years ago

@dncosenza lidR 2.0.0 has been released on CRAN.

dncosenza commented 5 years ago

I need to generate a DTM by running grid_terrain with the 'tin' algorithm, but with the same restriction imposed by the max_edge argument of grid_canopy's dsmtin. Is this possible? I tried to execute grid_canopy, but according to the pdf it triangulates just the first returns, so it won't work for DTMs.
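
For context, the v2.0 calls in question would be something like this (a sketch; note that tin() takes no max_edge argument, unlike the dsmtin() algorithm used by grid_canopy, which is the limitation being discussed):

dtm <- grid_terrain(las, res = 1, algorithm = tin())
chm <- grid_canopy(las, res = 1, algorithm = dsmtin(max_edge = 8))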

Jean-Romain commented 5 years ago

No! But you can open a new issue to suggest a new feature.

SebastianKasan commented 5 years ago

Can you maybe delete first returns and then reclassify the last ones as first ones, as a workaround?

Edited by Jean-Romain: don't do horrible things like that ;-)