I'm craving it, but the FINAL deadline won't allow me to put my hands on it before October. BTW: well done, once again! Read you soon
Fair enough :wink: I'm not going to release it for months anyway.
Same as Nic, I will be able to try it for sure in October.
Cheers
Thanks to Jakub's help, the 3 latest commits were dedicated to fixing compilation on Windows. It now compiles on Windows.
Awesome! I'm almost caught up and I should be able to get back to beta testing starting Friday (I just saw the commit about Windows). I've also started working on a rewrite of my code (thanks JR!) and hope to be able to spend quite a bit of time on all of this next week. Bi0m3trics
How can I install lidR v2.0.0?
I'm thinking about a deep internal change in lidR v2. I no longer want to update LAS objects by reference. It used to make sense in R < 3.1, but the gain is no longer relevant because R now makes shallow copies instead of deep copies. In short, let's assume we have a 1 GB data.frame that stores the point cloud.
las2 <- lasnormalize(las1)
This used to make a deep copy of las1 into las2, i.e. las1 + las2 = 2 GB. This is why I made functions that work by reference:
lasnormalize(las1)
No copy at all. This is memory-optimized, but also not R-user friendly, especially for beginners, because this kind of behavior is not common in R.
But the question of memory optimization has not been relevant since R 3.1. In the previous example las2 is no longer a deep copy of las1 but a shallow copy. It means that every column that was not modified shares the same memory address: there is a single memory allocation for the two objects.
pryr::object_size(las1)
# 1 GB
pryr::object_size(las2)
# 1 GB
pryr::object_size(las1, las2)
# 1.1 GB, and not 2 GB: only the Z column that was updated takes new memory.
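For a runnable illustration of this copy-on-modify behaviour, here is a plain-data.frame sketch (not lidR code; sizes are approximate):

library(pryr)

df1 <- data.frame(X = runif(1e6), Y = runif(1e6), Z = runif(1e6))  # three ~8 MB numeric columns, ~24 MB total
df2 <- df1
df2$Z <- df2$Z - 1            # only Z is reallocated; X and Y are still shared

object_size(df1)              # ~24 MB
object_size(df2)              # ~24 MB
object_size(df1, df2)         # ~32 MB, not 48 MB: the unmodified columns are shared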
Changing this important design choice allows lidR to be used in a more standard way without any loss of performance or risk of memory overflow (actually no worse than currently).
There is a single case where we can observe a loss of performance.
lasclassify(las1)
las2 <- lasclassify(las1)
In the first case the column Classification is updated in place without any copy. In the second case there is a copy of the column Classification. In my opinion this is a minor issue because this design change also implies simpler code internally (especially regarding the management of header updates).
I think @floriandeboissieu is best placed to understand the consequences of such a modification. Do you believe it is suitable? Can you see any drawbacks?
In summary, in the current lidR version:
las = readLAS(...) # Memory allocated
lasclassify(las) # 0 extra memory allocation
lasnormalize(las) # 1 column extra memory allocation because the former Z are stored in addition to the new ones
lasclassify(las, sp) # 1 column extra memory allocation
lastrees(las) # 1 column extra memory allocation
las2 = lasfilter(las, Z < 50) # Full copy of the point cloud minus the rows where Z >= 50
Maybe in version 2.0
las = readLAS(...) # Memory allocated
las2 = lasclassify(las) # 1 column extra memory allocation
las3 = lasnormalize(las2) # 1 column extra memory allocation because the former Z are stored in addition to the new ones
las4 = lasclassify(las3, sf) # 1 column extra memory allocation
las5 = lastrees(las4) # 1 column extra memory allocation
las6 = lasfilter(las5, Z < 50) # Full copy of the point cloud minus the rows where Z >= 50
Hi Jean-Romain,
I think it is a good idea. This is a more traditional way from the user's point of view, so if you think it simplifies things from the development point of view, great! What I would study carefully is the parallelization of that kind of function (inside or around). If there is any impact, I suppose it would be better handled by functions using shallow copies. Unfortunately I will not have the time to check on that before the beginning of October. Tell me if you need help on that question. Cheers Florian
Done!
Parallelization is not done at the algorithm level, so this change cannot affect parallelization in any way. And even if an algorithm were parallelized, the addition or assignment of a new attribute is done at the very end of the function, once the computation is already done. It definitely can't affect parallelization.
Now we write traditional code, for example:
las <- readLAS(file)
las <- lasnormalize(las)
#or
las_norm <- lasnormalize(las)
Instead of
las <- readLAS(file)
lasnormalize(las)
Notice I can still revert this change. It has some drawbacks. The lasground case was one of them. But a more important one is a (future) function lastransform equivalent to spTransform. In that case, without reference semantics, we have to copy both X and Y without any possibility of recycling the existing memory.
las2 = lastransform(las1) # Allocation of 2 new vectors of double
lastransform(las) # 0 Allocation
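The same kind of measurement shows the cost for a transform: without reference semantics both coordinate columns end up duplicated (again a plain-data.frame sketch, not lidR code, since lastransform does not exist yet):

library(pryr)

pts  <- data.frame(X = runif(1e6), Y = runif(1e6), Z = runif(1e6))
pts2 <- pts
pts2$X <- pts2$X + 1000       # a new X vector is allocated
pts2$Y <- pts2$Y + 1000       # a new Y vector is allocated
object_size(pts, pts2)        # Z is still shared, but X and Y now exist twice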
One thought could be that if no new data is added to the object (such as with transformations or some classifications) the call could be by reference... To me, this is intuitive, it could be a convention that's easily explained in the documentation, and it seems like it would make things easier globally for the developer 😉. That is unless we desire the same convention and behavior for all calls... then we're stuck with a compromise no matter which direction is taken.
Kind regards, Andrew
That is unless we desire the same convention and behavior for all calls
Ultimately yes, I'm trying to strongly improve package consistency, both within the package and across other spatial packages.
One thought could be that if no new data added to the object [...] the call could be by reference
This is an option, but what about these cases?
las = readLAS(file, select = "xyz")
lasground(las)   # Classification is not loaded: a new column has to be added
# and
las = readLAS(file, select = "*")
lasground(las)   # Classification is already loaded: it could be updated in place
The previous convention should be preferred: same number of points, by reference; different number of points, by copy (no choice anyway).
My biggest problem with the reference paradigm, consistency aside, is easily understandable with this user function. As long as side effects happen only within my own functions it is OK. But the goal of lidR is to enable users to build their own. And the following one is not straightforward to write for most users (two big traps in 4 lines of code).
lasfilternoise = function(las, sensitivity, res)
{
  p95 <- grid_metrics(las, quantile(Z, probs = 0.95), res)
  lasclassify(las, p95, "p95")                 # trap 1: updates las by reference (side effect on the input)
  las2 <- lasfilter(las, Z < p95*sensitivity)
  las@data[, p95 := NULL]                      # trap 2: the side effect must be removed manually
  return(las2)
}
This is actually what drove me toward a change in the paradigm, even at the cost of some memory optimization. The worst case I can imagine so far is lastransform, which is not expected to be used often. Indeed, ALS data should not be transformed into the CRS of the user's Raster or Spatial data; we should do the opposite.
lasfilternoise = function(las, sensitivity, res)
{
  p95 <- grid_metrics(las, quantile(Z, probs = 0.95), res)
  las <- lasclassify(las, p95, "p95")
  las <- lasfilter(las, Z < p95*sensitivity)
  return(las)
}
I changed the names of the functions used to access LAScatalog options. They were very annoying, especially regarding auto-completion in RStudio. I also definitively chose to adopt regular R syntax with shallow copies.
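For example (a rough sketch: the opt_* accessor names are the ones that appear later in this thread, while catalog() and the paths are placeholders):

ctg <- catalog("path/to/las/files/")
opt_output_files(ctg) <- "path/to/out/tile_{XLEFT}_{YBOTTOM}"
opt_cores(ctg) <- 2

# and regular R syntax with shallow copies for LAS objects
las  <- readLAS(file)
las2 <- lasnormalize(las)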
I just pushed a major modification. I removed several algorithms for tree detection and tree segmentation and moved them into a new package, lidRplugins.
@jfbourdon all the experimental stuff we are doing together is no longer in lidR but in lidRplugins (see also the README).
Hi Jean-Romain, Would there be a possibility to separate the extent (i.e. min/max x, y and maybe z) from the effective content of the point cloud? As an example, retiling a series of extracted plots on a larger grid (e.g. 500x500 m) won't make a grid but polygons squeezed to optimal size, thus I lose the grid information. Actually the same happens when you grid any type of point cloud. It would be nice if the grid could be kept as-is and if there were an option to optimize/squeeze the las extent. This is maybe more an rlas issue...
@floriandeboissieu what exactly is your issue? The extent is based on the content of the header by default. It was introduced for convenience when I made LAS objects inherit from Spatial objects. Like Spatial* and unlike Raster*, you can't modify the bbox with a built-in function, but you can force it programmatically. This does not modify the header, only the bounding box, i.e. the slot @bbox.
las@bbox = as.matrix(extent(las) + 10)
Edit: I think I understood your issue. This seems like the best option.
No, the issue is in 1.6.1 and I was hoping it could be solved in 2.0, which I haven't tried yet... sorry, too busy with the rest at the moment.
The issue is that if I manually change las@header@data$Max X, write with writeLAS and reload it, then las@header@data$Max X has come back to its initial value and not the one I set.
Thus I am losing the geographical information about the area covered, because it is based on the point cloud itself.
I don't remember exactly where it is happening (lidR, rlas, lastools?).
An example: tiling sparse LAS data will lead to non-joined tiles.
Cheers
It happens in rlas and it is driven by LASlib. This is not going to change. This is part of the LAS specification:
Max and Min X, Y, Z: The max and min data fields are the actual unscaled extents of the LAS point file data, specified in the coordinate system of the LAS data.
So you can't force writing an extent that does not correspond to the point cloud. It must be based on the content of the point cloud.
Ok, I read the doc myself before asking, but I was hoping it was not in LASlib...
Hi Jean-Romain! Congratulations on your work with lidR 1.6. Now I would like to try the beta version, but I'm not managing to install it. I've tried to install 'RCSF' but I couldn't; only 'rlas' from CRAN worked. If possible, could you provide the code to install 'RCSF' and 'lidR (2.0)'? Thank you very much.
devtools::install_github("Jean-Romain/RCSF")
devtools::install_github("Jean-Romain/lidR", dependencies=TRUE, ref="devel")
devtools::install_github("Jean-Romain/RCSF") devtools::install_github("Jean-Romain/lidR", dependencies=TRUE, ref="devel")
I've tried this, and I ensured that the 'devtools' package was loaded, but this message showed up: ' Downloading GitHub repo Jean-Romain/RCSF@master Error: Could not find tools necessary to compile a package '
You must install a C++ compiler on your machine. On Windows I believe that devtools does the job for you. Are you a Mac user :apple:? See also the README: https://github.com/Jean-Romain/lidR/tree/devel#install-lidr
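A quick way to check whether R can actually find a working toolchain (a suggestion, not from the original thread; pkgbuild is what devtools relies on under the hood):

pkgbuild::has_build_tools(debug = TRUE)   # TRUE if Rtools / a C++ toolchain is found, otherwise prints diagnostics
# or
devtools::has_devel()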
There is a very, very minor bug in the progress bar of:
tree_detection(lidar, vwf(5))
The progress tracker stops at 78% or 79% even though it has completed its task...
@SebastianKasan fixed in https://github.com/Jean-Romain/lidR/issues/191. If you find another progress bar like this one, please tell me.
There is a potential bug when processing a catalog using more than one core and trying to save results to files with:
opt_output_files(ctg) <-"F:\\data\\out\\qc_{XLEFT}_{YBOTTOM}_metrics"
opt_cores(ctg) <- 2
This results in an error:
Error in cluster_write(x, current_processed_cluster@save, output_options) : could not find function "cluster_write"
Everything works fine when the number of cores is set to 1.
Fixed in #193
devel merged into master.
Release date: between Dec 24th 2018 and Jan 1st 2019
Hey @Jean-Romain and everyone else - I was thinking of contributing a vignette and I thought it would be cool to truly develop something collaborative... I'd be willing to start a vignette that does what I see as a series of common tasks frequently asked about by new users... and I'd love to see others improve on it to make a really nice vignette (i.e., I know as soon as I push code someone will say "your code is so inefficient!"). At present, I've got a few scripts that I could easily combine and add a narrative that:
I know the targeted release date is fast approaching (Merry Christmas to me!) but would an endeavor like this be helpful or feasible? Would people be interested in altering/tweaking/improving my code/narrative to make a collaborative vignette?
I think this is a great idea. I'd be happy to contribute.
@bi0m3trics I'm currently writing vignettes about "package design", i.e. an ill-defined set of topics that includes stuff like "LAS and LAScatalog classes", "how to build functions fully compatible with LAS and LAScatalog", and "how and why to speed up with lax files". Practical examples such as "extract and normalize ground inventories" or "build a predictive model of biomass" are not in my plan.
This kind of vignette is useful. I'm not writing them because I don't have enough time to write so much documentation. So if you are motivated to write vignettes, do not hesitate. The deadline does not matter; we can add the vignettes later. Some topics of interest imo:
However, vignettes are not R tutorials. Vignettes must focus on the lidR package exclusively. Vignettes must show relatively simple chunks of code, not complex scripts. More complex tutorials that involve more code and more packages can be written in the wiki.
Also, the vignettes are built by CRAN. If some chunks of code are run by knitr, they must be fast and be based on data provided with the package. In practice this limits what we can write in a vignette.
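For example, a vignette-friendly chunk would look something like this (a sketch; it assumes one of the small example files shipped in the package's extdata, such as Megaplot.laz, which is used throughout the lidR documentation):

library(lidR)

LASfile <- system.file("extdata", "Megaplot.laz", package = "lidR")
las <- readLAS(LASfile, select = "xyz")
chm <- grid_canopy(las, res = 2, p2r())   # small data and a fast algorithm keep knitr build times low
plot(chm)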
Edit: I think that starting with wiki pages is a good option. Then we will see what can or cannot be put in vignettes.
I think this is a great idea. I'd be happy to contribute.
I would also be happy to contribute.
Almost ready to release:
Submission ready but delayed to Jan 2nd 2019
The package submission system will remain offline until Jan 2nd 2019. Reasons being CRAN team vacation and maintenance work.
Happy holidays and thank you for your understanding.
I've tried to reinstall 'devtools' and 'Rtools35', but the problem remains when I try to install 'rlas' and/or 'lidR' ("Error: Could not find tools necessary to compile a package"). I'm using Windows. Should I install any other C++ compiler? It seems like I'm skipping some basic important step. Thanks for your patience.
@dncosenza lidR 2.0.0 has been released on CRAN.
I need to generate a DTM by running 'grid_terrain' using the 'TIN' algorithm, but with the same restriction imposed by the 'max_edge' of 'grid_canopy(dsmtin)'. Is this possible? I tried to execute 'grid_canopy', but according to the pdf it triangulates just the first returns, so it won't work for DTMs.
No! But you can open a new issue to suggest a new feature.
Can you maybe delete first returns and then reclassify the last ones as first ones, as a workaround?
Edited by Jean-Romain: don't do horrible things like that ;-)
To: @bi0m3trics @floriandeboissieu @jfbourdon @spono @ptompalski @AndyPL22 @daveauty @tacormier and everyone else who will read this thread.
I merged the development branch of lidR 2.0.0 into the main development branch (devel). In short, that means the work on lidR v2.0.0 is almost done. The biggest part of what I wanted to do is done. What remains is mostly cosmetic :tada:
lidR v2.0.0 comes with many more capabilities, especially regarding LAScatalog processing. It comes with a more consistent set of functions and is more scalable for the future. It also comes with a massive changelog and, sadly, is deeply incompatible with previous releases (see the changelog to know why). I have been using it myself for weeks to do stuff that was not possible in lidR versions 1.x.y. I now need beta testers to test the new version in a wider range of usage. I need a lot of feedback because I entirely redesigned lidR, and that represents several thousand modified lines of code...
So, if you want to test it, thank you for your help in advance.
To install lidR version 2.0.0:
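(Presumably the same devtools commands given earlier in this thread; repeated here as a sketch.)

devtools::install_github("Jean-Romain/RCSF")
devtools::install_github("Jean-Romain/lidR", dependencies = TRUE, ref = "devel")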
Or use a Windows binary.
The release date is expected to be between Dec 24th 2018 and Jan 1st 2019.