Incoming improvements in lidR useful for TreeLS

Jean-Romain commented 4 years ago

This issue gonna be long. I'm presenting new C++ feature in lidR that will strongly benefit to TreeLS both in term of memory and speed. I'll start explaining some memory issues in TreeLS, then explain you how to improve that by linking to lidR.

Memory and speed issues

I'm using fastPointMetrics with knn = 10 as an example but for what I've seen it may apply to other functions. fastPointMetrics is not as fast as it could be mainly because inefficient memory management at different level. TLS data are huge, thus making a copy of point clouds must be performed carefully. In your case I can see many copies in the code. By copy I mean a block of memory the size of the point cloud

k = nabor::knn(las %>% las2xyz, k = k+1, radius = r)

Here we have many copies: las %>% las2xyz creates two copies. nabor::knn creates 6 copies for k = 10. It allocates k x n matrix of int + a k x m matrix of double which mean approx 5 times the memory of the xyz coordinates + the spatial index.

ptm = pointMetricsCpp(las %>% las2xyz, kid, pick_metrics$log) %>% do.call(what = rbind) %>% as.data.table

Here at least 2 more copies

vector<vector<double> > cloud = rmatrix2cpp(las);

Here one again.

I may have missed some of the copies. I counted at least 12 copies. Some are temporary, some are persistent for the whole duration of fastPointMetrics. Anyway a copy has a cost in computation time and memory usage and I don't thing your code can handle big TLS datasets and is slower than it could be.

How lidR manages such case

lidR has its own spatial index. This allows to avoid copies using pkg::knn at R level working only at c++ level efficiently with a single copy avoiding any type casting such as data.frame to matrix. Sadly this spatial index was optimized for ALS data and thus was slow on TLS data. The computational cost of fastPointMetric being the eigen decomposition we can compare with segment_shape that is almost equivalent in term of computation. Your function is faster.

file = system.file("extdata", "pine.laz", package="TreeLS")
las = readTLS(file, select='xyz')

all_metrics = fastPointMetrics.available()
my_metrics = all_metrics[c(1,4,6)]

t1 = Sys.time()
tls = fastPointMetrics(las, ptm.knn(10), my_metrics)
t2 = Sys.time()
t2-t1
#> Time difference of 1.825634 secs

t1 = Sys.time()
las <- segment_shapes(las, shp_plane(k = 15), "Coplanar")
t2 = Sys.time()
t2-t1
#> Time difference of 2.67153 secs

New in lidR 3.1.0 - high-level API

In lidR 3.1.0 the spatial index can be adapted and extended to TLS data. Below the high level API. LAS objects can now be tagged with an acquisition source. We can see the huge seed up. It is faster than fastPointMetrics and uses much fewer memory.

file = system.file("extdata", "pine.laz", package="TreeLS")
las = readTLS(file, select='xyz')     # Regular read with readLAS no tag
tls = readTLSLAS(file, select='xyz')  # New read that adds a TLS tag

t1 = Sys.time()
las <- segment_shapes(las, shp_plane(k = 15), "Coplanar")
t2 = Sys.time()
t2-t1
#> Time difference of 5.050502 secs

t1 = Sys.time()
las <- segment_shapes(tls, shp_plane(k = 15), "Coplanar")
t2 = Sys.time()
t2-t1
#> Time difference of 1.450502 secs

This applies to the generic point_metrics function. With point_metrics you can fully reproduce your fastPointMetric only at the R level being memory optimized (a single copy) and approximately as fast

Rcpp::sourceCpp(code = "
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]
SEXP eigen_values(arma::mat A) {
arma::mat coeff;
arma::mat score;
arma::vec latent;
arma::princomp(coeff, score, latent, A);
return(Rcpp::wrap(latent));
}")

eigen = function(x,y,z, th1 = 25, th2 = 6) {
  xyz <- cbind(x,y,z)
  l <- eigen_values(xyz)
  return(list(l1 = l[1],l2 = l[2],l3 = l[3]))
}

t1 = Sys.time()
m = point_metrics(las, eigen(X,Y,Z), k = 10)
t2 = Sys.time()
t2-t1
#> Time difference of 6.099894 secs

t1 = Sys.time()
m = point_metrics(tls, eigen(X,Y,Z), k = 10)
t2 = Sys.time()
t2-t1
#> Time difference of 2.181839 secs

New in lidR 3.1.0 - low-level API

But you can do even better. The spatial index is now available and you can LinkingTo lidR. point_metrics is optimized but make R call internally. This kills the performance. We can use the spatial index in lidR to do that fully at C++ level.

// [[Rcpp::depends(lidR)]]
// [[Rcpp::depends(RcppArmadillo)]]

#include <RcppArmadillo.h>
#include <GridPartition.h>
using namespace Rcpp;

// [[Rcpp::export]]
void fasterPointMetric(S4 las, int k) 
{
  // No copy at all here
  DataFrame data = as<DataFrame>(las.slot("data"));
  NumericVector X = data["X"];
  NumericVector Y = data["Y"];
  NumericVector Z = data["Z"];
  int npoints = X.size();

  // One copy for the index. The index automatically detects
  // the TLS tag in the LAS object
  lidR::GridPartition tree(las);

  // Memory allocation
  arma::mat A(k,3);
  arma::mat coeff;
  arma::mat score;
  arma::vec latent;

  for (auto i = 0 ; i < npoints ; i++)
  {
    // We search for the knn of point i
    lidR::PointXYZ p(X[i], Y[i], Z[i]);

    // Make the query
    std::vector<lidR::PointXYZ*> pts;
    tree.knn(p, k, pts);

    // eigen decomposition is the real cost of the function
    for (unsigned int j = 0 ; j < pts.size() ; j++)
    {
      A(j,0) = pts[j]->x;
      A(j,1) = pts[j]->y;
      A(j,2) = pts[j]->z;
    }
    arma::princomp(coeff, score, latent, A);

    // compute some extra metrics
    double linearity = (latent[0] - latent[1]) / latent[0];
    double planarity = (latent[1] - latent[2]) / latent[0];
    double dmean = 0;
    for (auto j = 0 ; j < pts.size() ; j++)   {
      double d2 = (p.x - pts[j]->x)*(p.x - pts[j]->x) + (p.y - pts[j]->y)*(p.y - pts[j]->y) + (p.z - pts[j]->z)*(p.z - pts[j]->z);
      dmean =+ 1/pts.size()*std::sqrt(d2);
    }

    // Here fill an pre-allocated R object to return
  }

  // Toy example we return nothing
  return;
}

Let try it. The code is simpler, faster and memory optimized.

t1 = Sys.time()
m = fasterPointMetric(las, k = 10)
t2 = Sys.time()
t2-t1
#> Time difference of 2.941839 secs

t1 = Sys.time()
m = fasterPointMetric(tls, k = 10)
t2 = Sys.time()
t2-t1
#> Time difference of 1.151839 secs

You can also make query in arbitrary shape with lookup that is templated

Sphere s(xcenter, ycenter, zcenter, radius);
tree.lookup(s, pts);

Conclusion

All of this is experimental. The API is subject to modifications until release but you can yet considerably improve TreeLS and make it at least as fast (probably faster) but less memory greedy and you can simplify drastically your code. If you have any question do not hesitate. I'd like refine the code with your feed back. It passed the 1000+ unit tests of lidR so I guess it is safe but we may still encounter some trouble so your feed back is welcome.

Cheers.

tiagodc commented 4 years ago

Wow... thanks a lot @Jean-Romain !! That's a great input, but I'm a bit embarassed you actually read my C++ code :sweat_smile: ... it is far from optimal and not very DRY.

Most of it was transcribed from pure C++ code I used to create some command line tools. They read the las files from disk using laslib, and essentially what I did on TreeLS was wrapping R/C++ objects back and forth (e.g. xyz matrices) using std::vector since I didn't have to read files from disk anymore. I was aware that this approach, of course, is far from optimal in terms of memory usage, as it brings a lot of overhead.

Your input was perfect in timing, as of v2.0 I included all algorithms and methods I planned since refactoring the package before its 1st CRAN release last year. Now seems like a good time for another deep refactoring, especially targeting the C++ backend. I didn't take the time to properly learn Rcpp, so I just minimized my workload by making those rather inneficient R/C++ wrappers (pretty much the whole r_interface.cpp file).

On top of that, developing TreeLS has been slow since I do it as a side project on my spare time. I'm getting back to the academy as of this year though, and I'll be working with LiDAR again, although it's GEDI data this time, BUT it will be great to bring TLS into the mixture, so I might be able to include TreeLS' development as part of my research. So lets cheer for it!

Well... enough blabber, let's talk business :point_down:

Memory and speed issues

As already said, I am well aware of the memory issues, and this is especially valid for all methods relying on knn, which are very memory hungry. The most important thing on a next code refactor will be replacing those dirty wrappers with proper references/pointers to Rcpp data structures, but also, ideally the knn search should be done entirely in C++. Is there any Rcpp compatible API you recommend for fast knn search?

Regarding ...I don't think your code can handle big TLS datasets..., I'd say those methods get the job done for "normal" forest inventory plots and single trees. They should be able to perform on clouds with up to 30 million points on a 16GB RAM. Of course it can be much better, but for now any really large datasets should be processed in chunks, and most forest inventory plots should be processed just fine so far on computers with at least 8GB of RAM and/or after sampling the point clouds. By the way, most TreeLS methods should be compatible with lidR::catalog, but I haven't explored it, so any remarks you may have on that will be great.

New in lidR 3.1.0 - high-level API

Very interesting. Is this spatial index based on octrees, voxels or something else? Any references you can share?

New in lidR 3.1.0 - low-level API

Awesome!!! Being able to access lidR's functions straight from C++ is a HUGE help. That will also be really helpful for future implementations of even more memory hungry applications, like tree QSMs. Do you have (or will provide) docs for the low-level API?

Conclusion

TreeLS' C++ code sure needs some refurbishing. All your inputs here were really helpful and if you want to submit any PRs you're more than welcome (or anyone else reading this and thinking they might be able to contribute). I can't jump straight into refactoring TreeLS again as I have too much on my plate right now, but I'll try to be slow and steady at least. When do you plan on releasing lidR 3.1? If there are any related features you want a second opinion or extra testing let me know and I'll be glad to help. As soon as I start digging into lidR's code again I'll have questions to throw at you, be sure of that. Thanks again!

Jean-Romain commented 4 years ago

but I'm a bit embarassed you actually read my C++ code

Make it close source! Of course I read (some piece of) your code, I'm following your work from the beginning :wink:.

Is there any Rcpp compatible API you recommend for fast knn search?

Well... mine?!? It is fully optimized for LiDAR data. If you want another I can't help sorry.

Very interesting. Is this spatial index based on octrees, voxels or something else? Any references you can share?

No octree. Historically I used quad trees but I moved to grid-based index. ALS data are evenly spread and quad tree introduce a lot of complexity for nothing. Moving to a grid-based index allowed to get a x2 speed-up on all queries and thus to many algorithms. See changelog v2.2.0. In lidR knn search is faster than any other knn search in R (FANN, RANN, nabor) because it is specialized to ALS. The new extension to TLS is only an extension of the 2D grid-based index to a 3D voxel-based index. I didn't benchmark it yet against nabor but from my early test shown above it looks good.

Any references you can share?

The source code isn't complex and is commented (but could be more commented).

Do you have (or will provide) docs for the low-level API?

Not yet. This is why I said to do not hesitate to ask me. With your feed back the API may change a bit. I'll document it later if I find how to document C++ classes in R. Anyway the code is simple and there is only few public members.

if you want to submit any PRs you're more than welcome

Actually I forked your repos to make a real test but your code was more complex than expected.

When do you plan on releasing lidR 3.1?

Not before Jan 2021.

If there are any related features you want a second opinion or extra testing let me know and I'll be glad to help.

Just use my class (if you want) and if you encounter difficulties or you think there is a missing feature we can discuss. The class is first designed for my own use so having a feed back from a real user may help to improve.

Jean-Romain commented 4 years ago

I just added comments to all public members that can act like a simple source of documentation

tiagodc / TreeLS