Closed. @bitbacchus closed this issue 1 year ago.
Thanks for opening the new issue! Just for reference, this is their GitHub page: https://github.com/jeffreyevans/landmetrics. It was formerly known as spatialEco. They actually reference and compare to your package at the end of the page :)
I had a really quick look at the code and it seems they are also "just" using `raster::focal`, as we do.

Just out of curiosity, which metrics did you calculate @lukasbaumbach? Maybe that's where the difference in computational speed comes from.
I tested "effective mesh size" for both. Here are the two function calls:

```r
# load projected raster
pfts <- raster(file)
pfti <- pfts == 7

# landmetrics
lm <- focal.lmetrics(pfti, metric = "effective.mesh.size", w = 3, land.value = 1, bkg = 0, latlong = FALSE)

# landscapemetrics
pfti[pfti == 0] <- NA # remove background zeros
lsm <- window_lsm(pfti, window = matrix(1, nrow = 3, ncol = 3), what = c("lsm_l_mesh"))
```
I checked the raster with `check_landscape()` to make sure it meets all requirements. Also, `calculate_lsm()` with the same metric works fine. So I suspect it's really about the window function...
I tried to get some data, i.e. I benchmarked with the built-in `landscape` raster, for both the moving window and the metric alone.

Results: `landmetrics` is faster in both cases, about 2.5x faster for the moving window and about 1.5x faster for the metric alone.
Here is the code I used:

```r
landscape <- reclassify(landscape, cbind(2, NA))
landscape <- reclassify(landscape, cbind(3, NA))
w <- 3

bm1 <- microbenchmark::microbenchmark(
  lm <- focal.lmetrics(landscape, metric = "effective.mesh.size", w = w, land.value = 1, bkg = NA, latlong = FALSE),
  lsm <- window_lsm(landscape, window = matrix(1, nrow = w, ncol = w), what = c("lsm_l_mesh")),
  times = 10
)
bm1
mean_lm1 <- 929.167    # as in bm1
mean_lsm1 <- 2345.661  # as in bm1
mean_lsm1 / mean_lm1   # ~2.5x faster

bm2 <- microbenchmark::microbenchmark(
  lm_single <- lmetrics(landscape, metric = "effective.mesh.size", land.value = 1, bkg = NA, latlong = FALSE),
  lsm_single <- landscapemetrics::lsm_l_mesh(landscape = landscape)
)
bm2
mean_lm2 <- 4.126911   # as in bm2
mean_lsm2 <- 6.295985  # as in bm2
mean_lsm2 / mean_lm2   # ~1.5x faster
```
```r
lmetrics <- function(x, bkg = 0, land.value = 1,
                     metric = "prop.landscape", latlong = FALSE) {
  if (!inherits(x, "RasterLayer")) stop("x is not a valid raster layer")
  mnames <- c("class", "n.patches", "total.area",
              "prop.landscape", "patch.density", "total.edge",
              "edge.density", "landscape.shape.index", "largest.patch.index",
              "mean.patch.area", "sd.patch.area", "min.patch.area",
              "max.patch.area", "perimeter.area.frac.dim", "mean.perim.area.ratio",
              "sd.perim.area.ratio", "min.perim.area.ratio", "max.perim.area.ratio",
              "mean.shape.index", "sd.shape.index", "min.shape.index",
              "max.shape.index", "mean.frac.dim.index", "sd.frac.dim.index",
              "min.frac.dim.index", "max.frac.dim.index", "total.core.area",
              "prop.landscape.core", "mean.patch.core.area", "sd.patch.core.area",
              "min.patch.core.area", "max.patch.core.area", "prop.like.adjacencies",
              "aggregation.index", "lanscape.division.index", "splitting.index",
              "effective.mesh.size", "patch.cohesion.index")
  m.idx <- which(mnames %in% metric)
  if (length(m.idx) < 1) stop("Not a valid landscape metric")
  cs <- raster::res(x)[1]
  lmetric <- function(v, focal.values = land.value, bkgs = bkg,
                      rcs = cs, latlongs = latlong, lm.idx = m.idx) {
    m <- raster::as.matrix(v)
    m <- ifelse(m != focal.values, bkgs, 1)
    # ClassStat() as provided by landmetrics (originally from SDMTools)
    as.numeric(ClassStat(m, bkgd = bkgs, cellsize = rcs,
                         latlon = latlongs))[lm.idx]
  }
  lmetric(v = x, focal.values = land.value, bkgs = bkg,
          rcs = cs, latlongs = latlong, lm.idx = m.idx)
}
```
As the metric is called for each pixel, the difference multiplies. So I also think it's the metric.
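To put rough numbers on that multiplication (a back-of-envelope sketch; the per-call means come from `bm2` above, while "one metric call per cell of a 250 x 250 raster" is an assumed window count, illustrated here in Python rather than R):

```python
# Per-call means from bm2 above, converted from milliseconds to seconds.
mean_lm2 = 4.126911e-3   # landmetrics, single metric call
mean_lsm2 = 6.295985e-3  # landscapemetrics, single metric call

# Assumed window count: one metric call per cell of a 250 x 250 raster.
cells = 250 * 250

extra = (mean_lsm2 - mean_lm2) * cells
print(round(extra))  # ~136 extra seconds over the whole raster
```

So even a ~2 ms per-call gap turns into minutes once the moving window repeats it tens of thousands of times.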
That would have been my guess as well...
Just a quick update: this turns out to be quite a rabbit hole ;-) I went a fair bit deeper, and it seems our patch-finding algorithm is quite a bit slower than theirs. I compared their `connected.pixels` with our `get_patches` (theirs is 3.2x faster) and the underlying C/C++ code (theirs is 1.8x faster). I think this is good news, because improving the connected labelling would speed up most metrics in `landscapemetrics`.
```r
> tmat <- matrix(tmat, nrow = 250, ncol = 250)
> microbenchmark::microbenchmark(
+   landmetrics::connected.pixels(tmat),
+   landscapemetrics::get_patches(tmat),
+   .Call('ccl', tmat, PACKAGE = 'landmetrics'),
+   landscapemetrics:::rcpp_ccl(tmat))
Unit: milliseconds
                                        expr      min       lq     mean   median       uq      max neval
         landmetrics::connected.pixels(tmat) 1.413130 1.456568 1.871517 1.488535 1.714600 12.56624   100
         landscapemetrics::get_patches(tmat) 4.566368 4.772208 5.924897 4.836324 5.953252 13.54484   100
 .Call("ccl", tmat, PACKAGE = "landmetrics") 1.458142 1.499114 1.911508 1.545898 1.722005 14.11127   100
           landscapemetrics:::rcpp_ccl(tmat) 2.613139 2.715605 3.233093 2.822551 3.163085  7.73088   100
```
They give https://doi.org/10.1016/j.cviu.2003.09.002 as the reference for their connected labelling algorithm, and I am quite sure I have seen this paper before (meaning we looked at it before?).
Yes, increasing the speed of the connected labelling would be really cool, because a lot of metrics depend on it. Their algorithm is written in C, right? So we would take the code and re-implement it in C++? (And obviously cite it properly...)
I'd say that's the plan. I'll hack some C++ code and you go for quality control?
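For a concrete picture of what such a labelling routine does, here is a minimal sketch of the classic two-pass connected-component labelling with union-find. This is illustrative Python, not the landmetrics C code, the planned C++ port, or necessarily the algorithm from the cited paper; the 4-neighbourhood is also an assumption:

```python
def label_components(grid):
    """Two-pass connected-component labelling (4-neighbourhood) with union-find.

    grid: 2D list of 0/1 ints; returns a 2D list in which each connected
    patch of 1s gets its own positive label and background stays 0.
    """
    nrow, ncol = len(grid), len(grid[0])
    parent = [0]  # union-find forest; parent[i] == i marks a root

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees flat
            i = parent[i]
        return i

    labels = [[0] * ncol for _ in range(nrow)]
    # First pass: assign provisional labels, record equivalences.
    for r in range(nrow):
        for c in range(ncol):
            if grid[r][c] != 1:
                continue
            up = labels[r - 1][c] if r > 0 else 0
            left = labels[r][c - 1] if c > 0 else 0
            if up == 0 and left == 0:
                parent.append(len(parent))   # brand-new provisional label
                labels[r][c] = len(parent) - 1
            elif up and left:
                ru, rl = find(up), find(left)
                parent[max(ru, rl)] = min(ru, rl)  # union the two labels
                labels[r][c] = min(ru, rl)
            else:
                labels[r][c] = up or left
    # Second pass: flatten equivalences and renumber labels compactly.
    remap = {}
    for r in range(nrow):
        for c in range(ncol):
            if labels[r][c]:
                root = find(labels[r][c])
                labels[r][c] = remap.setdefault(root, len(remap) + 1)
    return labels

# A U-shape gets two provisional labels in the first pass that the
# union step merges into a single patch.
patches = label_components([[1, 0, 1],
                            [1, 1, 1]])
```

How the equivalences are resolved is where implementations differ most, and it is exactly the part the paper above optimises.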
Just a few intermediate ideas and thoughts:

- `landscapemetrics` is in principle more expensive than `landmetrics`, because our code is more structured (= more sub-functions) and we use S3 methods, which again costs time; `landmetrics` will therefore almost certainly always be faster
- we could replace `focal` and figure out the method in advance
- an alternative to `focal`, preferably parallelised (which would totally make sense)

But if the new Rcpp code is faster (?), I would say we start by updating the connected labelling first anyhow. Then we can think about whether we want to update the general structure of `window_lsm`.
@bitbacchus I kind of lost track of this... is there anything I can do to progress with this?
I kinda lost track, too ;-) I'll look into this next week.
Hi @bitbacchus and @mhesselbarth, I am following this discussion (and @bitbacchus' progress) closely. My experience is similar to the @bitbacchus comments above: for a moving-window problem, a lot of time is spent not on the actual calculations but on the overhead of the R<->C++ connection. That being said, it would be great to see some alpha code that tries to speed things up; then we could benchmark it and try to improve it further.
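That per-call overhead is easy to demonstrate outside R too. A rough Python/NumPy analogue of the same effect (NumPy's compiled loop standing in for the C++ side; the array size mirrors the 250 x 250 benchmark raster):

```python
import numpy as np

rng = np.random.default_rng(42)
a = rng.random(250 * 250) + 1.0  # same cell count as the benchmark raster

# One boundary crossing per element: each np.log(x) call pays a fixed
# overhead, just like one metric call per moving-window position.
per_element = sum(float(np.log(x)) for x in a)

# One boundary crossing in total: the loop runs inside compiled code.
batched = float(np.log(a).sum())

# Identical results; the batched version avoids ~62,500 call overheads.
```

The numbers agree to floating-point precision; only the number of boundary crossings differs.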
Similarly following the discussion and @bitbacchus' progress closely. We're using this package to quantify landscapes in the Amazon and this is a key bottleneck.
So the pressure is on 😄
I am following closely on this because I have to use moving-windows all the time. There are focal alternatives these days that allow parallel processing. Is there any plan on trying that route to speed things up?
There are some plans but we haven't moved on them yet.
FYI: one thing I found out today is that there is quite a lot of overhead caused by creating tibbles for each metric. The code below takes about 2.5 seconds on my laptop vs. 0.9 seconds when I change `tibble::tibble()` to `data.frame()` (or `data.table::data.table()`) in line 74: https://github.com/r-spatialecology/landscapemetrics/blob/acd3fd6dffa1908a8afb3cfaa5341cc50054e1ae/R/lsm_l_ent.R#L74.
```r
library(landscapemetrics)
library(raster)

data(landscape)
mw <- window_lsm(landscape, window = matrix(1, nrow = 5, ncol = 5), what = "lsm_l_ent")
```
Interesting... that is quite a lot. I would oppose using `data.table`, since that's a slightly different approach than `data.frame`s. But we could get rid of `tibble` if that increases speed.
Or we could make tibble optional... or just return a vector of values from each main function, and have another layer that creates a `data.frame`/`tibble`...
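That layered design could look something like this sketch (hypothetical names, illustrative Python rather than R): the inner layer returns bare numbers per window, and a single table-like structure is assembled once at the end instead of one tibble per metric call.

```python
from math import log

def window_entropy(values):
    """Shannon entropy (bits) of the cell values in one window.

    Returns a bare float with no per-call table construction.
    """
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    n = len(values)
    return -sum((c / n) * log(c / n, 2) for c in counts.values())

# Inner layer: plain numeric results, one per window.
windows = [[1, 1, 1, 2], [1, 2, 1, 2], [2, 2, 2, 2]]
values = [window_entropy(w) for w in windows]

# Outer layer: one table built once at the very end.
result = {"metric": ["lsm_l_ent"] * len(values), "value": values}
```

The expensive container construction then happens once per raster instead of once per window.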
Will close this, because the specific speed differences are not related to the window function itself but to the underlying algorithms at the metric level. Also, `window_lsm()` will be re-worked soonish.
Thanks for clarifying and whipping up some example code!
I guess I was more looking for a way of speeding up the function itself, since only using one metric already took a long time. I assume using a different format than raster would not help, since the underlying function is `focal` and requires a raster input, right?

Strangely enough, I ran the same analysis (3x3 window for effective mesh size) with `landmetrics::focal.lmetrics` (which seems to be somehow related to this package?) and it finished within 20 minutes. `window_lsm(pfti_proj, window = matrix(1, nrow = 3, ncol = 3), what = c("lsm_l_mesh"), progress = TRUE)` did not finish after 3 hours...

Originally posted by @lukasbaumbach in https://github.com/r-spatialecology/landscapemetrics/issues/87#issuecomment-786628912