r-spatialecology / landscapemetrics

Landscape Metrics for Categorical Map Patterns 🗺️ in R
https://r-spatialecology.github.io/landscapemetrics
GNU General Public License v3.0

Check-out landmetrics code for faster focal.lmetrics #224

Closed bitbacchus closed 1 year ago

bitbacchus commented 3 years ago

Thanks for clarifying and whipping up some example code!

I guess I was more looking for a way of speeding up the function itself, since only using one metric already took a long time. I assume using a different format than raster would not help, since the underlying function is focal and requires a raster input, right?

Strangely enough, I ran the same analysis (3x3 window for effective mesh size) with landmetrics::focal.lmetrics (which seems to be somehow related to this package?) and it finished within 20 minutes. window_lsm(pfti_proj, window = matrix(1, nrow = 3, ncol = 3), what = c("lsm_l_mesh"), progress = TRUE) did not finish after 3 hours...

Originally posted by @lukasbaumbach in https://github.com/r-spatialecology/landscapemetrics/issues/87#issuecomment-786628912

lukasbaumbach commented 3 years ago

Thanks for opening the new issue! Just for reference, this is their Github page: https://github.com/jeffreyevans/landmetrics. It was formerly known as spatialEco. They actually reference and compare to your package at the end of the page :)

mhesselbarth commented 3 years ago

I had a really quick look at the code and it seems they are also "just" using raster::focal, as we do.

Just out of curiosity, which metrics did you calculate @lukasbaumbach? Maybe that's where the difference in computational speed comes from.

lukasbaumbach commented 3 years ago

I tested "effective mesh size" for both. Here are the two function calls:

library(raster)
library(landmetrics)
library(landscapemetrics)

# load projected raster and build a binary layer for the target class
pfts <- raster(file)
pfti <- pfts == 7

# landmetrics
lm <- focal.lmetrics(pfti, metric="effective.mesh.size", w=3, land.value=1, bkg = 0, latlong=FALSE)

# landscapemetrics
pfti[pfti==0] <- NA # remove background zeros
lsm <- window_lsm(pfti, window = matrix(1, nrow = 3, ncol = 3), what = c("lsm_l_mesh"))

I checked the raster with check_landscape() to make sure it meets all requirements. Also calculate_lsm with the same metric works fine. So I suspect it's really about the window function...

bitbacchus commented 3 years ago

I tried to get some data, i.e. benchmarked with the built-in landscape raster for both the moving window and the metric alone.

Results: landmetrics is faster in both cases, about 2.5x for the moving window and about 1.5x for the metric alone.

Here is the code I used:

# keep only class 1: set classes 2 and 3 to NA
landscape <- reclassify(landscape, cbind(2, NA))
landscape <- reclassify(landscape, cbind(3, NA))

w <- 3

bm1 <- microbenchmark::microbenchmark(
    lm <- focal.lmetrics(landscape, metric="effective.mesh.size", w=w, land.value=1, bkg = NA, latlong=FALSE),
    lsm <- window_lsm(landscape, window = matrix(1, nrow = w, ncol = w), what = c("lsm_l_mesh")),
    times = 10
)
bm1

mean_lm1 <- 929.167 # as in bm1
mean_lsm1 <- 2345.661 # as in bm1
mean_lsm1 / mean_lm1 # 2.5 x faster

bm2 <- microbenchmark::microbenchmark(
    lm_single <-  lmetrics(landscape, metric="effective.mesh.size", land.value=1, bkg = NA, latlong=FALSE),
    lsm_single <- landscapemetrics::lsm_l_mesh(landscape = landscape)
)

bm2

mean_lm2 <- 4.126911 # as in bm2
mean_lsm2 <- 6.295985 # as in bm2
mean_lsm2 / mean_lm2 # 1.5 x faster

lmetrics <- function(x, bkg = 0, land.value = 1,
                     metric = "prop.landscape", latlong = FALSE) {
    if( !inherits(x, "RasterLayer") ) stop( "x is not a valid raster layer")
    mnames <- c("class","n.patches","total.area",
                "prop.landscape","patch.density","total.edge",
                "edge.density","landscape.shape.index","largest.patch.index",
                "mean.patch.area","sd.patch.area","min.patch.area",
                "max.patch.area","perimeter.area.frac.dim","mean.perim.area.ratio",
                "sd.perim.area.ratio","min.perim.area.ratio","max.perim.area.ratio",
                "mean.shape.index","sd.shape.index","min.shape.index",
                "max.shape.index","mean.frac.dim.index","sd.frac.dim.index",
                "min.frac.dim.index","max.frac.dim.index","total.core.area",
                "prop.landscape.core","mean.patch.core.area","sd.patch.core.area",
                "min.patch.core.area","max.patch.core.area","prop.like.adjacencies",
                "aggregation.index","lanscape.division.index","splitting.index",
                "effective.mesh.size","patch.cohesion.index")
    m.idx <- which( mnames %in% metric )
    if( length(m.idx) < 1 ) stop("Not a valid landscape metric")
    cs = raster::res(x)[1]
    lmetric <- function(v, focal.values = land.value, bkgs = bkg,
                        rcs = cs, latlongs = latlong, lm.idx = m.idx) {
        m <- raster::as.matrix(v)
        m <- ifelse(m != focal.values, bkgs, 1)
        return( as.numeric(ClassStat(m, bkgd = bkgs, cellsize = rcs,
                                     latlon = latlongs))[lm.idx] )
    }
    return( lmetric(v = x, focal.values = land.value, bkgs = bkg, rcs = cs, latlongs = latlong, lm.idx = m.idx) )
}
bitbacchus commented 3 years ago

As the metric is called once for each pixel, the per-call difference multiplies. So I also think it's the metric.

mhesselbarth commented 3 years ago

That would have been my guess as well...

bitbacchus commented 3 years ago

Just a quick update:

This turns out to be quite a rabbit hole ;-) I went a fair bit deeper and it seems that our patch-finding algorithm is quite a bit slower than theirs. I compared their connected.pixels with our get_patches (theirs is about 3.2x faster) and the underlying C/C++ code (theirs is about 1.8x faster).

I think this is good news, because improving the connected labelling would speed up most metrics in landscapemetrics.

> tmat <- matrix(tmat, nrow = 250, ncol = 250)
> microbenchmark::microbenchmark(
+     landmetrics::connected.pixels(tmat),
+     landscapemetrics::get_patches(tmat),
+     .Call('ccl',tmat,PACKAGE='landmetrics'),
+     landscapemetrics:::rcpp_ccl(tmat))
Unit: milliseconds
                                        expr      min       lq     mean   median       uq      max neval
         landmetrics::connected.pixels(tmat) 1.413130 1.456568 1.871517 1.488535 1.714600 12.56624   100
         landscapemetrics::get_patches(tmat) 4.566368 4.772208 5.924897 4.836324 5.953252 13.54484   100
 .Call("ccl", tmat, PACKAGE = "landmetrics") 1.458142 1.499114 1.911508 1.545898 1.722005 14.11127   100
           landscapemetrics:::rcpp_ccl(tmat) 2.613139 2.715605 3.233093 2.822551 3.163085  7.73088   100
mhesselbarth commented 3 years ago

They give https://doi.org/10.1016/j.cviu.2003.09.002 as the reference for their connected-labeling algorithm, and I am really sure I have seen this paper before (meaning we looked at it before?)

Yes, increasing the speed of the connected labeling would be really cool, because a lot of metrics depend on it. Their algorithm is written in C, right? So we would take the code and re-implement it in C++? (And obviously, cite it properly...)

bitbacchus commented 3 years ago

I'd say that's the plan. I'll hack some C++ code and you go for quality control?

bitbacchus commented 3 years ago

Just a few intermediate ideas and thoughts:

mhesselbarth commented 3 years ago

But if the new Rcpp code is faster (?), I would say we first update the connected labeling anyhow. Then we can think about whether we want to update the general structure of window_lsm.

mhesselbarth commented 3 years ago

@bitbacchus I kind of lost track of this...is there anything I can do to progress with this?

bitbacchus commented 3 years ago

I kinda lost track, too ;-) I'll look into this next week.

Nowosad commented 3 years ago

Hi @bitbacchus and @mhesselbarth, I am following this discussion (and @bitbacchus' progress) closely. My experience is similar to the @bitbacchus comments above: for a moving-window problem, a lot of time will be spent not on the actual calculations, but rather on the overhead of the R<->C++ connection. That being said, it would be great to see some alpha code that tries to speed things up; then we could benchmark the code and try to improve it somehow.

kdyson commented 3 years ago

Similarly following the discussion and @bitbacchus' progress closely. We're using this package to quantify landscapes in the Amazon and this is a key bottleneck.

mhesselbarth commented 3 years ago

So the pressure is on 😄

hoyiwan commented 3 years ago

I am following closely on this because I have to use moving-windows all the time. There are focal alternatives these days that allow parallel processing. Is there any plan on trying that route to speed things up?

mhesselbarth commented 3 years ago

There are some plans but we haven't moved on them yet.

Nowosad commented 2 years ago

FYI: One thing I found out today is that there is quite an overhead caused by creating tibbles for each metric. The line of code below takes about 2.5 seconds on my laptop vs. 0.9 seconds when I change tibble::tibble() to data.frame() (or data.table::data.table()) in line 74: https://github.com/r-spatialecology/landscapemetrics/blob/acd3fd6dffa1908a8afb3cfaa5341cc50054e1ae/R/lsm_l_ent.R#L74.

library(landscapemetrics)
library(raster)
data(landscape)

mw = window_lsm(landscape, window = matrix(1, nrow = 5, ncol = 5), what = "lsm_l_ent")
mhesselbarth commented 2 years ago

Interesting... that is quite a lot. I would oppose using data.table, since that's a slightly different approach than data.frames. But we could get rid of tibble if that increases speed.

Nowosad commented 2 years ago

Or we could make tibble optional... or just return a vector of values from each main function, and have another layer that creates a data.frame/tibble...

mhesselbarth commented 1 year ago

Will close this because the specific speed differences are not related to the window function itself, but to the underlying algorithms at the metric level. Also, window_lsm() will be re-worked soonish.