Closed. @bitbacchus closed this issue 1 year ago.
Thanks for opening the new issue! Just for reference, this is their GitHub page: https://github.com/jeffreyevans/landmetrics. It was formerly known as spatialEco. They actually reference and compare to your package at the end of the page :)
I had a really quick look at the code and it seems they are also "just" using `raster::focal`, as we do.

Just out of curiosity, which metrics did you calculate @lukasbaumbach? Maybe that's where the difference in computational speed comes from.
I tested "effective mesh size" for both. Here are the two function calls:

```r
# load projected raster
pfts <- raster(file)
pfti <- pfts == 7

# landmetrics
lm <- focal.lmetrics(pfti, metric = "effective.mesh.size", w = 3, land.value = 1, bkg = 0, latlong = FALSE)

# landscapemetrics
pfti[pfti == 0] <- NA # remove background zeros
lsm <- window_lsm(pfti, window = matrix(1, nrow = 3, ncol = 3), what = c("lsm_l_mesh"))
```
I checked the raster with `check_landscape()` to make sure it meets all requirements. Also, `calculate_lsm()` with the same metric works fine. So I suspect it's really about the window function...
I tried to get some data, i.e. I benchmarked with the built-in `landscape` raster, for both the moving window and the metric alone.

Results: `landmetrics` is faster in both cases, about 2.5x faster for the moving window and about 1.5x faster for the metric alone.
Here is the code I used:

```r
landscape <- reclassify(landscape, cbind(2, NA))
landscape <- reclassify(landscape, cbind(3, NA))
w <- 3

bm1 <- microbenchmark::microbenchmark(
  lm <- focal.lmetrics(landscape, metric = "effective.mesh.size", w = w, land.value = 1, bkg = NA, latlong = FALSE),
  lsm <- window_lsm(landscape, window = matrix(1, nrow = w, ncol = w), what = c("lsm_l_mesh")),
  times = 10
)
bm1
mean_lm1 <- 929.167    # as in bm1
mean_lsm1 <- 2345.661  # as in bm1
mean_lsm1 / mean_lm1   # ~2.5x faster

bm2 <- microbenchmark::microbenchmark(
  lm_single <- lmetrics(landscape, metric = "effective.mesh.size", land.value = 1, bkg = NA, latlong = FALSE),
  lsm_single <- landscapemetrics::lsm_l_mesh(landscape = landscape)
)
bm2
mean_lm2 <- 4.126911   # as in bm2
mean_lsm2 <- 6.295985  # as in bm2
mean_lsm2 / mean_lm2   # ~1.5x faster
```
```r
lmetrics <- function(x, bkg = 0, land.value = 1,
                     metric = "prop.landscape", latlong = FALSE) {
  if (!inherits(x, "RasterLayer")) stop("x is not a valid raster layer")
  mnames <- c("class", "n.patches", "total.area",
              "prop.landscape", "patch.density", "total.edge",
              "edge.density", "landscape.shape.index", "largest.patch.index",
              "mean.patch.area", "sd.patch.area", "min.patch.area",
              "max.patch.area", "perimeter.area.frac.dim", "mean.perim.area.ratio",
              "sd.perim.area.ratio", "min.perim.area.ratio", "max.perim.area.ratio",
              "mean.shape.index", "sd.shape.index", "min.shape.index",
              "max.shape.index", "mean.frac.dim.index", "sd.frac.dim.index",
              "min.frac.dim.index", "max.frac.dim.index", "total.core.area",
              "prop.landscape.core", "mean.patch.core.area", "sd.patch.core.area",
              "min.patch.core.area", "max.patch.core.area", "prop.like.adjacencies",
              "aggregation.index", "lanscape.division.index", "splitting.index",
              "effective.mesh.size", "patch.cohesion.index")
  m.idx <- which(mnames %in% metric)
  if (length(m.idx) < 1) stop("Not a valid landscape metric")
  cs <- raster::res(x)[1]
  lmetric <- function(v, focal.values = land.value, bkgs = bkg,
                      rcs = cs, latlongs = latlong, lm.idx = m.idx) {
    m <- raster::as.matrix(v)
    m <- ifelse(m != focal.values, bkgs, 1)
    # ClassStat() as provided by landmetrics (originally from SDMTools)
    as.numeric(ClassStat(m, bkgd = bkgs, cellsize = rcs,
                         latlon = latlongs))[lm.idx]
  }
  lmetric(v = x, focal.values = land.value, bkgs = bkg,
          rcs = cs, latlongs = latlong, lm.idx = m.idx)
}
```
As the metric is called for each pixel, the difference multiplies. So I also think it's the metric.
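To put rough numbers on that multiplication (a back-of-envelope sketch; the per-call means come from `bm2` above, while "one metric call per cell of a 250 x 250 raster" is an assumed window count, illustrated here in Python rather than R):

```python
# Per-call means from bm2 above, converted from milliseconds to seconds.
mean_lm2 = 4.126911e-3   # landmetrics, single metric call
mean_lsm2 = 6.295985e-3  # landscapemetrics, single metric call

# Assumed window count: one metric call per cell of a 250 x 250 raster.
cells = 250 * 250

extra = (mean_lsm2 - mean_lm2) * cells
print(round(extra))  # ~136 extra seconds over the whole raster
```

So even a ~2 ms per-call gap turns into minutes once the moving window repeats it tens of thousands of times.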
That would have been my guess as well...
Just a quick update: this turns out to be quite a rabbit hole ;-) I went a fair bit deeper, and it seems our patch-finding algorithm is quite a bit slower than theirs. I compared their `connected.pixels` with our `get_patches` (theirs is 3.2x faster) and the underlying C/C++ code (theirs is 1.8x faster). I think this is good news, because improving the connected labelling would speed up most metrics in `landscapemetrics`.
```r
> tmat <- matrix(tmat, nrow = 250, ncol = 250)
> microbenchmark::microbenchmark(
+   landmetrics::connected.pixels(tmat),
+   landscapemetrics::get_patches(tmat),
+   .Call('ccl', tmat, PACKAGE = 'landmetrics'),
+   landscapemetrics:::rcpp_ccl(tmat))
Unit: milliseconds
                                        expr      min       lq     mean   median       uq      max neval
         landmetrics::connected.pixels(tmat) 1.413130 1.456568 1.871517 1.488535 1.714600 12.56624   100
         landscapemetrics::get_patches(tmat) 4.566368 4.772208 5.924897 4.836324 5.953252 13.54484   100
 .Call("ccl", tmat, PACKAGE = "landmetrics") 1.458142 1.499114 1.911508 1.545898 1.722005 14.11127   100
           landscapemetrics:::rcpp_ccl(tmat) 2.613139 2.715605 3.233093 2.822551 3.163085  7.73088   100
```
They give https://doi.org/10.1016/j.cviu.2003.09.002 as the reference for their connected labelling algorithm, and I am quite sure I have seen this paper before (meaning we looked at it before?).
Yes, increasing the speed of the connected labelling would be really cool, because a lot of metrics depend on it. Their algorithm is written in C, right? So we would take the code and re-implement it in C++? (And obviously cite it properly...)
I'd say that's the plan. I'll hack some C++ code and you go for quality control?
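For a concrete picture of what such a labelling routine does, here is a minimal sketch of the classic two-pass connected-component labelling with union-find. This is illustrative Python, not the landmetrics C code, the planned C++ port, or necessarily the algorithm from the cited paper; the 4-neighbourhood is also an assumption:

```python
def label_components(grid):
    """Two-pass connected-component labelling (4-neighbourhood) with union-find.

    grid: 2D list of 0/1 ints; returns a 2D list in which each connected
    patch of 1s gets its own positive label and background stays 0.
    """
    nrow, ncol = len(grid), len(grid[0])
    parent = [0]  # union-find forest; parent[i] == i marks a root

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees flat
            i = parent[i]
        return i

    labels = [[0] * ncol for _ in range(nrow)]
    # First pass: assign provisional labels, record equivalences.
    for r in range(nrow):
        for c in range(ncol):
            if grid[r][c] != 1:
                continue
            up = labels[r - 1][c] if r > 0 else 0
            left = labels[r][c - 1] if c > 0 else 0
            if up == 0 and left == 0:
                parent.append(len(parent))   # brand-new provisional label
                labels[r][c] = len(parent) - 1
            elif up and left:
                ru, rl = find(up), find(left)
                parent[max(ru, rl)] = min(ru, rl)  # union the two labels
                labels[r][c] = min(ru, rl)
            else:
                labels[r][c] = up or left
    # Second pass: flatten equivalences and renumber labels compactly.
    remap = {}
    for r in range(nrow):
        for c in range(ncol):
            if labels[r][c]:
                root = find(labels[r][c])
                labels[r][c] = remap.setdefault(root, len(remap) + 1)
    return labels

# A U-shape gets two provisional labels in the first pass that the
# union step merges into a single patch.
patches = label_components([[1, 0, 1],
                            [1, 1, 1]])
```

How the equivalences are resolved is where implementations differ most, and it is exactly the part the paper above optimises.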
Just a few intermediate ideas and thoughts:

- `landscapemetrics` is in principle more expensive than `landmetrics`, because our code is more structured (= more sub-functions) and we use S3 methods, which again costs time; `landmetrics` will therefore almost certainly always be faster
- we could replace `focal` and figure out the method in advance
- an alternative to `focal`, preferably parallelised (which would totally make sense)

But if the new Rcpp code is faster (?), I would say we start by updating the connected labelling first anyhow. Then we can think about whether we want to update the general structure of `window_lsm`.
@bitbacchus I kind of lost track of this... is there anything I can do to progress with this?
I kinda lost track, too ;-) I'll look into this next week.
Hi @bitbacchus and @mhesselbarth, I am following this discussion (and @bitbacchus' progress) closely. My experience is similar to the @bitbacchus comments above: for a moving-window problem, a lot of time is spent not on the actual calculations but on the overhead of the R<->C++ connection. That being said, it would be great to see some alpha code that tries to speed things up; then we could benchmark it and try to improve it further.
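That per-call overhead is easy to demonstrate outside R too. A rough Python/NumPy analogue of the same effect (NumPy's compiled loop standing in for the C++ side; the array size mirrors the 250 x 250 benchmark raster):

```python
import numpy as np

rng = np.random.default_rng(42)
a = rng.random(250 * 250) + 1.0  # same cell count as the benchmark raster

# One boundary crossing per element: each np.log(x) call pays a fixed
# overhead, just like one metric call per moving-window position.
per_element = sum(float(np.log(x)) for x in a)

# One boundary crossing in total: the loop runs inside compiled code.
batched = float(np.log(a).sum())

# Identical results; the batched version avoids ~62,500 call overheads.
```

The numbers agree to floating-point precision; only the number of boundary crossings differs.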
Similarly following the discussion and @bitbacchus' progress closely. We're using this package to quantify landscapes in the Amazon and this is a key bottleneck.
So the pressure is on 😄
I am following closely on this because I have to use moving-windows all the time. There are focal alternatives these days that allow parallel processing. Is there any plan on trying that route to speed things up?
There are some plans but we haven't moved on them yet.
FYI: one thing I found out today is that there is quite a lot of overhead caused by creating tibbles for each metric. The code below takes about 2.5 seconds on my laptop vs. 0.9 seconds when I change `tibble::tibble()` to `data.frame()` (or `data.table::data.table()`) in line 74: https://github.com/r-spatialecology/landscapemetrics/blob/acd3fd6dffa1908a8afb3cfaa5341cc50054e1ae/R/lsm_l_ent.R#L74.
```r
library(landscapemetrics)
library(raster)

data(landscape)
mw <- window_lsm(landscape, window = matrix(1, nrow = 5, ncol = 5), what = "lsm_l_ent")
```
Interesting... that is quite a lot. I would oppose using `data.table`, since that's a slightly different approach than `data.frame`s. But we could get rid of `tibble` if that increases speed.
Or we could make tibble optional... or just return a vector of values from each main function, and have another layer that creates a `data.frame`/`tibble`...
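That layered design could look something like this sketch (hypothetical names, illustrative Python rather than R): the inner layer returns bare numbers per window, and a single table-like structure is assembled once at the end instead of one tibble per metric call.

```python
from math import log

def window_entropy(values):
    """Shannon entropy (bits) of the cell values in one window.

    Returns a bare float with no per-call table construction.
    """
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    n = len(values)
    return -sum((c / n) * log(c / n, 2) for c in counts.values())

# Inner layer: plain numeric results, one per window.
windows = [[1, 1, 1, 2], [1, 2, 1, 2], [2, 2, 2, 2]]
values = [window_entropy(w) for w in windows]

# Outer layer: one table built once at the very end.
result = {"metric": ["lsm_l_ent"] * len(values), "value": values}
```

The expensive container construction then happens once per raster instead of once per window.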
Will close this, because the specific speed differences are not related to the window function itself but to the underlying algorithms at the metric level. Also, `window_lsm()` will be re-worked soonish.
Thanks for clarifying and whipping up some example code!
I guess I was more looking for a way of speeding up the function itself, since only using one metric already took a long time. I assume using a different format than raster would not help, since the underlying function is `focal` and requires a raster input, right?

Strangely enough, I ran the same analysis (3x3 window for effective mesh size) with `landmetrics::focal.lmetrics` (which seems to be somehow related to this package?) and it finished within 20 minutes. `window_lsm(pfti_proj, window = matrix(1, nrow = 3, ncol = 3), what = c("lsm_l_mesh"), progress = TRUE)` did not finish after 3 hours...

Originally posted by @lukasbaumbach in https://github.com/r-spatialecology/landscapemetrics/issues/87#issuecomment-786628912