r-spatialecology / landscapemetrics

Landscape Metrics for Categorical Map Patterns 🗺️ in R
https://r-spatialecology.github.io/landscapemetrics
GNU General Public License v3.0
227 stars 43 forks source link

Rcpp #19

Closed marcosci closed 5 years ago

marcosci commented 6 years ago

Hey @Nowosad,

we were playing around with your rcpp functions - really handy stuff in there :+1:

Just one weird thing with rcpp_get_coocurrence_matrix:

foo <- function(landscape){

    raster_padded <- pad_raster(landscape, pad_raster_value = -999,
                                pad_raster_cells = 1)

    adjacent_cells <- raster::adjacent(
        raster_padded,
        cells =  seq_len(raster::ncell(raster_padded)),
        directions = 4,
        pairs = TRUE
    )

    tb <- table(raster_padded[adjacent_cells[, 1]],
                raster_padded[adjacent_cells[, 2]])

    return(tb)
}

foo2 <- function(landscape){
    raster_padded <- pad_raster(landscape, pad_raster_value = -999,
                                pad_raster_cells = 1)

    tb <- rcpp_get_coocurrence_matrix(raster::as.matrix(raster_padded), directions = 4)

    return(tb)
}

x <- microbenchmark(
    r_stuff <- foo(podlasie_ccilc), 
    cpp <- foo2(podlasie_ccilc), 
    times = 100
)

x
Unit: seconds
                           expr      min       lq     mean   median       uq      max neval
 r_stuff <- foo(podlasie_ccilc) 2.066591 2.242272 2.429078 2.450698 2.558393 3.056486   100
    cpp <- foo2(podlasie_ccilc) 3.133793 3.462482 3.560858 3.565013 3.705634 3.951005   100

y <- microbenchmark(
    r_stuff <- foo(landscape), 
    cpp <- foo2(landscape), 
    times = 100
)

y
Unit: milliseconds
                      expr       min        lq      mean    median        uq      max neval
 r_stuff <- foo(landscape) 13.405927 14.166557 16.004563 14.811494 16.061616 49.77265   100
    cpp <- foo2(landscape)  2.200139  2.314256  2.839799  2.465439  2.690333 13.05235   100

.... for landscape it's way faster - but not for podlasie. Do you have anything in mind, why that could be?

Nowosad commented 6 years ago

Great question, Marco! I will test it tomorrow.

marcosci commented 6 years ago

Nice work @Nowosad :-)

New benchmark:

> x
Unit: milliseconds
                           expr        min         lq      mean     median        uq       max neval cld
 r_stuff <- foo(podlasie_ccilc) 3607.12295 3907.67668 4058.7611 4013.53287 4169.4011 4826.0062   100   b
    cpp <- foo2(podlasie_ccilc)   80.54347   83.02005  114.4029   94.30997  121.7541  361.9254   100  a 

.... with rcpp_get_coocurrence_matrix2. Will start implementing that soon :-)

Nowosad commented 6 years ago

Timings are good. The results are not (always). That is still a work in progress..

Nowosad commented 6 years ago

It should be fine now. @marcosci , please test it and let me know if we can merge it with master.

library(landscapemetrics)
library(raster)
#> Loading required package: sp
library(microbenchmark)
data("landscape")
data("podlasie_ccilc")
landscape <- pad_raster(landscape, pad_raster_value = -999,
                            pad_raster_cells = 1)
podlasie_ccilc <- pad_raster(podlasie_ccilc, pad_raster_value = -999,
                             pad_raster_cells = 1)

foo <- function(landscape){

    adjacent_cells <- raster::adjacent(
        landscape,
        cells =  seq_len(raster::ncell(landscape)),
        directions = 4,
        pairs = TRUE
    )

    tb <- table(landscape[adjacent_cells[, 1]],
                landscape[adjacent_cells[, 2]])

    tb <- unclass(tb)

    return(tb)
}

foo2 <- function(landscape){
    tb <- landscapemetrics:::rcpp_get_coocurrence_matrix(raster::as.matrix(landscape), directions = 4)

    return(tb)
}

x <- microbenchmark(
    r_stuff <- foo(podlasie_ccilc),
    cpp <- foo2(podlasie_ccilc),
    times = 10
)

x
#> Unit: milliseconds
#>                            expr        min         lq       mean
#>  r_stuff <- foo(podlasie_ccilc) 3930.70718 3998.36992 4183.68346
#>     cpp <- foo2(podlasie_ccilc)   84.92322   90.93474   99.53851
#>      median        uq       max neval
#>  4195.26915 4257.7961 4674.3710    10
#>    98.78565  108.2687  115.6758    10

all.equal(r_stuff, cpp, check.attributes = FALSE)
#> [1] TRUE

Created on 2018-07-09 by the reprex package (v0.2.0).

Nowosad commented 5 years ago

Hey @marcosci , quick question - do we need all of the files in src/? E.g. do we use rem-union-merge.cpp or borders.cpp?

marcosci commented 5 years ago

I thought so, for the connected labeling. But did not look at that for quite some while. We anyways have to fiddle with that, as one can only use the queen's case at the moment for the patch identification.

Are you asking because libs get that big while compiling the package?

Nowosad commented 5 years ago

Yes. Larger libs means also longer compilation times. I am just curious if we need all of it. If not - we should remove it.

marcosci commented 5 years ago

Will dive into that tomorrow :) While doing that - what are we planing with your rcpp functions, internal or public?

Nowosad commented 5 years ago

I do not have a good answer to that question. It should depends on the users expectations - would they need any of the rcpp functions for a landscape ecology work?

marcosci commented 5 years ago

There were for us - so I would assume for others too. I think they have potential to be building blocks for other metrics, and something in the line of:

get_composition <- function(landscape){

    rcpp_get_composition_vector(raster::as.matrix(landscape))

}

... exported would be valuable. If you think there is a more appropriate place for them, tell me. :-)

marcosci commented 5 years ago

So, Max and I played around with the connected labeling from SDMTools - turns out it is a bit faster and we could also tweak it to work 4 neighbors, instead of only 8.

They use it under a GPL license, I think if we put them as the author of the C function it should be fine to not have SDMTools as a dependency?

Nowosad commented 5 years ago

That is my understanding as well. See rmarkdown as an example - https://github.com/rstudio/rmarkdown/blob/master/DESCRIPTION (it uses outside code).

Edit: interesting disussion at https://github.com/ropensci/unconf17/issues/32

marcosci commented 5 years ago

According to the figure @cboettig is showing it would be allowed - nice.

Then, after now restructuring ~200 functions to make them understand the argument directions, we should write a proper acknowledgmenent in the C code :+1:

Nowosad commented 5 years ago

@marcosci, going back to your previous question - I agree that this function could be useful for the users. We just need to come with a better name... Is there any other rcpp function that could be useful externally?

marcosci commented 5 years ago

Hm, I would export them all (with better names :yum: ) except for maybe:

?

marcosci commented 5 years ago

extract_xxx isn't any better, isn't?

:thinking:

mhesselbarth commented 5 years ago

I would prefer an identical prefix to group all of them. How about using get_xxx, so just remove the rcpp_ part for the name of the R wrapper.

Functions that may be interesting are:

Nowosad commented 5 years ago

Marco and Max, sorry for the delay with this. I'd been at a conference whole last week, and this week is also crazy. I will think about it and make it done in a week or so (or tomorrow).

marcosci commented 5 years ago

Just started to implement the utility functions - nothing set in stone, if you want to change/add/remove/whatever something that is totally fine :+1:

Nowosad commented 5 years ago

Awesome work! One small idea - how about using our lsm_ prefix here as well? E.g. lsm_get_xx()?

mhesselbarth commented 5 years ago

We had the idea to use lsm_ as a prefix only for functions that actually calculate a metric and not for auxiliary or helper functions

Nowosad commented 5 years ago

Ok, that's fine.