jeffreyevans / spatialEco

R package for spatial analysis and modelling of ecological systems
GNU General Public License v3.0
108 stars 29 forks source link

Bandwidth argument is ignored by ks::kde in sf.kde #59

Closed fbled closed 1 year ago

fbled commented 1 year ago

Hello Jeffrey!

First of all, thanks a lot for the amazing work you've put in Spatial Eco, this is an awesome package, and I love using it! I have recently had to use it to compute some Gaussian Kernel Density estimate for point spatial data with sf.kde, and I have noticed an issue in the way it calls ks::kde for the actual computation. As of now, because of the way sf.kde calls ks::kde, ks::kde completely ignores the bandwidth argument. Currently, sf.kde calls ks::kde this way (formated and commented for clarity):


if (!is.null(y)) {
        message("\n", "calculating weighted kde", "\n")
        kde.est <- suppressWarnings(terra::rast(matrix(ks::kde(sf::st_coordinates(x)[,1:2], 
            h = bw,              # Note how the bandwidth argument is sent to ks::kde
           eval.points = terra::xyFromCell(ref, 1:terra::ncell(ref)), gridsize = n, w = y, density = TRUE)$estimate,  nrow = n[1], ncol = n[2], byrow = TRUE), extent = terra::ext(ref)))
    }     else {
        message("\n", "calculating unweighted kde", "\n")
        kde.est <- suppressWarnings(terra::rast(matrix(ks::kde(sf::st_coordinates(x)[,1:2], 
            h = bw,             # Same comment as above
           eval.points = terra::xyFromCell(ref, 1:terra::ncell(ref)), gridsize = n, density = TRUE)$estimate, nrow = n[1], ncol = n[2], byrow = TRUE), extent = terra::ext(ref)))

It doesn't really matter whether or not we're using weighted or unweighted Gaussian Kernel Density estimates, the issue is the same. I believe it comes from the fact that the call specifies the bandwidth using the h = bw argument. However, ks::kde only uses h for one-dimensional data. When dealing with 2+ dimensional data, it relies on the argument H (a bandwidth matrix), and ignores any h argument. If H is not specified, Hpi is called by default.

This issue could be resolved in a quick and dirty way by changing the sf.kde code with something along the lines of:

if (!is.null(y)) {
        message("\n", "calculating weighted kde", "\n")
        kde.est <- suppressWarnings(terra::rast(matrix(ks::kde(sf::st_coordinates(x)[,1:2], 
            H = diag(bw,2),              # Changing h to H, and sending a bandwidth matrix, instead of a scalar
           eval.points = terra::xyFromCell(ref, 1:terra::ncell(ref)), gridsize = n, w = y, density = TRUE)$estimate,  nrow = n[1], ncol = n[2], byrow = TRUE), extent = terra::ext(ref)))
    }     else {
        message("\n", "calculating unweighted kde", "\n")
        kde.est <- suppressWarnings(terra::rast(matrix(ks::kde(sf::st_coordinates(x)[,1:2], 
            H = diag(bw,2),             # Same comment as above
           eval.points = terra::xyFromCell(ref, 1:terra::ncell(ref)), gridsize = n, density = TRUE)$estimate, nrow = n[1], ncol = n[2], byrow = TRUE), extent = terra::ext(ref)))

or using any more appropriate way to compute (or specify) the 2-dimensional bandwidth matrix based on the method of interest.

It would be interesting to either correct the code to force the user to input a 2-dimensional bandwidth matrix in sf.kde that is then directly fed in ks::kde using the appropriate argument call (H), or have sf.kde create said 2-dimensional bandwidth matrix from the user-specified bw argument.

Again, thanks for looking into it, and for the great work you put in this package!

fbled commented 1 year ago

I am proposing the following correction to sf.kde which implements the initial proposition above, and also corrects the computation of the unweighted automatic bandwidth matrix when no weights nor bandwidth arguments are present. The current version of the code uses incorrectly ks::hpi to do so. This produces an error, that is ignored because hpi does not handle dimensions >1 (leading to a missing bw, but it does not matter because as mentioned in the previous post, ks::kde (silently!) does not recognize the argument h = bw for dimensions>1, and (silently!) applies Hpi to the dataset.

Proposed solution is then:

sf.kde <- function (x, y = NULL, bw = NULL, ref = NULL, res = NULL, standardize = FALSE,
    scale.factor = NULL, mask = FALSE) 
{
    if (missing(x))
        stop("x argument must be provided")
    ref.flag = inherits(ref, "SpatRaster")
    if (!inherits(x, c("sf", "sfc"))) 
        stop(deparse(substitute(x)), " must be a sf, or sfc object")
    if (unique(as.character(sf::st_geometry_type(x))) != "POINT") 
        stop(deparse(substitute(x)), " must be single-part POINT geometry")
    if (is.null(ref)) {
        if (!is.null(res)) {
            ref <- terra::rast(terra::ext(x), resolution = res)
        }
        else {
            ref <- terra::rast(terra::ext(x))
            message("defaulting to ", terra::res(ref)[1], "x", 
                terra::res(ref)[2], " cell resolution")
        }
    }
    if (inherits(ref, "numeric")) {
        if (length(ref) != 4) 
            stop("Need xmin, xmax, ymin, ymax bounding coordinates")
        if (!is.null(res)) {
            ref <- terra::rast(terra::ext(ref), resolution = res)
        }
        else {
            ref <- terra::rast(terra::ext(ref))
            message("defaulting to ", terra::res(ref)[1], "x",
                terra::res(ref)[2], " cell resolution")
        }
    }
    else {
        if (!inherits(ref, "SpatRaster")) 
            stop(deparse(substitute(ref)), " must be a terra SpatRast object")
    }
    n <- c(terra::nrow(ref), terra::ncol(ref))

    if (is.null(bw)) {
        if (is.null(y)) {
            bw = suppressWarnings(ks::Hpi(sf::st_coordinates(x)[,
                1:2]))
            message("Unweighted automatic bandwidth: ")
            print(bw)
        }
        else {
            bw = suppressWarnings(ks::Hscv.diag(cbind(sf::st_coordinates(x)[, 
                1:2], y)))
            message("Weighted automatic CV bandwidth: ")
            print(bw)
        }
    }

    else {
        message("Using specified bandwidth: ")
        print(bw)
        if(is.atomic(bw) && length(bw) == 1L) {  bw=diag(bw,2) }
    }

    if (!is.null(y)) {
        message("\n", "calculating weighted kde", "\n")
        kde.est <- suppressWarnings(terra::rast(matrix(ks::kde(sf::st_coordinates(x)[, 
            1:2], H = bw, eval.points = terra::xyFromCell(ref,
            1:terra::ncell(ref)), gridsize = n, w = y, density = TRUE)$estimate, 
            nrow = n[1], ncol = n[2], byrow = TRUE), extent = terra::ext(ref)))
    }

    else {
        message("\n", "calculating unweighted kde", "\n")
        kde.est <- suppressWarnings(terra::rast(matrix(ks::kde(sf::st_coordinates(x)[,
            1:2], H = bw, eval.points = terra::xyFromCell(ref, 
            1:terra::ncell(ref)), gridsize = n, density = TRUE)$estimate, 
            nrow = n[1], ncol = n[2], byrow = TRUE), extent = terra::ext(ref)))
    }
    if (!is.null(scale.factor)) 
        kde.est <- kde.est * scale.factor
    if (standardize == TRUE) {
        kde.est <- kde.est/terra::global(kde.est, "max", na.rm = TRUE)[, 
            1]
    }

    if (mask) {
        if (!ref.flag) {
            message("Since a raster was not used as ref, there is nothing to mask")
        }
        else {
            kde.est <- terra::mask(kde.est, ref)
        }
    }
    terra::crs(kde.est) <- terra::crs(x)
    return(kde.est)
}
jeffreyevans commented 1 year ago

I reverted the function to a Gaussian estimate. The multivariate estimate implemented in ks is questionable with spatial data. The bandwidth bug is also fixed.