inlabru-org / inlabru

inlabru
https://inlabru-org.github.io/inlabru/
73 stars 21 forks source link

crs mismatch in gorillas_sf #205

Closed dmi3kno closed 9 months ago

dmi3kno commented 10 months ago

There's a small crs mismatch in gorillas_sf, namely in counts object

library(sf)
#> Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 8.2.1; sf_use_s2() is TRUE
library(inlabru)
#> Loading required package: fmesher
#> The legacy packages maptools, rgdal, and rgeos, underpinning the sp package,
#> which was just loaded, will retire in October 2023.
#> Please refer to R-spatial evolution reports for details, especially
#> https://r-spatial.org/r/2023/05/15/evolution4.html.
#> It may be desirable to make the sf package available;
#> package maintainers should consider adding sf to Suggests:.
#> The sp package is now running under evolution status 2
#>      (status 2 uses the sf package in place of rgdal)
data(gorillas_sf)

sample <- gorillas_sf$plotsample

all.equal(st_crs(sample$counts), st_crs(sample$plots))
#> [1] "Component \"input\": 1 string mismatch"
#> [2] "Component \"wkt\": 1 string mismatch"
sample$counts <- st_set_crs(sample$counts, st_crs(sample$plots))
#> Warning: st_crs<- : replacing crs does not reproject data; use st_transform for
#> that
all.equal(st_crs(sample$counts), st_crs(sample$plots))
#> [1] TRUE

Created on 2023-09-11 with reprex v2.0.2

finnlindgren commented 10 months ago

I hadn't noticed that, but this is also the case for the sp version, gorillas. It's an incorrect units specification that's the issue.

> st_crs(gorillas_sf$plotsample$counts)$input
[1] "+proj=utm +zone=32 +datum=WGS84 +units=m +no_defs"
> st_crs(gorillas_sf$plotsample$plots)$input
[1] "+proj=utm +zone=32 +datum=WGS84 +units=km +no_defs"
> fm_crs(gorillas$plotsample$counts)$proj4string
[1] "+proj=utm +zone=32 +datum=WGS84 +units=m +no_defs"
> fm_crs(gorillas$plotsample$nests)$proj4string
[1] "+proj=utm +zone=32 +datum=WGS84 +units=km +no_defs"
> head(gorillas_sf$plotsample$nests)
Simple feature collection with 6 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 581.823 ymin: 677.0166 xmax: 583.6859 ymax: 677.5097
Projected CRS: +proj=utm +zone=32 +datum=WGS84 +units=km +no_defs
   group season       date                  geometry
2  major    dry 2006-01-10  POINT (581.823 677.4227)
5  minor    dry 2006-01-27 POINT (582.5851 677.5097)
8  major    dry 2006-02-03 POINT (583.5845 677.2071)
12 major    dry 2006-02-23 POINT (582.6159 677.5006)
18 major    dry 2006-03-17 POINT (583.6859 677.2349)
19 major    dry 2006-03-19 POINT (583.4742 677.0166)
> head(gorillas_sf$plotsample$counts)
Simple feature collection with 6 features and 2 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 580.7037 ymin: 674.7238 xmax: 584.6149 ymax: 675.5694
Projected CRS: +proj=utm +zone=32 +datum=WGS84 +units=m +no_defs
  count  exposure                  geometry
1     0 0.2030052 POINT (581.5726 674.7681)
2     0 0.3087156 POINT (582.5867 674.7238)
3     0 0.3087156 POINT (583.6008 674.7238)
4     0 0.3082937 POINT (584.6149 674.7238)
5     0 0.1142164 POINT (580.7037 675.5694)
6     0 0.3087156 POINT (581.5726 675.5694)

The two objects clearly are both using the same units, and both crs:es should have +units=km. I think this has gone undetected for so long due to how the counts data is typically used, by relying on the ordering instead of the coordinates, and that older INLA/inlabru code didn't always check that crs:es were identical when they should have been. The new inlabru/fmesher code transforms to common crs in more cases.

finnlindgren commented 10 months ago

I'm also not sure why there is a separate counts object. The count and exposure variables could have been placed in plots instead, but for some reason the old code for constructing the plotsample version of the data split the information in two objects.

dmi3kno commented 10 months ago

I think it is because 'counts' stores centroids for labeling the plots.

finnlindgren commented 10 months ago

You're probably right, but nowadays that should be handled by the plotting methods instead (and/or extracted dynamically from the polygons).

finnlindgren commented 10 months ago

With sf, one can even have more than one column with geometric info, if one really needs a dedicated variable for the centroids. That wouldn't be automatically transformed under crs transformations though.

finnlindgren commented 10 months ago

I'm rebuilding the gorillas and gorillas_sf objects now.