paleolimbot / wk

Lightweight Well-Known Geometry Parsing
https://paleolimbot.github.io/wk

Handle `xy(NA, NA)` as null instead of empty #205

Closed paleolimbot closed 11 months ago

paleolimbot commented 11 months ago

@anthonynorth do you foresee problems with this? I would like to preserve the invariant that a "null feature" lines up exactly with `is.na(x)`.

Before:

library(wk)

wk_debug(xy(NA, NA))
#> initialize (dirty = 0  -> 1)
#> vector_start: POINT[1] <0x16d26f380> => WK_CONTINUE
#>   feature_start (1): <0x16d26f380>  => WK_CONTINUE
#>     geometry_start (<none>): POINT B[EMPTY] <0x16d26f328> => WK_CONTINUE
#>     geometry_end (<none>)  => WK_CONTINUE
#>   feature_end (1): <0x16d26f380>  => WK_CONTINUE
#> vector_end: <0x16d26f380>
#> deinitialize
#> NULL
is.na(xy(NA, NA))
#> [1] TRUE

Created on 2023-10-15 with reprex v2.0.2

After:

library(wk)

wk_debug(xy(NA, NA))
#> initialize (dirty = 0  -> 1)
#> vector_start: POINT[1] <0x16dc0b260> => WK_CONTINUE
#>   feature_start (1): <0x16dc0b260>  => WK_CONTINUE
#>     null_feature  => WK_CONTINUE
#>   feature_end (1): <0x16dc0b260>  => WK_CONTINUE
#> vector_end: <0x16dc0b260>
#> deinitialize
#> NULL
is.na(xy(NA, NA))
#> [1] TRUE

Created on 2023-10-15 with reprex v2.0.2

codecov-commenter commented 11 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (8ab32e7) 98.85% compared to head (c4753fc) 98.85%. Report is 2 commits behind head on master.

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##           master     #205   +/-   ##
=======================================
  Coverage   98.85%   98.85%
=======================================
  Files          85       85
  Lines        6197     6207   +10
=======================================
+ Hits         6126     6136   +10
  Misses         71       71
```

| Files | Coverage Δ | |
|---|---|---|
| R/pkg-sf.R | `99.33% <100.00%> (+0.01%)` | :arrow_up: |
| R/utils.R | `100.00% <ø> (ø)` | |
| R/vertex-filter.R | `100.00% <100.00%> (ø)` | |
| R/xyzm.R | `100.00% <100.00%> (ø)` | |
| src/handle-xy.c | `97.61% <100.00%> (+0.05%)` | :arrow_up: |
| src/vertex-filter.c | `99.27% <ø> (-0.03%)` | :arrow_down: |
| src/xy-writer.c | `100.00% <100.00%> (ø)` | |


anthonynorth commented 11 months ago

xy and sfc_POINT will no longer be equivalent, but that's by design. Documenting this should be enough.

Do we need an is.nan() method for xy vectors?

We'll need to check over the codebase for consistency issues with empty vs null features. I've found some bugs which are hopefully easily fixed. See reprex.

# pak::pkg_install("paleolimbot/wk#205")

# debug shows both NaN and NA are null feature?
wk::wk_debug(wk::xy(c(NA, NaN), c(NA, NaN)))
#> initialize (dirty = 0  -> 1)
#> vector_start: POINT[2] <0xffffe4850230> => WK_CONTINUE
#>   feature_start (1): <0xffffe4850230>  => WK_CONTINUE
#>     null_feature  => WK_CONTINUE
#>   feature_end (1): <0xffffe4850230>  => WK_CONTINUE
#>   feature_start (2): <0xffffe4850230>  => WK_CONTINUE
#>     null_feature  => WK_CONTINUE
#>   feature_end (2): <0xffffe4850230>  => WK_CONTINUE
#> vector_end: <0xffffe4850230>
#> deinitialize
#> NULL

# xy meta
testthat::expect_identical(
  wk::xy(NA, NA) |>
    wk::wk_meta(),
  wk::wkt(NA_character_) |>
    wk::wk_meta()
)

# xy meta
testthat::expect_identical(
  wk::xy(NaN, NaN) |>
    wk::wk_meta(),
  wk::wkt("POINT EMPTY") |>
    wk::wk_meta()
)
#> Error: wk::wk_meta(wk::xy(NaN, NaN)) (`actual`) not identical to wk::wk_meta(wk::wkt("POINT EMPTY")) (`expected`).
#> 
#> actual vs expected
#>                 geometry_type size has_z has_m precision is_empty
#> - actual[1, ]              NA   NA    NA    NA        NA       NA
#> + expected[1, ]             1    0 FALSE FALSE         0     TRUE
#> 
#>   `actual$geometry_type`: NA
#> `expected$geometry_type`:  1
#> 
#>   `actual$size`: NA
#> `expected$size`:  0
#> 
#> `actual$has_z`:   <NA> 
#> `expected$has_z`: FALSE
#> 
#> `actual$has_m`:   <NA> 
#> `expected$has_m`: FALSE
#> 
#>   `actual$precision`: NA
#> `expected$precision`:  0
#> 
#> `actual$is_empty`:   <NA>
#> `expected$is_empty`: TRUE

# xy -> sfc
testthat::expect_identical(
  wk::xy(NaN, NaN) |>
    sf::st_as_sfc(),
  wk::wkt("POINT EMPTY") |>
    sf::st_as_sfc()
)
#> Error: sf::st_as_sfc(wk::xy(NaN, NaN)) (`actual`) not identical to sf::st_as_sfc(wk::wkt("POINT EMPTY")) (`expected`).
#> 
#> `class(actual)`:   "sfc_GEOMETRYCOLLECTION" "sfc"
#> `class(expected)`: "sfc_POINT"              "sfc"
#> 
#> `actual[[1]]` is an S3 object of class <XY/GEOMETRYCOLLECTION/sfg>, a list
#> `expected[[1]]` is an S3 object of class <XY/POINT/sfg>, a double vector

# xy -> sfc
# should xy <null feature> be known to be an empty point?
testthat::expect_identical(
  # should this be point empty?
  wk::xy(NA, NA) |>
    sf::st_as_sfc(),
  wk::wkt(NA_character_) |>
    sf::st_as_sfc()
)

# round trip
testthat::expect_identical(
  wk::xy(NA, NA) |>
    sf::st_as_sfc() |>
    wk::as_xy() |>
    wk::wk_set_crs(NULL),
  wk::xy(NA, NA)
)

# impossible -> cannot succeed
testthat::expect_identical(
  wk::xy(NaN, NaN) |>
    sf::st_as_sfc() |>
    wk::as_xy() |>
    wk::wk_set_crs(NULL),
  wk::xy(NaN, NaN)
)
#> Error: wk::wk_set_crs(wk::as_xy(sf::st_as_sfc(wk::xy(NaN, NaN))), NULL) (`actual`) not identical to wk::xy(NaN, NaN) (`expected`).
#> 
#>   `actual$x`:  NA
#> `expected$x`: NaN
#> 
#>   `actual$y`:  NA
#> `expected$y`: NaN

# sfc -> xy. should this work?
testthat::expect_identical(
  sf::st_sfc(sf::st_point(c(NaN, NaN))) |>
    wk::as_xy(),
  wk::xy(NaN, NaN)
)

# coords
testthat::expect_equal(
  wk::xy(NA, NA) |>
    wk::wk_coords(),
  wk::wkt(NA_character_) |>
    wk::wk_coords()
)

# coords
testthat::expect_equal(
  wk::xy(NaN, NaN) |>
    wk::wk_coords(),
  wk::wkt("POINT EMPTY") |>
    wk::wk_coords()
)

# count
testthat::expect_equal(
  wk::wk_count(wk::xy(NA, NA)),
  wk::wk_count(wk::wkt(NA_character_))
)

# count. xy(NaN, NaN) inconsistent with other handlers
testthat::expect_equal(
  wk::wk_count(wk::xy(NaN, NaN)),
  wk::wk_count(wk::wkt("POINT EMPTY"))
)
#> Error: wk::wk_count(wk::xy(NaN, NaN)) (`actual`) not equal to wk::wk_count(wk::wkt("POINT EMPTY")) (`expected`).
#> 
#> actual vs expected
#>                 n_geom
#> - actual[1, ]        0
#> + expected[1, ]      1
#> 
#>   `actual$n_geom`: 0
#> `expected$n_geom`: 1

Created on 2023-10-16 with reprex v2.0.2

anthonynorth commented 11 months ago

I think we should expect sf::st_point(c(NA, NA)) |> wk::as_xy() to be wk::xy(NaN, NaN).

# pak::pkg_install("paleolimbot/wk#205")

# sf POINT EMPTY
testthat::expect_identical(
  sf::st_sfc(sf::st_point(c(NA_real_, NA_real_))) |>
    wk::as_xy(),
  wk::xy(NaN, NaN)
)
#> Error: wk::as_xy(sf::st_sfc(sf::st_point(c(NA_real_, NA_real_)))) (`actual`) not identical to wk::xy(NaN, NaN) (`expected`).
#> 
#>   `actual$x`:  NA
#> `expected$x`: NaN
#> 
#>   `actual$y`:  NA
#> `expected$y`: NaN

# already works
testthat::expect_identical(
  sf::st_sfc(sf::st_point(c(NaN, NaN))) |>
    wk::as_xy(),
  wk::xy(NaN, NaN)
)

testthat::expect_identical(
  wk::wk_handle(
    sf::st_sfc(sf::st_point(c(NA_real_, NA_real_))),
    wk::xy_writer()
  ),
  wk::xy(NaN, NaN)
)
#> Error: wk::wk_handle(...) (`actual`) not identical to wk::xy(NaN, NaN) (`expected`).
#> 
#>   `actual$x`:  NA
#> `expected$x`: NaN
#> 
#>   `actual$y`:  NA
#> `expected$y`: NaN

testthat::expect_identical(
  wk::wk_handle(
    sf::st_sfc(sf::st_point(c(NaN, NaN))),
    wk::xy_writer()
  ),
  wk::xy(NaN, NaN)
)
#> Error: wk::wk_handle(sf::st_sfc(sf::st_point(c(NaN, NaN))), wk::xy_writer()) (`actual`) not identical to wk::xy(NaN, NaN) (`expected`).
#> 
#>   `actual$x`:  NA
#> `expected$x`: NaN
#> 
#>   `actual$y`:  NA
#> `expected$y`: NaN

Created on 2023-10-16 with reprex v2.0.2

paleolimbot commented 11 months ago

Thank you for taking a look!

I think the main thing here is that xy(NaN, NaN) could be treated as EMPTY, whereas xy(NA, NA) could be treated as null? I'll look into the level of complexity there... it's a good point that we have that option in R.
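For readers unfamiliar with the distinction being proposed, here is a minimal base-R sketch of how the two kinds of missing double can be told apart. `classify_xy()` is a hypothetical helper written for illustration only; it is not part of wk and does not claim to mirror how the package decides between null and empty.

```r
# Hypothetical helper (not part of wk): label each coordinate pair using
# the NA/NaN distinction that R doubles carry.
classify_xy <- function(x, y) {
  stopifnot(is.numeric(x), is.numeric(y), length(x) == length(y))
  # is.na() is TRUE for both NA and NaN; is.nan() is TRUE only for NaN
  is_empty <- is.nan(x) & is.nan(y)
  # both coordinates missing, but not both NaN
  is_null <- is.na(x) & is.na(y) & !is_empty
  ifelse(is_null, "null", ifelse(is_empty, "empty", "point"))
}

classify_xy(c(1, NA, NaN), c(2, NA, NaN))
#> [1] "point" "null"  "empty"
```

Because `is.na()` alone cannot separate the two cases, any rule along these lines has to test `is.nan()` first.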

> Do we need an is.nan() method for xy vectors?

I don't think so...while a point can be missing, I'm not sure that it can be NaN in the same way that a number can be NaN.

sf::st_sfc(sf::st_point(c(NaN, NaN))) |> wk::as_xy()

I may fix that one separately...I think that happens because when all the geometries are empty, the sfc writer doesn't know what geometry to pick. In this case, we also have geometry type at the vector level and I think I can rig something to use that as a fallback.

paleolimbot commented 11 months ago

Ok, that turned into a bit of a rabbit hole, but I think it's much better. One consequence of this was that I noticed wk_coords() actually emitted a vertex when there was a null POINT (!!!!). I changed this to omit null vertices altogether... I don't remember if I had a good reason for including a null feature as a single NA coordinate.

anthonynorth commented 11 months ago

Looking good.

I've noticed that sf::st_as_sfc(wk::xy(NaN, NaN)) fails. This is because sf:::st_as_sf.data.frame() fails by default if NA values are present. This isn't consistent with other formats, so perhaps it's worth patching?

# pak::pkg_install("paleolimbot/wk#205")
testthat::local_edition(3)
testthat::expect_identical(
  wk::wk_handle(
    wk::xy(NaN, NaN),
    wk::sfc_writer()
  ),
  sf::st_sfc(sf::st_point()),
  ignore_attr = c("crs", "bbox")
)

# should we default na.fail = FALSE for consistency?
testthat::expect_identical(
  wk::xy(NaN, NaN) |>
    sf::st_as_sfc(),
  sf::st_sfc(sf::st_point()),
  ignore_attr = c("crs", "bbox")
)
#> Error in st_as_sf.data.frame(as.data.frame(x), coords = xy_dims(x), crs = sf_crs_from_wk(x)): missing values in coordinates not allowed

testthat::expect_identical(
  {
    xys <- wk::xy(NaN, NaN)
    coords <- as.data.frame(xys)
    # also need to convert NaN to NA?
    coords <- as.data.frame(lapply(coords, \(x) replace(x, is.nan(x), NA_real_)))
    sf::st_as_sf(coords, coords = wk::xy_dims(xys), crs = sf::st_crs(wk::wk_crs(xys)), na.fail = FALSE) |>
      sf::st_geometry()
  },
  sf::st_sfc(sf::st_point()),
  # these appear to be inconsistencies in sf itself?
  ignore_attr = c("bbox", "n_empty")
)
#> Warning in min(cc[[1]], na.rm = TRUE): no non-missing arguments to min;
#> returning Inf
#> Warning in min(cc[[2]], na.rm = TRUE): no non-missing arguments to min;
#> returning Inf
#> Warning in max(cc[[1]], na.rm = TRUE): no non-missing arguments to max;
#> returning -Inf
#> Warning in max(cc[[2]], na.rm = TRUE): no non-missing arguments to max;
#> returning -Inf

testthat::expect_identical(
  wk::xy(NaN, NaN) |>
    wk::as_wkb() |>
    sf::st_as_sfc(),
  sf::st_sfc(sf::st_point()),
  ignore_attr = c("crs", "bbox")
)

Created on 2023-10-18 with reprex v2.0.2

Edit: fix reprex
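The NaN-to-NA replacement used in the reprex above can be tried in isolation. The following is a base-R sketch with made-up coordinates; it assumes, without having checked the sf source, that the "missing values in coordinates" check is based on `is.na()`, which is TRUE for NaN as well as NA.

```r
# Made-up coordinates: one ordinary point and one all-NaN ("empty") point
coords <- data.frame(x = c(1, NaN), y = c(2, NaN))

# is.na() reports NaN as missing, so a check built on is.na() would
# reject these coordinates
any(is.na(coords))

# Replace NaN with NA column by column, leaving other values untouched
cleaned <- as.data.frame(
  lapply(coords, function(col) replace(col, is.nan(col), NA_real_))
)

# After the replacement no NaN remains, but the row is still missing
any(vapply(cleaned, function(col) any(is.nan(col)), logical(1)))
is.na(cleaned$x)
```

The replacement changes only which kind of missing value is stored; it does not by itself make the coordinates pass an `is.na()`-based check, which is why the reprex also passes `na.fail = FALSE`.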

anthonynorth commented 11 months ago

I've found an inconsistency between as_xy.sfc and using a handler.

# pak::pkg_install("paleolimbot/wk#205")
testthat::expect_identical(
  wk::as_xy(sf::st_sfc(sf::st_point())),
  wk::xy(NaN, NaN)
)
#> Error: wk::as_xy(sf::st_sfc(sf::st_point())) not identical to wk::xy(NaN, NaN).
#> Objects equal but not identical

testthat::expect_identical(
  wk::wk_handle(
    sf::st_sfc(sf::st_point()),
    wk::xy_writer()
  ),
  wk::xy(NaN, NaN)
)

Created on 2023-10-18 with reprex v2.0.2
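The "Objects equal but not identical" message above is what one would expect if one side holds NA and the other NaN, as in the earlier reprexes. This is an inference from those outputs, not something confirmed in the thread. A base-R sketch of the underlying behaviour:

```r
na_coords  <- c(NA_real_, NA_real_)
nan_coords <- c(NaN, NaN)

# identical() distinguishes NA from NaN
identical(na_coords, nan_coords)
#> [1] FALSE

# all.equal() only requires the missing positions to line up
isTRUE(all.equal(na_coords, nan_coords))
#> [1] TRUE
```

Tests meant to distinguish null from empty in an xy vector therefore need an `identical()`-style comparison; an equality check with tolerance would let an NA/NaN mix-up pass.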

paleolimbot commented 11 months ago

Thanks again! I still have to add a few tests and fix the last issue you identified. I don't think I ever would have spotted those!