ropensci / tidync

NetCDF exploration and data extraction
https://docs.ropensci.org/tidync

Unintended change in types of axes in hyper_tibble output after ver. 0.4.0 update? #128

Open KeachMurakami opened 1 month ago

KeachMurakami commented 1 month ago

After updating to version 0.4.0, the axis columns returned by hyper_tibble() have changed type. With the sample meteorological data from https://github.com/ropensci/tidync/issues/114, lat, lon, and time are now character columns rather than numeric. Is this change intentional?

The output of hyper_transforms() is numeric both before and after the update, and filter expressions passed to hyper_tibble() still operate on numeric values. In version 0.4.0 it therefore seems necessary to convert the types manually, with something like tidync("gistemp250_GHCNv4.nc") |> hyper_tibble() |> dplyr::mutate(lat = as.numeric(lat), ...), which adds a bit of extra effort.

Many thanks!

# get sample meteorological data; https://github.com/ropensci/tidync/issues/114
url <- "https://data.giss.nasa.gov/pub/gistemp/gistemp250_GHCNv4.nc.gz"
curl::curl_download(url, basename(url))
system(sprintf("gunzip %s", basename(url)))
library(tidync)
tidync("gistemp250_GHCNv4.nc") |> hyper_tibble()

# ver. 0.4.0
# 
# # A tibble: 9,034,094 × 4
#   tempanomaly lon   lat   time
#         <dbl> <chr> <chr> <chr>
# 1       0.180 -179  -47   1880-01-15
# 2       0.180 -177  -47   1880-01-15
# 3       0.180 -175  -47   1880-01-15
# ...

# ver. 0.3.0
# 
# # A tibble: 9,034,094 × 4
#   tempanomaly   lon   lat  time
#         <dbl> <dbl> <dbl> <dbl>
# 1       0.180  -179   -47 29233
# 2       0.180  -177   -47 29233
# 3       0.180  -175   -47 29233
# ...

tidync("gistemp250_GHCNv4.nc") |> hyper_transforms()

# ver. 0.3.0 and ver. 0.4.0
# 
# $lon
# # A tibble: 180 × 6
#     lon index    id name  coord_dim selected
#   <dbl> <int> <int> <chr> <lgl>     <lgl>
# 1  -179     1     1 lon   TRUE      TRUE    
# 2  -177     2     1 lon   TRUE      TRUE    
# 3  -175     3     1 lon   TRUE      TRUE    
# 4  -173     4     1 lon   TRUE      TRUE
# ...
#
# $lat 
# ...
#
# $time
# ...

# filtering by numeric values works with ver. 0.4.0
tidync("gistemp250_GHCNv4.nc") |>
  hyper_tibble(lat = lat > 80, lon = lon > 100)

# # A tibble: 1,560 × 4
#   tempanomaly lon   lat   time
#         <dbl> <chr> <chr> <chr>
# 1        7.07 101   81    1953-11-15
# 2        7.07 103   81    1953-11-15
# 3        7.07 105   81    1953-11-15
# ...

Session Info

```r
─ Session info ─────────────────────────────────────────────
 setting  value
 version  R version 4.3.1 (2023-06-16)
 os       macOS Sonoma 14.6.1
 system   aarch64, darwin20
 ui       RStudio
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       Asia/Tokyo
 date     2024-10-09
 rstudio  2023.06.2+561 Mountain Hydrangea (desktop)
 pandoc   NA

─ Packages ─────────────────────────────────────────────────
 package      * version    date (UTC) lib source
 cachem         1.1.0      2024-05-16 [1] CRAN (R 4.3.3)
 cellranger     1.1.0      2016-07-27 [1] CRAN (R 4.3.0)
 cli            3.6.3      2024-06-21 [1] CRAN (R 4.3.3)
 devtools       2.4.5      2022-10-11 [1] CRAN (R 4.3.0)
 digest         0.6.37     2024-08-19 [1] CRAN (R 4.3.3)
 dplyr          1.1.4      2023-11-17 [1] CRAN (R 4.3.1)
 ellipsis       0.3.2      2021-04-29 [1] CRAN (R 4.3.0)
 fansi          1.0.6      2023-12-08 [1] CRAN (R 4.3.1)
 fastmap        1.2.0      2024-05-15 [1] CRAN (R 4.3.3)
 fs             1.6.4      2024-04-25 [1] CRAN (R 4.3.1)
 funkea         0.0.2.0001 2024-09-09 [1] Github (KeachMurakami/funkea@175917b)
 generics       0.1.3      2022-07-05 [1] CRAN (R 4.3.0)
 glue           1.8.0      2024-09-30 [1] CRAN (R 4.3.3)
 htmltools      0.5.8.1    2024-04-04 [1] CRAN (R 4.3.1)
 htmlwidgets    1.6.4      2023-12-06 [1] CRAN (R 4.3.1)
 httpuv         1.6.15     2024-03-26 [1] CRAN (R 4.3.1)
 later          1.3.2      2023-12-06 [1] CRAN (R 4.3.1)
 lifecycle      1.0.4      2023-11-07 [1] CRAN (R 4.3.1)
 lubridate      1.9.3      2023-09-27 [1] CRAN (R 4.3.1)
 magrittr       2.0.3      2022-03-30 [1] CRAN (R 4.3.0)
 memoise        2.0.1      2021-11-26 [1] CRAN (R 4.3.0)
 mime           0.12       2021-09-28 [1] CRAN (R 4.3.0)
 miniUI         0.1.1.1    2018-05-18 [1] CRAN (R 4.3.0)
 pillar         1.9.0      2023-03-22 [1] CRAN (R 4.3.0)
 pkgbuild       1.4.4      2024-03-17 [1] CRAN (R 4.3.1)
 pkgconfig      2.0.3      2019-09-22 [1] CRAN (R 4.3.0)
 pkgload        1.4.0      2024-06-28 [1] CRAN (R 4.3.3)
 plantecophys   1.4-6      2021-03-31 [1] CRAN (R 4.3.0)
 profvis        0.4.0      2024-09-20 [1] CRAN (R 4.3.3)
 promises       1.3.0      2024-04-05 [1] CRAN (R 4.3.1)
 purrr          1.0.2      2023-08-10 [1] CRAN (R 4.3.0)
 R6             2.5.1      2021-08-19 [1] CRAN (R 4.3.0)
 Rcpp           1.0.13     2024-07-17 [1] CRAN (R 4.3.3)
 RcppRoll       0.3.1      2024-07-07 [1] CRAN (R 4.3.3)
 readxl         1.4.3      2023-07-06 [1] CRAN (R 4.3.0)
 remotes        2.5.0      2024-03-17 [1] CRAN (R 4.3.1)
 rlang          1.1.4      2024-06-04 [1] CRAN (R 4.3.3)
 rstudioapi     0.16.0     2024-03-24 [1] CRAN (R 4.3.1)
 sessioninfo    1.2.2      2021-12-06 [1] CRAN (R 4.3.0)
 shiny          1.9.1      2024-08-01 [1] CRAN (R 4.3.3)
 stringi        1.8.4      2024-05-06 [1] CRAN (R 4.3.1)
 stringr        1.5.1      2023-11-14 [1] CRAN (R 4.3.1)
 tibble         3.2.1      2023-03-20 [1] CRAN (R 4.3.0)
 tidyr          1.3.1      2024-01-24 [1] CRAN (R 4.3.1)
 tidyselect     1.2.1      2024-03-11 [1] CRAN (R 4.3.1)
 timechange     0.3.0      2024-01-18 [1] CRAN (R 4.3.1)
 urlchecker     1.0.1      2021-11-30 [1] CRAN (R 4.3.0)
 usethis        3.0.0      2024-07-29 [1] CRAN (R 4.3.3)
 utf8           1.2.4      2023-10-22 [1] CRAN (R 4.3.1)
 vctrs          0.6.5      2023-12-01 [1] CRAN (R 4.3.1)
 winter         0.0.0.9002 2023-11-27 [1] Github (KeachMurakami/winter@6b135f6)
 xtable         1.8-4      2019-04-21 [1] CRAN (R 4.3.0)
```

mdsumner commented 1 month ago

Definitely not intended. I'll explore, thank you.

mdsumner commented 1 month ago

Ok this was introduced as part of the CF timestamp change, @pvanlaake - I haven't isolated it yet but will try to do so in coming days. I should have had a test for that, whoops
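
A check along the following lines is one way such a test could look. It is only a sketch: `axes_are_numeric()` is a hypothetical helper, not part of tidync, and the two data frames are hand-built stand-ins for the first rows printed earlier in this thread, not real hyper_tibble() results.

```r
# Hypothetical helper (not part of tidync): TRUE when every named
# axis column of `tbl` is numeric.
axes_are_numeric <- function(tbl, axes) {
  all(vapply(tbl[axes], is.numeric, logical(1)))
}

# Hand-built stand-ins for the first row of each printed output
out_030 <- data.frame(tempanomaly = 0.180, lon = -179, lat = -47,
                      time = 29233)
out_040 <- data.frame(tempanomaly = 0.180, lon = "-179", lat = "-47",
                      time = "1880-01-15", stringsAsFactors = FALSE)

axes_are_numeric(out_030, c("lon", "lat"))  # TRUE
axes_are_numeric(out_040, c("lon", "lat"))  # FALSE
```

In a package test the same check could presumably wrap a real call, for example `stopifnot(axes_are_numeric(hyper_tibble(tidync(f)), c("lon", "lat")))`, where `f` is a placeholder for a NetCDF file that has lon and lat coordinate variables.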

pvanlaake commented 1 month ago

This is caused by an oversight: in the code the tibble was constructed from the dimnames() of the tidync object and these are indeed of character type. The code has been updated and a PR is waiting to be merged into the main branch.
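
The effect can be reproduced in base R without tidync. The sketch below uses a toy array with made-up coordinate values. It only illustrates that dimnames() are stored as character and is not the actual tidync code.

```r
# Toy stand-in for one slab of gridded data (hypothetical values):
# a 2 x 3 array whose dimnames carry the coordinate values.
lon <- c(-179, -177)
lat <- c(-47, -45, -43)
slab <- array(seq_len(6) / 10, dim = c(2, 3),
              dimnames = list(lon = lon, lat = lat))

# dimnames() holds character labels even though lon and lat were numeric
str(dimnames(slab))

# A long table built from the dimnames gets character axis columns ...
from_dimnames <- expand.grid(dimnames(slab), stringsAsFactors = FALSE)
from_dimnames$value <- as.vector(slab)

# ... while one built from the coordinate vectors keeps them numeric.
from_coords <- expand.grid(lon = lon, lat = lat)
from_coords$value <- as.vector(slab)

class(from_dimnames$lon)  # "character"
class(from_coords$lon)    # "numeric"
```

Both tables hold the same rows in the same order, since expand.grid() varies its first factor fastest, which matches R's column-major array layout. Only the column types differ.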

Note that for the "time" axis, the timestamps in the tibble are, and should be, of character type. Under the CF Metadata Conventions there are 9 different calendars and only 3 are compatible with POSIXt. Character strings can accommodate all of them. In this particular data set the "calendar" attribute is not given, meaning that it is assumed to be a "standard", POSIXt-compatible calendar, but for consistency all timestamps are given as a character string. You can convert to Date by adding a date column, using the as.Date() function.
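
A minimal sketch of that conversion, using a hand-built data frame that copies the first rows of the 0.4.0 output shown earlier instead of reading the file. The origin "1800-01-01" is an assumption: it is not stated in this thread, but it is consistent with the numeric offset 29233 and the timestamp 1880-01-15 shown for the same rows.

```r
# Stand-in for the first rows of the ver. 0.4.0 hyper_tibble() output
out_040 <- data.frame(
  tempanomaly = c(0.180, 0.180, 0.180),
  lon  = c("-179", "-177", "-175"),
  lat  = c("-47", "-47", "-47"),
  time = c("1880-01-15", "1880-01-15", "1880-01-15"),
  stringsAsFactors = FALSE
)

# Add a Date column next to the character timestamps
out_040$date <- as.Date(out_040$time, format = "%Y-%m-%d")

# Until the fix is merged, lon and lat need the same manual treatment
out_040$lon <- as.numeric(out_040$lon)
out_040$lat <- as.numeric(out_040$lat)

# Assumed time units "days since 1800-01-01": the ver. 0.3.0 offset
# 29233 then maps to the same day as the ver. 0.4.0 timestamp.
as.Date(29233, origin = "1800-01-01")  # "1880-01-15"

# Why character is the safe default: a 360_day calendar has a
# 30 February, which has no Date representation.
as.Date("2000-02-30", format = "%Y-%m-%d")  # NA
```

Keeping the original character column alongside the new date column means nothing is lost if a data set turns out to use a calendar that Date cannot represent.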

mdsumner commented 1 month ago

Thanks @pvanlaake ! I agree about the timestamps