hypertidy / ncmeta

Tidy NetCDF metadata
https://hypertidy.github.io/ncmeta/

attribute value types #2

Open mdsumner opened 7 years ago

mdsumner commented 7 years ago

nc_atts returns a list col so that all attributes can be stored in value. We'll need to have helpers to

Probably we can just spread the list col by name

mdsumner commented 6 years ago

We aren't getting names atm, though it's a "simple list"


f <- "S20092742009304.L3m_MO_CHL_chlor_a_9km.nc"
l3file <- system.file("extdata/oceandata", f, package = "tidync")
x <- ncmeta::nc_atts(l3file, "chlor_a")
names(x$value) ## nothing

These would be handy to get "valid range" from.
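
A minimal sketch of the naming idea, assuming the list column can simply be named from the name column. The data frame below is a hand-built stand-in for the shape nc_atts() returns (it is not read from a file), and its values are illustrative only.

```r
# Hand-built stand-in for nc_atts() output: one row per attribute,
# with the attribute values held in a list column (assumption, not real file data).
atts <- data.frame(
  id = 0:2,
  name = c("units", "valid_min", "valid_max"),
  variable = "chlor_a",
  stringsAsFactors = FALSE
)
atts$value <- list("mg m^-3", 0.001, 100)

# The list column starts out unnamed, as reported above.
unnamed <- is.null(names(atts$value))

# Name each list element after its attribute so it can be looked up directly.
atts$value <- stats::setNames(atts$value, atts$name)

# Pull the valid range out by name.
valid_range <- unlist(atts$value[c("valid_min", "valid_max")])
print(valid_range)
```

With names in place, the valid range is a plain named numeric vector and no positional indexing is needed.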

gsapijaszko commented 1 year ago

You mean something like this?

f <- system.file("extdata", "S2008001.L3m_DAY_CHL_chlor_a_9km.nc", package = "ncmeta")

aa <- function(b) {
  b <- unlist(unlist(b, FALSE))
  b <- paste(toString(b))
  return(b)
}

nc_atts(f) |>
  dplyr::rowwise() |>
  dplyr::mutate(vv = aa(value)) |>
  subset(select = c("name", "variable", "vv")) |>
  dplyr::group_by(variable) |>
  tidyr::pivot_wider(names_from = "name", values_from = "vv") |>
  dplyr::ungroup() |>
  subset(select = c(1:3, 6:7))

#> # A tibble: 4 × 5
#>   variable  long_name                                units       valid…¹ valid…²
#>   <chr>     <chr>                                    <chr>       <chr>   <chr>  
#> 1 chlor_a   Chlorophyll Concentration, OCI Algorithm mg m^-3     0.0010… 100    
#> 2 lat       Latitude                                 degree_nor… -90     90     
#> 3 lon       Longitude                                degree_east -180    180    
#> 4 NC_GLOBAL <NA>                                     <NA>        <NA>    <NA>   
#> # … with abbreviated variable names ¹​valid_min, ²​valid_max

Might be worth adding a parameter to nc_atts(), like values = FALSE|TRUE

Regards, Greg

mdsumner commented 1 year ago

yes that looks good, I'm not really sure how to approach this stuff - but when it's just 1:1 rows perhaps we just add the columns in as you have? and leave $value as a list for stuff that requires more complex nesting? when RNetCDF started returning chunk size and so on, a PR removed information to keep the rows tidy, but ultimately we'd like to keep those.
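
One way to picture that split, as a rough sketch only: attributes whose value has length 1 become ordinary columns, and anything longer stays in the list column. The input is a hand-built stand-in for nc_atts() output for a single variable, and the names wide and nested are hypothetical; this is not what ncmeta does.

```r
# Hand-built stand-in for nc_atts() output for one variable (assumption).
atts <- data.frame(
  name = c("units", "long_name", "actual_range"),
  variable = "lat",
  stringsAsFactors = FALSE
)
atts$value <- list("degrees_north", "Latitude", c(89.5, -89.5))

# Attributes with exactly one value map 1:1 onto a column.
is_scalar <- lengths(atts$value) == 1L

# Scalar attributes: one wide row, one column per attribute.
scalar_cols <- stats::setNames(atts$value[is_scalar], atts$name[is_scalar])
wide <- as.data.frame(
  c(list(variable = "lat"), scalar_cols),
  stringsAsFactors = FALSE
)

# Everything else keeps its original long form with the list column intact.
nested <- atts[!is_scalar, ]

print(wide)
print(nested$name)
```

The scalar attributes keep their native types this way, and multi-valued ones such as actual_range are not forced into a single cell.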

gsapijaszko commented 1 year ago

OK, I came to a conclusion, something like:

library(ncmeta)
x <- system.file("extdata", "S2008001.L3m_DAY_CHL_chlor_a_9km.nc", package = "ncmeta")
nc_atts(x)
#> # A tibble: 87 × 4
#>       id name          variable value       
#>    <int> <chr>         <chr>    <named list>
#>  1     0 long_name     chlor_a  <chr [1]>   
#>  2     1 units         chlor_a  <chr [1]>   
#>  3     2 standard_name chlor_a  <chr [1]>   
#>  4     3 _FillValue    chlor_a  <dbl [1]>   
#>  5     4 valid_min     chlor_a  <dbl [1]>   
#>  6     5 valid_max     chlor_a  <dbl [1]>   
#>  7     6 display_scale chlor_a  <chr [1]>   
#>  8     7 display_min   chlor_a  <dbl [1]>   
#>  9     8 display_max   chlor_a  <dbl [1]>   
#> 10     9 scale_factor  chlor_a  <dbl [1]>   
#> # … with 77 more rows

added vector for variables instead of variable[1]

nc_atts(x, variable = c("lat", "chlor_a"))
#> # A tibble: 17 × 4
#>       id name          variable value       
#>    <int> <chr>         <chr>    <named list>
#>  1     0 long_name     chlor_a  <chr [1]>   
#>  2     1 units         chlor_a  <chr [1]>   
#>  3     2 standard_name chlor_a  <chr [1]>   
#>  4     3 _FillValue    chlor_a  <dbl [1]>   
#>  5     4 valid_min     chlor_a  <dbl [1]>   
#>  6     5 valid_max     chlor_a  <dbl [1]>   
#>  7     6 display_scale chlor_a  <chr [1]>   
#>  8     7 display_min   chlor_a  <dbl [1]>   
#>  9     8 display_max   chlor_a  <dbl [1]>   
#> 10     9 scale_factor  chlor_a  <dbl [1]>   
#> 11    10 add_offset    chlor_a  <dbl [1]>   
#> 12    11 reference     chlor_a  <chr [1]>   
#> 13     0 long_name     lat      <chr [1]>   
#> 14     1 units         lat      <chr [1]>   
#> 15     2 _FillValue    lat      <dbl [1]>   
#> 16     3 valid_min     lat      <dbl [1]>   
#> 17     4 valid_max     lat      <dbl [1]>

and with added values, default FALSE

nc_atts(x, variable = c("lat", "chlor_a"), values = TRUE)
#>   variable                                long_name        units
#> 1      lat                                 Latitude degree_north
#> 2  chlor_a Chlorophyll Concentration, OCI Algorithm      mg m^-3
#>                                               standard_name _FillValue
#> 1                                                      <NA>     -32767
#> 2 mass_concentration_chlorophyll_concentration_in_sea_water     -32767
#>   valid_min valid_max display_scale display_min display_max scale_factor
#> 1    -9e+01        90          <NA>          NA          NA           NA
#> 2     1e-03       100           log        0.01          20            1
#>   add_offset
#> 1         NA
#> 2          0
#>                                                                                                                                                                                                         reference
#> 1                                                                                                                                                                                                            <NA>
#> 2 Hu, C., Lee Z., and Franz, B.A. (2012). Chlorophyll-a algorithms for oligotrophic oceans: A novel approach based on three-band reflectance difference, J. Geophys. Res., 117, C01011, doi:10.1029/2011JC007395.

And with another file, where the length of the list != 1

x <- "../../ropensci_tidync/gs_test/ftp.cdc.noaa.gov/Datasets/noaa.oisst.v2/sst.wkmean.1990-present.nc"
nc_atts(x)
#> # A tibble: 48 × 4
#>       id name               variable value       
#>    <int> <chr>              <chr>    <named list>
#>  1     0 units              lat      <chr [1]>   
#>  2     1 long_name          lat      <chr [1]>   
#>  3     2 actual_range       lat      <dbl [2]>   
#>  4     3 standard_name      lat      <chr [1]>   
#>  5     4 axis               lat      <chr [1]>   
#>  6     5 coordinate_defines lat      <chr [1]>   
#>  7     0 units              lon      <chr [1]>   
#>  8     1 long_name          lon      <chr [1]>   
#>  9     2 actual_range       lon      <dbl [2]>   
#> 10     3 standard_name      lon      <chr [1]>   
#> # … with 38 more rows
nc_atts(x, variable = c("lat", "lon"), values = TRUE)
#>   variable         units long_name actual_range standard_name axis
#> 1      lat degrees_north  Latitude  89.5, -89.5      latitude    Y
#> 2      lon  degrees_east Longitude   0.5, 359.5     longitude    X
#>   coordinate_defines
#> 1             center
#> 2             center
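
For readers without the PR at hand, the widening can be approximated in base R. This is a sketch under assumptions, not the PR's code: widen_atts() is a hypothetical helper, the input is a hand-built stand-in for nc_atts() output, and multi-valued attributes are collapsed with toString(), so every column comes back as character.

```r
# Hypothetical helper: one row per variable, one column per attribute name.
# Values are collapsed to a single string; attributes a variable lacks become NA.
widen_atts <- function(atts) {
  vars <- unique(atts$variable)
  out <- data.frame(variable = vars, stringsAsFactors = FALSE)
  for (nm in unique(atts$name)) {
    out[[nm]] <- vapply(vars, function(v) {
      hit <- atts$value[atts$variable == v & atts$name == nm]
      if (length(hit) == 0L) NA_character_ else toString(unlist(hit))
    }, character(1), USE.NAMES = FALSE)
  }
  out
}

# Hand-built stand-in for nc_atts() output (assumption, not read from a file).
atts <- data.frame(
  name = c("units", "actual_range", "units", "actual_range", "axis"),
  variable = c("lat", "lat", "lon", "lon", "lon"),
  stringsAsFactors = FALSE
)
atts$value <- list("degrees_north", c(89.5, -89.5),
                   "degrees_east", c(0.5, 359.5), "X")

wide <- widen_atts(atts)
print(wide)
```

Collapsing to strings keeps the table rectangular when a value has length greater than 1, at the cost of losing the numeric type.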

Almost ready to merge :)

Regards, Greg

gsapijaszko commented 1 year ago

And similarly for nc_att(x):

x <- "/home/sapi/projekty/ropensci_tidync/gs_test/ftp.cdc.noaa.gov/Datasets/noaa.oisst.v2/sst.wkmean.1990-present.nc"
nc_att(x, "lat", 2, values = TRUE)
#> # A tibble: 1 × 4
#>      id name         variable value      
#>   <int> <chr>        <chr>    <chr>      
#> 1     2 actual_range lat      89.5, -89.5

However, there TRUE as the default would probably be better.

G.

mdsumner commented 1 year ago

excellent! I had dim concerns that it wouldn't unpack neatly (always 1:1), but maybe they were unfounded??

gsapijaszko commented 1 year ago

As we say: it'll all come out in the wash. I don't have many files for testing right now. On the other hand, if an issue arises, we will address it. PR on the way.

G.

mdsumner commented 1 year ago

cool ta I'll run it across some files, we have many of them :)

mdsumner commented 1 year ago

oh, I see what's happening now - widening the output for a given set of vars

I need to think about this one, not sure nc_atts should do this, but perhaps as utility function to expand attributes?

still, no harm in having this act as you've PRed it - I'll look again later on a broader set of files