inbo / etn

R package to access data from the European Tracking Network
https://inbo.github.io/etn/
MIT License

update `datapackage.json` so it includes field names from `etn_fields.csv` #290

Open PietrH opened 5 months ago

PietrH commented 5 months ago

Jesus pointed this out to me via email:


Hi Pieter,

I need to update this datapackage.json so it includes the field definitions found [here](https://github.com/inbo/etn/blob/main/inst/assets/etn_fields.csv). Would you like to apply these changes yourself, or otherwise, would you be ok with me applying the changes onto this repository?

Best,
Jesus.

I wonder how this relates to: https://github.com/inbo/etn/milestone/4

Specifically #226

Questions

I remember updating datapackage.json a good while ago, and I certainly know about etn_fields.csv...

PietrH commented 5 months ago

@peterdesmet, I have a few questions:

PietrH commented 5 months ago

I did a little digging and found 45 fields in datapackage.json that aren't in etn_fields.csv. That is expected, since I remember keeping datapackage.json up to date but not etn_fields.csv. To replicate:

# is the field definitions csv up to date with the datapackage.json? 

datapackage_json <-
  jsonlite::read_json("https://raw.githubusercontent.com/inbo/etn/main/inst/assets/datapackage.json")

field_definitions <- readr::read_csv("https://raw.githubusercontent.com/inbo/etn/main/inst/assets/etn_fields.csv")
#> Rows: 182 Columns: 5
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (4): view, field, definition, example
#> dbl (1): order
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# parse datapackage_json to a table ---------------------------------------

datapackage_tbl <-
  datapackage_json |>
  purrr::chuck("resources") |>
  purrr::map(
    \(resource) purrr::set_names(
      purrr::chuck(resource, "schema", "fields"),
      purrr::chuck(resource, "name")
    )
  ) |>
  purrr::map(
    \(resource) purrr::map(
      resource,
      ~ dplyr::tibble(
        name = purrr::pluck(.x, "name"),
        type = purrr::pluck(.x, "type"),
        resource_name = unique(names(resource))
      )
    )
  ) |>
  purrr::map(purrr::list_rbind) |>
  purrr::list_rbind()

# modify field_definitions ------------------------------------------------

field_definitions_rn <- 
  field_definitions |> 
  dplyr::mutate(
    resource_name = 
      stringr::str_extract(view, "^[a-z]+(?=_)")
  )

# See if any are missing --------------------------------------------------

# fields that are in datapackage.json but not in etn_fields.csv:
dplyr::anti_join(
  datapackage_tbl,
  field_definitions_rn,
  by = dplyr::join_by(resource_name == resource_name,
                      name == field)
) |>
  print(n = Inf)
#> # A tibble: 45 × 3
#>    name                        type     resource_name
#>    <chr>                       <chr>    <chr>        
#>  1 animal_id                   integer  animals      
#>  2 animal_project_code         string   animals      
#>  3 tag_serial_number           integer  animals      
#>  4 tag_type                    string   animals      
#>  5 tag_subtype                 string   animals      
#>  6 acoustic_tag_id             string   animals      
#>  7 acoustic_tag_id_alternative string   animals      
#>  8 tag_serial_number           integer  tags         
#>  9 tag_type                    string   tags         
#> 10 tag_subtype                 string   tags         
#> 11 acoustic_tag_id             string   tags         
#> 12 acoustic_tag_id_alternative string   tags         
#> 13 manufacturer                string   tags         
#> 14 model                       string   tags         
#> 15 activation_date             datetime tags         
#> 16 length                      number   tags         
#> 17 diameter                    number   tags         
#> 18 weight                      number   tags         
#> 19 floating                    boolean  tags         
#> 20 archive_memory              string   tags         
#> 21 sensor_range_min            integer  tags         
#> 22 sensor_range_max            integer  tags         
#> 23 sensor_resolution           number   tags         
#> 24 sensor_unit                 string   tags         
#> 25 sensor_accuracy             number   tags         
#> 26 owner_organization          string   tags         
#> 27 tag_id                      string   tags         
#> 28 tag_device_id               integer  tags         
#> 29 detection_id                integer  detections   
#> 30 tag_serial_number           integer  detections   
#> 31 acoustic_tag_id             string   detections   
#> 32 acoustic_project_code       string   detections   
#> 33 depth_in_meters             number   detections   
#> 34 sensor2_value               number   detections   
#> 35 sensor2_unit                number   detections   
#> 36 deployment_id               integer  detections   
#> 37 deployment_id               integer  deployments  
#> 38 acoustic_project_code       string   deployments  
#> 39 activation_date_time        datetime deployments  
#> 40 valid_data_until_date_time  datetime deployments  
#> 41 receiver_model              string   receivers    
#> 42 receiver_serial_number      string   receivers    
#> 43 owner_organization          string   receivers    
#> 44 built_in_acoustic_tag_id    string   receivers    
#> 45 ar_model                    string   receivers

Created on 2024-02-12 with reprex v2.0.2
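
The check above only runs in one direction (datapackage.json against etn_fields.csv). A minimal sketch of comparing both directions is given here, using small made-up tibbles in place of the real `datapackage_tbl` and `field_definitions_rn`. The rows, including `legacy_field`, are hypothetical and only illustrate the join logic; it assumes dplyr >= 1.1.0 for `join_by()`.

```r
library(dplyr)

# Toy stand-ins for the real tables (hypothetical rows, for illustration only)
datapackage_tbl <- tibble(
  name = c("animal_id", "tag_type", "scientific_name"),
  type = c("integer", "string", "string"),
  resource_name = c("animals", "animals", "animals")
)
field_definitions_rn <- tibble(
  field = c("scientific_name", "legacy_field"),
  resource_name = c("animals", "animals")
)

# Fields that are in datapackage.json but not in etn_fields.csv
only_in_datapackage <- anti_join(
  datapackage_tbl,
  field_definitions_rn,
  by = join_by(resource_name, name == field)
)

# Reverse direction: fields in etn_fields.csv but not in datapackage.json
only_in_csv <- anti_join(
  field_definitions_rn,
  datapackage_tbl,
  by = join_by(resource_name, field == name)
)

print(only_in_datapackage)
print(only_in_csv)
```

Running both directions would show whether etn_fields.csv also holds fields that were dropped from datapackage.json, not just the ones it is missing.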

peterdesmet commented 5 months ago

As far as I know, datapackage.json is currently up to date. Is etn_fields.csv? In other words, did #226 ever get done? What is the status of https://inbo.github.io/etn/articles/etn_fields.html ?

No, #226 never got done. Here's how the HTML page and CSV file originally worked:

With the database restructuring, the views disappeared and the table was renamed to app.field_metadata. The etn_fields.Rmd could not be reinstated to its functioning form.

What I suggest: let's maintain all information in a datapackage.json file:

The information for a field would look like this:

{
  "name": "capture_temperature_change",
  "description": "Difference between water temperature of the system where the fish was caught and the water temperature of the holding reservoir.",
  "type": "string",
  "unit": "degrees celsius",
  "example": "5ºC"
}
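
As a rough sketch of how an existing row of etn_fields.csv might be mapped onto this structure, the following builds one field entry with jsonlite. `field_to_descriptor()` is a hypothetical helper, not part of the etn package. The CSV only has the columns view, field, definition, example and order, so the `type` argument is an assumption that would have to be filled in from elsewhere, and `unit` is left out for the same reason.

```r
library(jsonlite)

# Hypothetical helper: turn one etn_fields.csv row into a field entry.
# `type` is not available in the CSV, so it defaults to "string" here
# (an assumption made for this sketch).
field_to_descriptor <- function(field, definition, example, type = "string") {
  list(
    name = field,
    description = definition,
    type = type,
    example = example
  )
}

# Values taken from the example field above
descriptor <- field_to_descriptor(
  field = "capture_temperature_change",
  definition = paste(
    "Difference between water temperature of the system where the fish",
    "was caught and the water temperature of the holding reservoir."
  ),
  example = "5ºC"
)

# auto_unbox = TRUE writes length-one vectors as JSON scalars, not arrays
cat(toJSON(descriptor, auto_unbox = TRUE, pretty = TRUE))
```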