PietrH opened 5 months ago
@peterdesmet, I have a few questions:

- As far as I know, `datapackage.json` is currently up to date. Is `etn_fields.csv`? AKA, did #226 ever get done?
- `etn_fields.csv` contains a human-readable definition, an example and an order per field. Do you have an example of how to encode this in a datapackage?

I did a little digging and found quite a few fields in `datapackage.json` that aren't in `etn_fields.csv`. That is expected, since I remember keeping `datapackage.json` up to date but not `etn_fields.csv`. To replicate:
```r
# is the field definitions csv up to date with the datapackage.json?
datapackage_json <-
  jsonlite::read_json("https://raw.githubusercontent.com/inbo/etn/main/inst/assets/datapackage.json")
field_definitions <- readr::read_csv("https://raw.githubusercontent.com/inbo/etn/main/inst/assets/etn_fields.csv")
#> Rows: 182 Columns: 5
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (4): view, field, definition, example
#> dbl (1): order
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# parse datapackage_json to a table ---------------------------------------
datapackage_tbl <-
  datapackage_json |>
  purrr::chuck("resources") |>
  purrr::map(
    \(resource) purrr::set_names(
      purrr::chuck(resource, "schema", "fields"),
      purrr::chuck(resource, "name")
    )
  ) |>
  purrr::map(
    \(resource) purrr::map(
      resource,
      ~ dplyr::tibble(
        name = purrr::pluck(.x, "name"),
        type = purrr::pluck(.x, "type"),
        resource_name = unique(names(resource))
      )
    )
  ) |>
  purrr::map(purrr::list_rbind) |>
  purrr::list_rbind()

# modify field_definitions ------------------------------------------------
field_definitions_rn <-
  field_definitions |>
  dplyr::mutate(
    resource_name =
      stringr::str_extract(view, "^[a-z]+(?=_)")
  )

# See if any are missing --------------------------------------------------
# fields that are in datapackage.json but not in etn_fields.csv:
dplyr::anti_join(
  datapackage_tbl,
  field_definitions_rn,
  by = dplyr::join_by(resource_name == resource_name,
                      name == field)
) |>
  print(n = Inf)
#> # A tibble: 45 × 3
#>    name                        type     resource_name
#>    <chr>                       <chr>    <chr>
#>  1 animal_id                   integer  animals
#>  2 animal_project_code         string   animals
#>  3 tag_serial_number           integer  animals
#>  4 tag_type                    string   animals
#>  5 tag_subtype                 string   animals
#>  6 acoustic_tag_id             string   animals
#>  7 acoustic_tag_id_alternative string   animals
#>  8 tag_serial_number           integer  tags
#>  9 tag_type                    string   tags
#> 10 tag_subtype                 string   tags
#> 11 acoustic_tag_id             string   tags
#> 12 acoustic_tag_id_alternative string   tags
#> 13 manufacturer                string   tags
#> 14 model                       string   tags
#> 15 activation_date             datetime tags
#> 16 length                      number   tags
#> 17 diameter                    number   tags
#> 18 weight                      number   tags
#> 19 floating                    boolean  tags
#> 20 archive_memory              string   tags
#> 21 sensor_range_min            integer  tags
#> 22 sensor_range_max            integer  tags
#> 23 sensor_resolution           number   tags
#> 24 sensor_unit                 string   tags
#> 25 sensor_accuracy             number   tags
#> 26 owner_organization          string   tags
#> 27 tag_id                      string   tags
#> 28 tag_device_id               integer  tags
#> 29 detection_id                integer  detections
#> 30 tag_serial_number           integer  detections
#> 31 acoustic_tag_id             string   detections
#> 32 acoustic_project_code       string   detections
#> 33 depth_in_meters             number   detections
#> 34 sensor2_value               number   detections
#> 35 sensor2_unit                number   detections
#> 36 deployment_id               integer  detections
#> 37 deployment_id               integer  deployments
#> 38 acoustic_project_code       string   deployments
#> 39 activation_date_time        datetime deployments
#> 40 valid_data_until_date_time  datetime deployments
#> 41 receiver_model              string   receivers
#> 42 receiver_serial_number      string   receivers
#> 43 owner_organization          string   receivers
#> 44 built_in_acoustic_tag_id    string   receivers
#> 45 ar_model                    string   receivers
```
Created on 2024-02-12 with reprex v2.0.2
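The reprex only checks one direction (fields in `datapackage.json` that are missing from `etn_fields.csv`). As a minimal base-R sketch, both directions can be checked at once with `setdiff()` on combined resource/field keys. The toy data frames below are hypothetical stand-ins for `datapackage_tbl` and `field_definitions_rn`, and `example_legacy_field` is a made-up name used only to show a CSV-only row:

```r
# Hypothetical toy stand-ins shaped like `datapackage_tbl` and
# `field_definitions_rn` from the reprex (only the key columns kept).
dp_fields <- data.frame(
  name = c("animal_id", "animal_project_code", "tag_type"),
  resource_name = c("animals", "animals", "tags"),
  stringsAsFactors = FALSE
)
csv_fields <- data.frame(
  field = c("animal_project_code", "example_legacy_field"), # hypothetical
  resource_name = c("animals", "animals"),
  stringsAsFactors = FALSE
)

# Combine resource and field name into one "resource/field" key, so the
# same field name appearing in two resources is not treated as one field.
dp_keys <- paste(dp_fields$resource_name, dp_fields$name, sep = "/")
csv_keys <- paste(csv_fields$resource_name, csv_fields$field, sep = "/")

# Fields described in datapackage.json but missing from etn_fields.csv
only_in_datapackage <- setdiff(dp_keys, csv_keys)
# Fields documented in etn_fields.csv but absent from datapackage.json
only_in_csv <- setdiff(csv_keys, dp_keys)

print(only_in_datapackage)
print(only_in_csv)
```

Run against the real `datapackage_tbl` and `field_definitions_rn`, `only_in_datapackage` should list the same resource/field pairs as the `anti_join()` above, while `only_in_csv` would show rows in the CSV that no longer correspond to any field in the datapackage.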
> As far as I know, `datapackage.json` is currently up to date. Is `etn_fields.csv`? AKA, did #226 ever get done? What is the status of https://inbo.github.io/etn/articles/etn_fields.html?
No, #226 never got done. Here's how the HTML page and CSV file originally worked:

- … `vliz.datapaper_metadata_fields`, with a field indicating in what view they are used.
- … `?` button) were derived from that table or stored elsewhere.
- … `etn_fields.Rmd` was created.
- With database restructuring, the views disappeared and the table was renamed to `app.field_metadata`. `etn_fields.Rmd` could not be restored to a working state.
What I suggest: let's maintain all information in a `datapackage.json` file:

- … `etn_fields.Rmd`
The information for a field would look like this:
```json
{
  "name": "capture_temperature_change",
  "description": "Difference between water temperature of the system where the fish was caught and the water temperature of the holding reservoir.",
  "type": "string",
  "unit": "degrees celsius",
  "example": "5°C"
}
```
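If all field metadata moves into `datapackage.json`, the table that `etn_fields.csv` used to hold (definition, example, order) could be derived from it instead of being maintained by hand. A minimal base-R sketch, assuming each field carries `description` and `example` properties as in the proposal above, and that a field's order is simply its position in `schema.fields`. The nested list is a hypothetical stand-in for what `jsonlite::read_json()` returns, and `get_prop()` and `fields_to_table()` are hypothetical helper names:

```r
# Hypothetical stand-in for jsonlite::read_json("datapackage.json"),
# which returns nested lists (simplifyVector = FALSE by default).
datapackage <- list(
  resources = list(
    list(
      name = "animals",
      schema = list(fields = list(
        list(name = "animal_id", type = "integer"),
        list(
          name = "capture_temperature_change",
          description = paste(
            "Difference between water temperature of the system where the fish",
            "was caught and the water temperature of the holding reservoir."
          ),
          type = "string",
          unit = "degrees celsius",
          example = "5\u00b0C"
        )
      ))
    ),
    list(
      name = "tags",
      schema = list(fields = list(
        list(name = "tag_serial_number", type = "integer")
      ))
    )
  )
)

# Return a field property as a string, or NA if the field does not define it.
get_prop <- function(field, prop) {
  value <- field[[prop]]
  if (is.null(value)) NA_character_ else as.character(value)
}

# Flatten every resource schema into an etn_fields.csv-like table.
# `order` is the field's position within its resource's schema.
fields_to_table <- function(datapackage) {
  rows <- lapply(datapackage$resources, function(resource) {
    fields <- resource$schema$fields
    data.frame(
      resource = resource$name,
      field = vapply(fields, get_prop, character(1), prop = "name"),
      definition = vapply(fields, get_prop, character(1), prop = "description"),
      example = vapply(fields, get_prop, character(1), prop = "example"),
      order = seq_along(fields),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

etn_fields <- fields_to_table(datapackage)
print(etn_fields)
```

The result could be written out with `write.csv(etn_fields, "etn_fields.csv", row.names = FALSE)` or rendered directly in `etn_fields.Rmd`, so that `datapackage.json` is the only file that needs to be kept up to date. Fields without a `description` show up as `NA`, which also makes undocumented fields easy to spot.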
Jesus pointed this out to me via email:

> I wonder how this relates to: https://github.com/inbo/etn/milestone/4
> Specifically #226
Questions

- Is `inst/assets/etn_fields.csv` still correct?
- Is `inst/assets/etn_fields.csv` somehow updated automatically, or kept in sync with `datapackage.json`?

I remember updating `datapackage.json` a good while ago, and I certainly know about `etn_fields.csv`...