Open lukasjonkers opened 2 years ago
Thanks for the suggestion @lukasjonkers
Looks like the changes are failing the testthat tests. I'll look in more detail later, but perhaps it would be better to keep the function returning what it currently does but provide an as_tibble()
method for objects returned by read_meta()
(after imposing a class on those objects in read_meta()
)?
That way your workflow would be
read_meta() |>
as_tibble()
leaving the original function most untouched?
Hi @gavinsimpson I saw too that the tests are failing. Perhaps this is because the structure of the output changes with my suggestion?
Your suggestion to simply convert the output of the original function to a tibble
does not address the issue that the metadata text is inadequately parsed in the first place. The only thing the function does now is split at , but 1) that means that there are unnecessary splits because also occurs in units (see example above and my suggestion to catch this problem) and 2) that the name, short name and unit are not parsed at all. I would find it useful in order to better filter the data.
@lukasjonkers Sorry that I wasn't clear. Let's assume we fix the meta data parsing problems (which seems like something we certainly should do), would an as_tibble()
method work in that case such that we don't change the structure of the returned object and potentially break existing code?
Hi @gavinsimpson I understand your concern not to break existing code, although I am not sure how many people have written code to parse the metadata list that is currently returned.
Nevertheless, in that case it would make sense to return a more structured list (with the same elements as columns in the tibble?), rather than a tibble
as in my suggestion. I'm not sure how as_tibble
would solve this, but maybe I'm overlooking things.
The testthat tests fail exactly because I suggested changing the output from a list to a tibble, but I guess you saw that already. Below is copied from test-pg_data.R
# parameters has extra parsing
## its a list
expect_is(aa[[1]]$metadata$parameters, "list")
## not named
expect_named(aa[[1]]$metadata$parameters, NULL)
## length of the list of parameters should equal columns of data
expect_equal(length(aa[[1]]$metadata$parameters), NCOL(aa[[1]]$data))
Description
Amended the read_meta function in an attempt to better parse the parameter description. Rather than a list of lists the parameters element in metadata now contains a more structured tibble with name, short name, unit, PI, method and comment like on the pangaea website. This should allow searching the metadata and facilitate using the metadata in a hopefully more meaningful way.
Related Issue
None. Just a personal wish that I thought others might find useful too.
Example
Based on https://doi.pangaea.de/10.1594/PANGAEA.115823 Old
New
Test
I ran this with for 10s of DOIs and think it works. No guarantee though, I'm no regexp expert.