Closed gkamener closed 12 months ago
Thanks for reporting this @gkamener. I'll have a look and get back to you here.
Thanks for your patience @gkamener.
EDI is considering a change to the pubdate field of the Search Data Packages API method, but before doing so, and possibly breaking anyone's code that relies on the result as it currently stands, we'd like to hear more about your particular use case.
Please note, one immediate fix to this issue is to call the Read Metadata Resource Metadata method with the full data package ID (e.g. knb-lter-fce.1076.4) to get the dateCreated field, which contains the value of pubdate but in the YYYY-MM-DD format you are looking for.
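A minimal sketch of that workaround in R, assuming EDIutils::read_metadata_resource_metadata() returns a data frame with a dateCreated column when called with as = "data.frame". The get_date_created() helper and its fetch argument are hypothetical names introduced here for illustration; fetch exists only so the helper can be tried with a stub instead of a network call.

```r
# Sketch only: assumes the resource metadata data frame has a dateCreated column.
# get_date_created() and its `fetch` argument are hypothetical, not part of EDIutils.
get_date_created <- function(package_id,
                             fetch = EDIutils::read_metadata_resource_metadata) {
  res <- fetch(package_id, as = "data.frame")
  # Keep only the leading YYYY-MM-DD part in case a time component is present
  substr(as.character(res$dateCreated[1]), 1, 10)
}

# Requires network access and the EDIutils package:
# get_date_created("knb-lter-fce.1076.4")
```

Trimming to the first ten characters is a defensive choice in case the field carries a timestamp rather than a bare date.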
Thank you for reviewing this @clnsmth.
My use case is to utilize metadata from each FCE package in EDI's repository as a validation check against portions of metadata we have for those packages in the FCE database. We use the latter to track the current status and other details for each package, and the metadata returned from search_data_packages has already helped me correct some erroneous enddate values plus other metadata in our database.
Being able to retrieve the most recent pubdate values in the YYYY-MM-DD format for all FCE packages through search_data_packages would be helpful, but I don't think making such changes just for my use case would be worth breaking anyone's code.
Thank you for suggesting read_metadata_resource_metadata. I may look into that as a check to ensure that pubdates from EDI align with what we have in the FCE database.
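A rough sketch of what such an alignment check could look like in base R. Both data frames, their column names, and all values below are made-up stand-ins: edi represents dates retrieved from EDI, and fce represents the corresponding records in a local database.

```r
# Hypothetical stand-in data; package IDs and dates are illustrative only.
edi <- data.frame(
  packageid = c("example.1.1", "example.2.1"),
  pubdate   = c("2019-03-05", "2020-01-15"),
  stringsAsFactors = FALSE
)
fce <- data.frame(
  packageid = c("example.1.1", "example.2.1"),
  pubdate   = c("2019-03-05", "2020-01-10"),
  stringsAsFactors = FALSE
)

# Join on package ID, then keep the rows where the two dates disagree
cmp <- merge(edi, fce, by = "packageid", suffixes = c("_edi", "_fce"))
mismatch <- cmp[as.Date(cmp$pubdate_edi) != as.Date(cmp$pubdate_fce), ]
print(mismatch)
```

Comparing as Date values rather than strings means a year-only pubdate would fail to parse and surface as NA instead of silently matching.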
Thanks for this helpful context @gkamener. We'll take this into consideration.
Another API method that may help with your metadata validation use case is Read Metadata. This returns the full EML metadata record, in XML, and is the source of the information that is indexed and returned through the Search Data Packages method. So, if you are looking for information via Search Data Packages, you will also find it in the source metadata. Note that the indexed metadata is a considerably smaller subset of the source metadata record.
Now, you may be scratching your head and asking, "Why would I want to access the publication date through the Read Metadata method just to get the same value I get through Search Data Packages?" Well, there is actually a transformation that occurs in the Search Data Packages pathway, which you can bypass by reading the EML metadata and parsing the XML to get the <pubDate> element value directly. For example:
> library(EDIutils)
> library(xml2)
>
> # Read the metadata of a data package and get the publication date
> eml <- read_metadata("knb-lter-fce.1076.4")
> pubdate <- xml_find_all(eml, xpath = ".//dataset/pubDate")
> xml_text(pubdate)
[1] "2019-03-05"
>
> # While we're at it, get the begin and end dates as well
> begindate <- xml_find_all(eml, xpath = ".//dataset/coverage//.//beginDate/calendarDate")
> xml_text(begindate)
[1] "1998-08-19"
>
> enddate <- xml_find_all(eml, xpath = ".//dataset/coverage//.//endDate/calendarDate")
> xml_text(enddate)
[1] "2006-12-03"
>
Thanks for the suggestion @clnsmth! It's very helpful!
Hi @gkamener. Is there anything else I can lend a hand with before closing this issue?
Hi @clnsmth. I think I'm good. Thanks for the help!
I'm experiencing a possible bug when attempting to query FCE package metadata when including the pubdate.
Using the search_data_packages() function to query pubdate only returns the year for that value instead of a date (expecting something including YYYY-MM-DD). In comparison, including begindate or enddate in the same query returns YYYY-MM-DD for those values.
I am using version 1.0.2 of the package with R version 4.2.2.
An example of the script I'm running for the query and a screenshot of the result are provided below.