@peterdesmet closed this issue 2 years ago.
Feedback by @sarahcd:
First thoughts: Could/should we include the URIs with the attribute info? Is it appropriate to add package-level info like license and citations, and if so, how? Could we write a script to create the files with R (the easiest option for me, unless that is a terrible idea for some reason)? Can this be written in a way that facilitates ingestion into GBIF? Feel free to start a new thread or move this to GitHub.
My answers to @sarahcd:

URIs can be included using `rdfType`. I will add those.

1. Provide the file names of the CSV files + their headers.
2. Get information about those headers from the Movebank Attribute Dictionary (is it available as XML/JSON?).
3. Write that information as a `datapackage.json`.
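The three steps above can be sketched in Python with only the standard library. This is a minimal illustration under stated assumptions, not what any Movebank or movepub tool does: the `ATTRIBUTE_DICTIONARY` lookup table and all function names are hypothetical, CSV content is passed in as text rather than read from disk, every field is typed as `string` because the dictionary does not (yet) provide data types, and the term URI is stored in `skos:exactMatch`, the property this thread eventually settles on.

```python
import csv
import io
import json

# Hypothetical stand-in for the Movebank Attribute Dictionary: column header ->
# definition and term URI. A real script would harvest this, not hard-code it.
ATTRIBUTE_DICTIONARY = {
    "tag-id": {
        "definition": "A unique identifier for the tag, provided by the data owner.",
        "uri": "http://vocab.nerc.ac.uk/collection/MVB/current/MVB000181/2/",
    },
}


def read_header(csv_text):
    """Step 1: return the header row of a CSV file given as text."""
    return next(csv.reader(io.StringIO(csv_text)), [])


def describe_field(name):
    """Step 2: describe one column, adding definition and URI when the header is known."""
    field = {"name": name, "type": "string"}  # assumption: no data types available yet
    term = ATTRIBUTE_DICTIONARY.get(name)
    if term:
        field["description"] = term["definition"]
        field["skos:exactMatch"] = term["uri"]
    return field


def build_datapackage(files):
    """Combine {file name: CSV text} into a datapackage.json-style dict."""
    resources = []
    for path, text in files.items():
        resources.append({
            "name": path.rsplit(".", 1)[0].lower(),  # "reference-data.csv" -> "reference-data"
            "path": path,
            "profile": "tabular-data-resource",
            "schema": {"fields": [describe_field(h) for h in read_header(text)]},
        })
    return {"resources": resources}


def write_datapackage(package, path="datapackage.json"):
    """Step 3: write the descriptor to disk as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(package, f, indent=2)


if __name__ == "__main__":
    sample = {"reference-data.csv": "tag-id,animal-id\n2342,A1\n"}
    print(json.dumps(build_datapackage(sample), indent=2))
```

Headers without a dictionary entry (here `animal-id`) still get a field, just without a description or URI, so the schema always mirrors the file.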
The `datapackage.json` would be used to understand the structure, so it can be converted to Darwin Core.

@sarahcd, why do `tag id` and `tag local indentifier` (sic) both exist as concepts in the Movebank Attribute Dictionary? They are the same concept and have the same definition. It is unclear which one I should refer to (I think `tag id`). Same for `animal id` vs `animal local identifier`.
The two versions of the tag/animal identifier labels were created a long time ago, and I'm not sure what the rationale was. `tag-id`/`animal-id` are the names used in reference data downloads, and `tag-local-identifier`/`animal-local-identifier` are the names used in event data downloads. As you can see in the vocabulary, I use the alternative label to show they are the same thing. I could delete one set of entries so that there is just `tag-local-identifier`/`animal-local-identifier`, with the alt labels. When I make readme files I use tag/animal id, which is an arbitrary decision. FYI, "local-identifier" indicates that it is the user-defined ID rather than a numeric identifier assigned by the database.
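On the data side, the two labels can be treated as synonyms when reading files. A minimal Python sketch, assuming the option mentioned above of keeping `tag-local-identifier`/`animal-local-identifier` as the preferred names; the `SYNONYMS` table and the function name are hypothetical and not part of any Movebank tool:

```python
# Hypothetical synonym table: reference-data header -> event-data header.
# Both names refer to the same concept in the Movebank Attribute Dictionary.
SYNONYMS = {
    "tag-id": "tag-local-identifier",
    "animal-id": "animal-local-identifier",
}


def canonical_headers(headers):
    """Replace known synonyms with the preferred name; leave other headers as-is."""
    return [SYNONYMS.get(header, header) for header in headers]


if __name__ == "__main__":
    print(canonical_headers(["tag-id", "animal-id", "deploy-on-date"]))
```

Here `canonical_headers(["tag-id", "animal-id", "deploy-on-date"])` returns `["tag-local-identifier", "animal-local-identifier", "deploy-on-date"]`, so headers from reference and event data downloads resolve to the same names.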
Extra metadata in readmes: I see your point, and I certainly have cases where I update the DataCite metadata and of course cannot update the readme.txt. I hesitate to eliminate the text readme entirely from the repository, because it is human-readable and easy to store locally, which reduces provenance loss, especially if files get passed around after download and exist without an internet connection. Does Zenodo offer a human-readable download option for the updatable metadata?
Re:

> 2. get information about those headers from the movebank attribute dictionary (is it available as xml/json?)

Can the machine-readable terms be harvested from the NERC vocabulary service?
@peggynewman, this is something @sarahcd will ask NERC about, along with adding data type and format to the terms. That way, a `datapackage.json` could be built from the NERC vocabulary.
Note: if you go to a NERC vocabulary, e.g. vocab.nerc.ac.uk/collection/MVB, see "Alternate Profiles" in the upper right for several JSON formats. I haven't explored them yet, but they look potentially useful for harvesting.
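If one of those JSON formats is JSON-LD using SKOS properties, harvesting could look roughly like the Python sketch below. The layout it expects (an optional `@graph` list of concept nodes carrying `skos:prefLabel`, `skos:definition` and `skos:altLabel`, either prefixed or as full IRIs) is an assumption that has not been checked against the actual NERC output, the function names are hypothetical, and the profile URL passed to `fetch_json` should be copied from the Alternate Profiles menu rather than guessed.

```python
import json
import urllib.request

SKOS = "http://www.w3.org/2004/02/skos/core#"


def skos_values(node, local_name):
    """Return the string values of a SKOS property on a JSON-LD node.

    Accepts the key as 'skos:<name>' or as the full SKOS IRI, and values given
    as plain strings, {"@value": ...} objects, or lists of either (assumed
    shapes, not verified against the NERC output)."""
    raw = node.get("skos:" + local_name, node.get(SKOS + local_name, []))
    if not isinstance(raw, list):
        raw = [raw]
    values = []
    for item in raw:
        if isinstance(item, dict):
            item = item.get("@value")
        if isinstance(item, str):
            values.append(item)
    return values


def index_concepts(doc):
    """Map each concept's preferred label to its URI, definition and alt labels."""
    nodes = doc.get("@graph", [doc]) if isinstance(doc, dict) else doc
    index = {}
    for node in nodes:
        if not isinstance(node, dict):
            continue  # skip bare references or literals
        labels = skos_values(node, "prefLabel")
        if not labels:
            continue  # skip nodes that are not labelled concepts
        definitions = skos_values(node, "definition")
        index[labels[0]] = {
            "uri": node.get("@id"),
            "definition": definitions[0] if definitions else None,
            "alt_labels": skos_values(node, "altLabel"),
        }
    return index


def fetch_json(url):
    """Download and parse a JSON document from the given URL."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)
```

Keeping the alternative labels in the index is what would let a script match a CSV header such as `tag-id` to its concept even when the preferred label differs.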
A `datapackage.json` file as described above can now be generated automatically with the movepub R package (see tutorial). This is done by using the general-purpose frictionless R package and looking up the definition and URL of every field in the Movebank Attribute Dictionary (with the `get_mvb_term()` function). A resulting field in `datapackage.json` looks as follows:
```json
{
  "name": "tag-id",
  "title": "tag ID",
  "description": "A unique identifier for the tag, provided by the data owner. If the data owner does not provide a tag ID, an internal Movebank tag identifier may sometimes be shown. Example: '2342'; Units: none; Entity described: tag",
  "type": "string",
  "format": "default",
  "skos:exactMatch": "http://vocab.nerc.ac.uk/collection/MVB/current/MVB000181/2/"
}
```
`skos:exactMatch` was chosen over `rdfType`. Full example at https://github.com/inbo/bird-tracking/blob/master/data/processed/O_ASSEN/datapackage.json
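Because each field carries its term URI in `skos:exactMatch`, a downstream converter (for example one producing Darwin Core) can read the column-to-term mapping back out of `datapackage.json`. A minimal Python sketch; the function names are hypothetical, and only schemas embedded in the resource (not schemas referenced by path or URL) are handled:

```python
import json


def term_uris(package):
    """Return {resource name: {field name: term URI}} from a datapackage.json-style
    dict, for every field that declares skos:exactMatch."""
    result = {}
    for resource in package.get("resources", []):
        schema = resource.get("schema")
        if not isinstance(schema, dict):
            continue  # schema given as a path/URL (or missing): not handled in this sketch
        result[resource.get("name")] = {
            field["name"]: field["skos:exactMatch"]
            for field in schema.get("fields", [])
            if "skos:exactMatch" in field
        }
    return result


def load_term_uris(path="datapackage.json"):
    """Read a datapackage.json from disk and extract its term mapping."""
    with open(path, encoding="utf-8") as f:
        return term_uris(json.load(f))
```

For a package whose `reference-data` resource (a hypothetical name) contains the `tag-id` field shown above, `term_uris` returns `{"reference-data": {"tag-id": "http://vocab.nerc.ac.uk/collection/MVB/current/MVB000181/2/"}}`.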
Issues regarding synonyms and other improvements to the Movebank Attribute Dictionary are discussed in the movepub repo.
Closing this issue.
I'm attempting to convert a published Movebank dataset into a Frictionless data package by describing the files and their structure in a `datapackage.json` file. I created a use case for this here. This issue is to capture feedback.

Link to file: `datapackage.json`
Add `rdfType` to terms, with a link to the term URL.
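A basic check that a `datapackage.json` really describes the structure of a file is to compare each CSV header with the field names in its schema. The sketch below assumes the fields are listed in the same order as the columns; dedicated Frictionless validators do a much fuller job, and this standard-library Python function (a hypothetical helper) only illustrates the idea:

```python
import csv
import io


def header_mismatches(csv_text, schema):
    """Compare a CSV header row with a Table Schema's field names.

    Returns a list of human-readable problems; an empty list means the header
    and the schema agree. Assumes fields are listed in column order."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    names = [field["name"] for field in schema.get("fields", [])]
    problems = []
    for position, (column, name) in enumerate(zip(header, names), start=1):
        if column != name:
            problems.append(f"column {position}: header '{column}' vs field '{name}'")
    if len(header) != len(names):
        problems.append(f"file has {len(header)} column(s), schema has {len(names)} field(s)")
    return problems
```

This would, for instance, flag an event data file whose header says `animal-local-identifier` where the schema expects `animal-id`.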