Closed dehays closed 3 years ago
@dehays every study object has a name attribute. Is proposal name different than the study's name?
@dehays study already has a doi attribute:
study:
  is_a: named thing
  in_subset:
    - sample subset
  aliases: ['proposal', 'research proposal', 'research study', 'investigation']
  description: >-
    A study summarizes the overall goal of a research initiative and outlines the key objective of its underlying projects.
  slots:
    - ecosystem
    - ecosystem_category
    - ecosystem_type
    - ecosystem_subtype
    - specific_ecosystem
    - principal investigator name
    - doi
But the range of doi is an attribute value; e.g.:
"doi": { "has_raw_value": "10.25585/1487764" }
Do we need doi to be a list (i.e., multi-valued), or will a study only have one doi?
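To make the single- versus multi-valued question concrete, here is a minimal Python sketch of a reader that tolerates both shapes. It assumes a study record is a plain dict shaped like the JSON above; the helper name doi_values is hypothetical and not part of the NMDC schema or tooling.

```python
def doi_values(study):
    """Return the raw DOI strings of a study record as a list.

    Accepts either a single attribute value ({"has_raw_value": ...})
    or a list of them, so the same reader works whichever way the
    single- vs multi-valued question is settled.
    """
    doi = study.get("doi")
    if doi is None:
        return []
    # Wrap a single attribute value so both shapes are handled alike.
    items = doi if isinstance(doi, list) else [doi]
    return [item["has_raw_value"] for item in items]


# Single-valued form, as in the example above.
single = {"doi": {"has_raw_value": "10.25585/1487764"}}
print(doi_values(single))
```

If doi stays single-valued, the list branch is simply never exercised, so consumers written this way would not need to change later.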
@wdduncan On GOLD study name vs proposal name - I want to touch base with Emiley and Jeff about what we do here. What I see is that the UI is using the proposal name. Stegen Study - notice the name displayed, which is not something you are getting from GOLD. (The name in the study entity is set to the GOLD study name, for example: "Groundwater microbial communities from the Columbia River, Washington, USA".) I think we are going to need to add another name - not sure proposal name should be the key, perhaps display_name. But since it is not coming from GOLD, it would need to be optional for the ETL to produce valid output.
I think a study would have a single DOI and as you point out, there is a slot for that. Publication DOIs would need to be a list.
Thanks @dehays
I'm a bit partial to naming the attribute display name. It seems to fit the purpose you describe.
What do you think @cmungall ?
GOLD:
https://gold.jgi.doe.gov/study?id=Gs0114663
MIxS:
MIxS standardizes fields for samples not studies, but it doesn't follow normal form and does have repeated investigation variables such as project_name. project_name is underspecified, and if we look at existing values in INSDC they are all over the place. It ranges from "16S" to proper titles.
NCBI BioProject:
The study is broken into multiple projects, one per sample, with identical metadata:
https://www.ncbi.nlm.nih.gov//bioproject/PRJNA367315 ... https://www.ncbi.nlm.nih.gov//bioproject/PRJNA367318
Groundwater microbial communities from the Columbia River, Washington, USA - GW-RW S3_40_50 metagenome
Coupling Microbial Communities to Carbon and Contaminant Biogeochemistry in the Groundwater-Surface Water Interaction Zone
Relevance: Environmental
(Remediation and Carbon cycle seem to have been dropped)
NMDC:
https://data.microbiomedata.org/details/study/gold:Gs0114663
Coupling Microbial Communities to Carbon and Contaminant Biogeochemistry in the Groundwater-Surface Water Interaction Zone Description A metagenomic study to couple microbial communities to carbon and contaminant biogeochemistry in the groundwater-surface water interaction zone
Scientific objective To understand and predict the effects of variable groundwater-surface water mixing on microbial communities and, in turn, biogeochemical rates under the Subsurface Biogeochemical Research-Science Focus Area (SBR-SFA).
Not sure where objective is coming from, it's not in the schema or the json? Hardcoded in the UI?
My proposal:
TBD: should we have separate fields for doi, ark, etc, or just a generic citation field that takes a CURIE or PURL? My pref is the latter
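For illustration, a generic citation field would have to accept both forms. The sketch below is one way a consumer could tell them apart. classify_citation is a hypothetical helper, and treating doi:10.25585/1487764 as the CURIE form of the DOI quoted earlier in this thread is an assumption, not something the schema defines.

```python
from urllib.parse import urlparse


def classify_citation(value):
    """Classify a citation string as a URL (e.g. a PURL) or a CURIE.

    Returns ("url", value) or ("curie", prefix, local_id).
    Raises ValueError for anything that is neither.
    """
    parsed = urlparse(value)
    # An http(s) URL with a host is treated as a resolvable link or PURL.
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return ("url", value)
    # Otherwise expect prefix:local_id, split on the first colon.
    prefix, sep, local_id = value.partition(":")
    if sep and prefix and local_id and not local_id.startswith("//"):
        return ("curie", prefix, local_id)
    raise ValueError(f"not a CURIE or URL: {value!r}")


print(classify_citation("doi:10.25585/1487764"))
print(classify_citation("https://doi.org/10.25585/1487764"))
```

With a check like this, the identifier type (doi, ark, ...) is carried by the CURIE prefix, so a separate slot per identifier type would not be needed.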
I agree with @cmungall that title addresses the need to distinguish between what we get out of GOLD and what the study is called from the perspective of a funding agency.
A related issue is the long names we get out of GOLD for biosamples. I'm not so sure that title is the appropriate slot for shortening the GOLD biosample names, although we can use it for such a purpose. It might be better to have a display name slot, or perhaps we make use of alternative description.
We can also make use of an other_names slot, although I prefer to call it alternative names (this seems more consistent with alternative identifier and alternative description).
If we follow the suggestion of having a generic citation field to cover doi, ark, etc., what do we call it? citations? Also, do we need to create a citation object so that we can specify whether it is a doi, etc.? We can do this, but I am unsure whether the benefit is worth the effort.
@dehays I've added the following slots to nmdc:study:
I think/hope this should cover Kitware's needs for displaying the name of a study.
That would seem to cover 'proposal name' from the list I'd started with. Also checked off DOI, as studies already have a slot for that.
Not your responsibility - but after populating those title and alternative title fields - will need to have Kitware understand which to display.
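One possible display rule, sketched in Python under the assumption that a study record is a plain dict with a required name and an optional title: prefer title when it is populated and fall back to name. The helper display_label is hypothetical; Kitware's actual UI logic is not shown in this thread.

```python
def display_label(study):
    """Pick the label to show for a study.

    Prefers the optional title and falls back to name, which the
    GOLD ETL always sets.
    """
    title = study.get("title")
    return title if title else study["name"]


# name is the GOLD study name quoted earlier in the thread; putting the
# proposal title into title is an assumption made for this example.
gold_only = {
    "name": "Groundwater microbial communities from the Columbia River, Washington, USA",
}
with_title = dict(
    gold_only,
    title="Coupling Microbial Communities to Carbon and Contaminant "
          "Biogeochemistry in the Groundwater-Surface Water Interaction Zone",
)

print(display_label(gold_only))
print(display_label(with_title))
```

Because title stays optional, records produced by the GOLD ETL alone remain valid and still get a label.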
Update after metadata call:
There is a need to associate studies with websites. Should I add a websites slot, or make use of the existing url slot in core.yaml:
url:
  is_a: attribute
  range: string
Update:
I'll make a websites slot (for now).
publication DOIs is simply named publications.
part of decomposition of #41
To be added to study entity as optional attributes. These are not currently available from GOLD studies and therefore cannot presently be populated from the GOLD -> NMDC ETL