znicholls closed this issue 6 years ago
Yes, but not all information within the creators, contributors or fundingReferences blocks is mandatory. We used the DataCite definitions (http://doi.org/10.5438/0014) as a guide, but made some changes.
"fundingReferences": [
{
"funderName": "Federal Ministry of Education and Research (BMBF)"
}
]
"creators": [
{
"creatorName": "Taylor, Karl E.",
"givenName": "Karl E.",
"familyName": "Taylor",
"email": "taylor13@llnl.gov",
"affiliation": "Lawrence Livermore National Laboratory"
}
],
"contributors": [
{
"contributorType": "ContactPerson",
"contributorName": "Jungclaus, Johann",
"givenName": "Johann",
"familyName": "Jungclaus",
"email": "johann.jungclaus@mpimet.mpg.de",
"affiliation": "Max-Planck-Institut fuer Meteorologie"
}
]
"creators": [
{
"creatorName": "Max-Planck-Institut fuer Meteorologie (MPI-M)"
}
],
"contributors": [
{
"contributorType": "ResearchGroup",
"contributorName": "Max-Planck-Institut fuer Meteorologie (MPI-M)"
}
]
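For illustration, a minimal Python sketch of assembling blocks like the ones above while leaving out optional keys that have no value. The helper name `make_entry` is hypothetical, and treating only the name key as required is an assumption drawn from the institution example, not a statement of what the citation service enforces.

```python
import json


def make_entry(kind, name, **optional):
    """Return a creator or contributor entry.

    kind is "creator" or "contributor". Keyword arguments whose value is
    empty or None are left out (assumption: only the name key is required).
    """
    if kind not in ("creator", "contributor"):
        raise ValueError(f"unknown entry kind: {kind!r}")
    entry = {f"{kind}Name": name}
    entry.update({key: value for key, value in optional.items() if value})
    return entry


# A person with the full set of details, as in the first example.
creators = [
    make_entry(
        "creator",
        "Taylor, Karl E.",
        givenName="Karl E.",
        familyName="Taylor",
        email="taylor13@llnl.gov",
        affiliation="Lawrence Livermore National Laboratory",
    )
]

# An institution, where only the name and type are given.
contributors = [
    make_entry(
        "contributor",
        "Max-Planck-Institut fuer Meteorologie (MPI-M)",
        contributorType="ResearchGroup",
        email=None,  # no value, so the key is omitted from the output
    )
]

print(json.dumps({"creators": creators, "contributors": contributors}, indent=2))
```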
Hey @MartinaSt, could you double-check the `subjects` field for me please? How should it be generated? Something like this?
"subjects":
[
{
"subject":"<activity_id>.CMIP6.<target_MIP>.<institution-id>[.<source-id>]",
"subjectScheme":"DRS"
},
{"subject":"climate"},
{"subject":"CMIP6"},
{"subject":"<custom-user-field>"}
]
Or is this field never used by the citation tool?
Hi @znicholls
The DRS subject is used to connect the provided information to the right database entry. Thus it is very important!
All other subjects are ignored, though, so I would delete the `<custom-user-field>` subject.
Using the keys found in the netCDF file header (the global attributes), the DRS subject is constructed as:
"subjects":
[
{
"subject":"<mip_era>.<activity_id>.<institution_id>.<source_id>[.<experiment_id>]",
"schemeURI": "http://github.com/WCRP-CMIP/CMIP6_CVs",
"subjectScheme":"DRS"
}
]
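As a rough sketch of that construction in Python: the function name `build_drs_subject` is hypothetical, and `attrs` is a plain dict standing in for the global attributes read from a file header. The values follow the IPSL entry linked in this thread, except for the experiment, which is only an illustrative value.

```python
def build_drs_subject(attrs, include_experiment=False):
    """Build the DRS entry for the "subjects" list from global attributes.

    attrs is assumed to map attribute names to strings. The optional
    experiment_id is appended only when include_experiment is True.
    """
    keys = ["mip_era", "activity_id", "institution_id", "source_id"]
    if include_experiment:
        keys.append("experiment_id")

    missing = [key for key in keys if not attrs.get(key)]
    if missing:
        raise ValueError("missing global attributes: " + ", ".join(missing))

    return {
        "subject": ".".join(attrs[key] for key in keys),
        "schemeURI": "http://github.com/WCRP-CMIP/CMIP6_CVs",
        "subjectScheme": "DRS",
    }


# Stand-in for the attributes of one output file (experiment is illustrative).
attrs = {
    "mip_era": "CMIP6",
    "activity_id": "CMIP",
    "institution_id": "IPSL",
    "source_id": "IPSL-CM6A-LR",
    "experiment_id": "1pctCO2",
}

subjects = [build_drs_subject(attrs)]
print(subjects[0]["subject"])
```

Raising on a missing attribute, rather than silently emitting a shorter string, is a design choice here: the DRS subject is what links the metadata to the database entry, so a malformed one is better caught early.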
Btw, the first DOI for CMIP6 data has been registered (data access is still restricted to infrastructure developers):
landing page: https://doi.org/10.22033/ESGF/CMIP6.1534
JSON: https://cera-www.dkrz.de/WDCC/ui/cerasearch/cerarest/exportcmip6?input=CMIP6.CMIP.IPSL.IPSL-CM6A-LR
Ok, that complicates things, mainly because different files appear to follow different conventions. For example, not all files have the experiment id easily accessible in the filename (e.g. for the input4MIPs concentrations and emissions files it has to be inferred from the filename, based on knowledge that it comes at the end of the source_id). It also isn't always in the nc file: the input4MIPs files don't even use an experiment_id field.
For now I'll write this to target files that have an `experiment_id` field, assuming that all the input4MIPs work is now done, so that edge case isn't worth worrying about.
Do you use the file names only? No opening the files to read the global attributes? And no consideration of the directory structure?
An example of a CMIP6 file name is: `rlutcsaf_AERmon_CNRM-CM6-1_1pctCO2_r1i1p1f2_gr_185001-199912.nc`
Ok I think everything has now become much clearer. As an input provider, I was following the input forcing data specs. As you'll see, our filenames are different from the output file names.
input4MIPs name
<variable_id>_input4MIPs_<dataset_category>_<target_mip>_<source_id>_<grid_label>[_<time_range>].nc
Output name
<variable_id>_<table_id>_<source_id>_<experiment_id>_<member_id>_<grid_label>_<time slice>.nc
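To make the difference between the two templates concrete, here is a minimal Python sketch that splits each kind of name into its fields. The function names are hypothetical, and it assumes that no field contains an underscore, which the templates imply but which is not checked against the controlled vocabularies. The output example is the CMIP6 file name quoted earlier in this thread; the input4MIPs name is made up purely for illustration.

```python
OUTPUT_FIELDS = [
    "variable_id", "table_id", "source_id", "experiment_id",
    "member_id", "grid_label", "time_range",
]
INPUT4MIPS_FIELDS = [
    "variable_id", "dataset_category", "target_mip",
    "source_id", "grid_label", "time_range",
]


def _parts(filename):
    """Drop a trailing ".nc" and split the rest on underscores."""
    stem = filename[:-3] if filename.endswith(".nc") else filename
    return stem.split("_")


def parse_output_filename(filename):
    """Split a model output file name into its seven fields."""
    parts = _parts(filename)
    if len(parts) != len(OUTPUT_FIELDS):
        raise ValueError(f"unexpected output file name: {filename}")
    return dict(zip(OUTPUT_FIELDS, parts))


def parse_input4mips_filename(filename):
    """Split an input4MIPs file name; the time range may be absent."""
    parts = _parts(filename)
    if len(parts) not in (6, 7) or parts[1] != "input4MIPs":
        raise ValueError(f"unexpected input4MIPs file name: {filename}")
    values = parts[:1] + parts[2:]  # skip the literal "input4MIPs"
    return dict(zip(INPUT4MIPS_FIELDS, values))


output = parse_output_filename(
    "rlutcsaf_AERmon_CNRM-CM6-1_1pctCO2_r1i1p1f2_gr_185001-199912.nc"
)
print(output["experiment_id"])

# Hypothetical input4MIPs name without a time range, for illustration only.
forcing = parse_input4mips_filename(
    "examplevar_input4MIPs_exampleCategory_CMIP_example-source-1-0_gn.nc"
)
print("experiment_id" in forcing)
```

The input4MIPs result has no `experiment_id` key at all, which is the mismatch described above: anything that needs the experiment has to get it from somewhere other than the forcing file name.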
So it looks like I was solving a problem you didn't have (but I did as an input4MIPs provider). Haha oops!
I'm going to close this issue and start a new one to try and get us on the same page.
@MartinaSt, I just want to check that I've correctly understood the format of the JSON we want to produce. Can you double-check the format and my split of the fields into ignored, optional, compulsory, and compulsory but fixed (i.e. fields that must be there but whose content is always the same)?