o2r-project / o2r-meta

Metadata toolsuite for an extract-map-validate workflow supporting reproducible research
Apache License 2.0

Error validating metadata #98

Closed nuest closed 6 years ago

nuest commented 6 years ago

Currently there is an error saving our demo example "A question driven ...". The metadata after uploading and selecting licenses is not valid:

{  
  "o2r":{  
    "upload_type":"publication",
    "title":"A question driven socio-hydrological modeling process",
    "temporal":{  
      "end":"2016-01-15T00:00:00",
      "begin":"2016-01-15T00:00:00"
    },
    "spatial":{  },
    "publication_type":"other",
    "publication_date":"2018-03-16T00:00:00.000Z",
    "paperLanguage":[  ],
    "mainfile_candidates":[  ],
    "mainfile":"interactiveFigure1.Rmd",
    "license":{  
      "uibindings":"MIT",
      "text":"CC0-1.0",
      "md":null,
      "data":"CC0-1.0",
      "code":"MIT"
    },
    "keywords":[  ],
    "interaction":[  

    ],
    "inputfiles":[  
      "SyntheticStreamflow.csv"
    ],
    "identifier":{  
      "reserveddoi":null,
      "doiurl":"https://doi.org/doi:10.5194/hess-20-73-2016",
      "doi":"doi:10.5194/hess-20-73-2016"
    },
    "ercIdentifier":"gTfIC",
    "displayfile_candidates":[  
      "table4.PNG",
      "A_question_driven.html",
      "table1.PNG",
      "table2.PNG",
      "figure4.PNG",
      "figure3.PNG",
      "figure89.PNG",
      "figure6.PNG",
      "figure2.PNG",
      "table5.PNG",
      "figure1.png",
      "table3.PNG",
      "figure5.PNG"
    ],
    "displayfile":"table4.PNG",
    "description":"Human ... model.",
    "depends":[  ],
    "creators":[  ],
    "communities":[  
      {  
        "identifier":"o2r"
      }
    ],
    "codefiles":[  ],
    "access_right":"open"
  }
}

The error message is

{  
  "error":"Error updating metadata file, see log for details",
  "log":"[o2rmeta] 20180316.085856 received arguments: {'debug': True, 'tool': 'validate', 'schema': 'schema/json/o2r-meta-schema.json', 'candidate': '/tmp/o2r/compendium/gTfIC/.erc/metadata_o2r_1.json'}
[o2rmeta] 20180316.085856 launching validator
[o2rmeta] 20180316.085856 checking metadata_o2r_1.json against o2r-meta-schema.json
[o2rmeta] 20180316.085856 !invalid: None is not of type 'string'

Failed validating 'type' in schema['properties']['depends']['items']['properties']['version']:
    {'type': 'string'}

On instance['depends'][0]['version']:
    None"
}
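
The last lines of the log name the failing location as a path into the validated file: the value at instance['depends'][0]['version'] is None where the schema expects a string. A minimal sketch for listing every null-valued path in a metadata document in that same notation follows; `find_null_paths` and the `candidate` excerpt are hypothetical and not part of o2r-meta.

```python
import json


def find_null_paths(node, path="instance"):
    """Return the paths of all null (None) values in a parsed JSON document."""
    paths = []
    if node is None:
        paths.append(path)
    elif isinstance(node, dict):
        for key, value in node.items():
            paths.extend(find_null_paths(value, f"{path}[{key!r}]"))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            paths.extend(find_null_paths(value, f"{path}[{index}]"))
    return paths


# Hypothetical excerpt, not the real metadata_o2r_1.json
candidate = json.loads(
    '{"depends": [{"identifier": "ggplot2", "version": null}], "keywords": []}'
)
print(find_null_paths(candidate))
```

Running this against the file that is actually validated on the server, rather than against the browser request, would show whether depends really is empty there.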

I assume this means that the property depends cannot be an empty array. However, the following JSON is saved just fine (based on R Markdown example workspace), and depends is also empty.

{  
  "o2r":{  
    "upload_type":"publication",
    "title":"Capacity of container ships in seaborne trade from 1980 to 2016 (in million dwt)*",
    "temporal":{  
      "end":"2017-03-16T00:00:00",
      "begin":"2017-03-16T00:00:00"
    },
    "spatial":{  },
    "publication_type":"other",
    "publication_date":"2018-03-16T00:00:00.000Z",
    "paperLanguage":[  

    ],
    "mainfile_candidates":[  
      "main.Rmd"
    ],
    "mainfile":"main.Rmd",
    "license":{  },
    "keywords":[  ],
    "interaction":[  ],
    "inputfiles":[  ],
    "identifier":{  },
    "ercIdentifier":"sYeqH",
    "displayfile_candidates":[  ],
    "displayfile":"display.html",
    "description":"Capacity of container ships in seaborne trade of the world container ship fleet.\n",
    "depends":[  ],
    "creators":[  ],
    "communities":[  ],
    "codefiles":[  
      "main.Rmd"
    ],
    "access_right":"open"
  }
}

@7048730 Can you explain what happens here?

ghost commented 6 years ago

@nuest your example metadata are subkeys of a key named o2r. However, our metadata elements must be top-level, without governing keys like the ones used in the DB. That is why your examples don't pass validation at all when I try. When I rewrite your example to fulfill this condition, the validator still reports validation exceptions, and those point to required elements of our schema file that are missing. That is why I am a bit puzzled that you say the second example works.
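
The restructuring described here can be sketched as follows. This is a minimal sketch, assuming the payload wraps all metadata in a single o2r key as in the examples above; `unwrap_o2r` is a hypothetical helper name and not a function of o2r-meta.

```python
import json


def unwrap_o2r(payload):
    """Return the metadata with top-level keys, dropping a governing 'o2r' key if present."""
    if isinstance(payload, dict) and set(payload) == {"o2r"}:
        return payload["o2r"]
    # Already top-level (or not a wrapper at all): return unchanged
    return payload


# Hypothetical, shortened request body
request_body = json.loads('{"o2r": {"title": "Example", "depends": []}}')
metadata = unwrap_o2r(request_body)
print(sorted(metadata))  # ['depends', 'title']
```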

nuest commented 6 years ago

I copied the examples from the browser request, which is sent to the server. On the server, the o2r element is not included in the file metadata_o2r_1.json which is validated.

So, when you adjust the structure and try validating, both fail?

ghost commented 6 years ago

Take a look at the example of valid metadata, which is also validated in each build on Travis: https://github.com/o2r-project/o2r-meta/blob/master/schema/json/example_metadata_o2r_valid.json In that example, depends is also empty. So there might be a tiny mistake or deviation in your metadata that the validator (or rather the imported module it uses) trips over.
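
One way to hunt for such a deviation is to compare the value types of a failing document against the known-valid example. The following is a rough sketch under stated assumptions: `diff_types` is a hypothetical helper, the two dictionaries are made-up excerpts rather than the real files, and the comparison only descends into objects, not into arrays.

```python
def diff_types(reference, candidate, path="$"):
    """List paths where the candidate's value type differs from the reference's.

    Only keys present in both documents are compared; arrays are compared
    by type only and their elements are not inspected.
    """
    diffs = []
    if isinstance(reference, dict) and isinstance(candidate, dict):
        for key in sorted(reference.keys() & candidate.keys()):
            diffs.extend(diff_types(reference[key], candidate[key], f"{path}.{key}"))
    elif type(reference) is not type(candidate):
        diffs.append(
            f"{path}: {type(reference).__name__} vs {type(candidate).__name__}"
        )
    return diffs


# Hypothetical excerpts, not the real example or candidate files
reference = {"license": {"text": "CC0-1.0"}, "depends": []}
candidate = {"license": {"text": None}, "depends": []}
print(diff_types(reference, candidate))
```

A difference reported here is only a hint: whether a given type is allowed is decided by the schema, not by the example file.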

nuest commented 6 years ago

I am inclined to deactivate validation again, because I do not know where to look for the tiny mistake. Also, the extraction process creates invalid metadata, cf. the following examples extracted/brokered by o2r-meta:5c12559

{
"id": "z5p7I",
"metadata": {
"o2r": {
"upload_type": "publication",
"title": "A question driven socio-hydrological modeling process",
"temporal": {
"end": "2016-01-15T00:00:00",
"begin": "2016-01-15T00:00:00"
},
"spatial": {
"union": {
"bbox": [
[
181,
181
],
[
-181,
181
],
[
-181,
-181
],
[
181,
-181
]
]
},
"files": []
},
"publication_type": "other",
"publication_date": "2018-03-19",
"paperLanguage": [
"en"
],
"mainfile_candidates": [
"shFun.R",
"interactiveFigure1.Rmd",
"multiplot.R",
"Ui.R",
"LaunchModel.R",
"main.Rmd",
"server.R"
],
"mainfile": "interactiveFigure1.Rmd",
"license": {
"uibindings": null,
"text": null,
"md": null,
"data": null,
"code": null
},
"keywords": [],
"interaction": [],
"inputfiles": [
"SyntheticStreamflow.csv"
],
"identifier": {
"reserveddoi": null,
"doiurl": "https://doi.org/doi:10.5194/hess-20-73-2016",
"doi": "doi:10.5194/hess-20-73-2016"
},
"ercIdentifier": "z5p7I",
"displayfile_candidates": [
"figure89.png",
"figure2.png",
"figure3.png",
"table1.png",
"display.html",
"figure4.png",
"figure6.png",
"figure5.png",
"table4.png",
"table2.png",
"table5.png",
"figure1.png",
"table3.png"
],
"displayfile": "figure89.png",
"description": "Human and hydrological systems are coupled: human activity impacts the hydrological cycle and hydrological conditions can, but do not always, trigger changes in human systems. Traditional modeling approaches with no feedback between hydrological and human systems typically cannot offer insight into how different patterns of natural variability or human-induced changes may propagate through this coupled system. Modeling of coupled human–hydrological systems, also called socio-hydrological systems, recognizes the potential for humans to transform hydrological systems and for hydrological conditions to influence human behavior. However, this coupling introduces new challenges and existing literature does not offer clear guidance regarding model conceptualization. There are no universally accepted laws of human behavior as there are for the physical systems; furthermore, a shared understanding of important processes within the field is often used to develop hydrological models, but there is no such consensus on the relevant processes in socio-hydrological systems. Here we present a question driven process to address these challenges. Such an approach allows modeling structure, scope and detail to remain contingent on and adaptive to the question context. We demonstrate the utility of this process by revisiting a classic question in water resources engineering on reservoir operation rules: what is the impact of reservoir operation policy on the reliability of water supply for a growing city? Our example model couples hydrological and human systems by linking the rate of demand decreases to the past reliability to compare standard operating policy (SOP) with hedging policy (HP). The model shows that reservoir storage acts both as a buffer for variability and as a delay triggering oscillations around a sustainable level of demand. HP reduces the threshold for action thereby decreasing the delay and the oscillation effect. 
As a result, per capita demand decreases during periods of water stress are more frequent but less drastic and the additive effect of small adjustments decreases the tendency of the system to overshoot available supplies. This distinction between the two policies was not apparent using a traditional noncoupled model.",
"depends": [
{
"version": null,
"packageSystem": "https://cloud.r-project.org/",
"identifier": "ggplot2",
"category": "geo sciences,CRAN Top100"
},
{
"version": null,
"packageSystem": "https://cloud.r-project.org/",
"identifier": "plyr",
"category": "geo sciences,CRAN Top100"
},
{
"version": null,
"packageSystem": "https://cloud.r-project.org/",
"identifier": "reshape2",
"category": "CRAN Top100"
}
],
"creators": [
{
"orcid": null,
"name": "Garcia, M.",
"affiliation": "Civil & Environmental Engineering Department, Tufts University"
},
{
"orcid": null,
"name": "Portney, K",
"affiliation": "Bush School of Government & Public Service, Texas A&M University"
},
{
"orcid": null,
"name": "Islam, S.",
"affiliation": "The Fletcher School of Law and Diplomacy, Tufts University"
}
],
"communities": [
{
"identifier": "o2r"
}
],
"codefiles": [
"shFun.R",
"interactiveFigure1.Rmd",
"multiplot.R",
"Ui.R",
"LaunchModel.R",
"main.Rmd",
"server.R"
],
"access_right": "open"
},
"raw": {
"version": null,
"upload_type": "publication",
"title": "A question driven socio-hydrological modeling process",
"temporal": {
"end": "2016-01-15T00:00:00",
"begin": "2016-01-15T00:00:00"
},
"spatial": {
"union": {
"bbox": [
[
181,
181
],
[
-181,
181
],
[
-181,
-181
],
[
181,
-181
]
]
},
"files": []
},
"softwarePaperCitation": null,
"researchQuestions": [],
"researchHypotheses": [],
"recordDateCreated": null,
"rdata": {
"rdata_files": []
},
"r_output": [
{
"text": "plot",
"line": 143,
"feature": "result"
},
{
"text": "plot",
"line": 152,
"feature": "result"
},
{
"text": "plot",
"line": 162,
"feature": "result"
},
{
"text": "plot",
"line": 173,
"feature": "result"
},
{
"text": "plot",
"line": 184,
"feature": "result"
},
{
"text": "plot",
"line": 193,
"feature": "result"
},
{
"text": "plot",
"line": 200,
"feature": "result"
}
],
"r_input": [
{
"text": "SyntheticStreamflow.csv",
"line": 88,
"feature": "data input"
}
],
"r_comment": [
{
"text": "1 Introduction",
"line": 5,
"feature": "comment"
},
{
"text": "Set simulation time period",
"line": 92,
"feature": "comment"
},
{
"text": "Demand Parameters",
"line": 96,
"feature": "comment"
},
{
"text": "Shortage Memory Parameters",
"line": 101,
"feature": "comment"
},
{
"text": "Population Parameters",
"line": 105,
"feature": "comment"
},
{
"text": "Hydrologic Parameters",
"line": 111,
"feature": "comment"
},
{
"text": "Intial Conditions",
"line": 120,
"feature": "comment"
}
],
"publication_type": "other",
"publicationDate": "2018-03-19",
"provenance": [],
"paperLanguage": [
"en"
],
"ncdf": {
"ncdf_files": []
},
"mainfile_candidates": [
"shFun.R",
"interactiveFigure1.Rmd",
"multiplot.R",
"Ui.R",
"LaunchModel.R",
"main.Rmd",
"server.R"
],
"mainfile": "interactiveFigure1.Rmd",
"license": {
"uibindings": null,
"text": null,
"md": null,
"data": null,
"code": null
},
"keywords": [],
"interaction": [],
"inputfiles": [
"SyntheticStreamflow.csv"
],
"identifier": {
"reserveddoi": null,
"doiurl": "https://doi.org/doi:10.5194/hess-20-73-2016",
"doi": "doi:10.5194/hess-20-73-2016"
},
"generatedBy": "o2r-meta metaextract.py",
"file": {
"mimetype": null,
"filepath": null,
"filename": null
},
"ercIdentifier": "z5p7I",
"displayfile_candidates": [
"figure89.png",
"figure2.png",
"figure3.png",
"table1.png",
"display.html",
"figure4.png",
"figure6.png",
"figure5.png",
"table4.png",
"table2.png",
"table5.png",
"figure1.png",
"table3.png"
],
"displayfile": "figure89.png",
"description": "Human and hydrological systems are coupled: human activity impacts the hydrological cycle and hydrological conditions can, but do not always, trigger changes in human systems. Traditional modeling approaches with no feedback between hydrological and human systems typically cannot offer insight into how different patterns of natural variability or human-induced changes may propagate through this coupled system. Modeling of coupled human–hydrological systems, also called socio-hydrological systems, recognizes the potential for humans to transform hydrological systems and for hydrological conditions to influence human behavior. However, this coupling introduces new challenges and existing literature does not offer clear guidance regarding model conceptualization. There are no universally accepted laws of human behavior as there are for the physical systems; furthermore, a shared understanding of important processes within the field is often used to develop hydrological models, but there is no such consensus on the relevant processes in socio-hydrological systems. Here we present a question driven process to address these challenges. Such an approach allows modeling structure, scope and detail to remain contingent on and adaptive to the question context. We demonstrate the utility of this process by revisiting a classic question in water resources engineering on reservoir operation rules: what is the impact of reservoir operation policy on the reliability of water supply for a growing city? Our example model couples hydrological and human systems by linking the rate of demand decreases to the past reliability to compare standard operating policy (SOP) with hedging policy (HP). The model shows that reservoir storage acts both as a buffer for variability and as a delay triggering oscillations around a sustainable level of demand. HP reduces the threshold for action thereby decreasing the delay and the oscillation effect. 
As a result, per capita demand decreases during periods of water stress are more frequent but less drastic and the additive effect of small adjustments decreases the tendency of the system to overshoot available supplies. This distinction between the two policies was not apparent using a traditional noncoupled model.",
"depends": [
{
"version": null,
"packageSystem": "https://cloud.r-project.org/",
"identifier": "ggplot2",
"category": "geo sciences,CRAN Top100"
},
{
"version": null,
"packageSystem": "https://cloud.r-project.org/",
"identifier": "plyr",
"category": "geo sciences,CRAN Top100"
},
{
"version": null,
"packageSystem": "https://cloud.r-project.org/",
"identifier": "reshape2",
"category": "CRAN Top100"
}
],
"communities": [
{
"identifier": "o2r"
}
],
"codefiles": [
"shFun.R",
"interactiveFigure1.Rmd",
"multiplot.R",
"Ui.R",
"LaunchModel.R",
"main.Rmd",
"server.R"
],
"bagit": {
"bagittxt_files": []
},
"author": [
{
"orcid": null,
"name": "Garcia, M.",
"affiliation": "Civil & Environmental Engineering Department, Tufts University"
},
{
"orcid": null,
"name": "Portney, K",
"affiliation": "Bush School of Government & Public Service, Texas A&M University"
},
{
"orcid": null,
"name": "Islam, S.",
"affiliation": "The Fletcher School of Law and Diplomacy, Tufts University"
}
],
"access_right": "open"
}
},
"created": "2018-03-19T21:30:22.394Z",
"user": "0000-0001-6225-344X",
"bag": false,
"compendium": false,
"substituted": false,
"files": {
"path": "/api/v1/compendium/z5p7I/data",
"name": "z5p7I",
"children": [
{
"path": "/api/v1/compendium/z5p7I/data/.erc",
"name": ".erc",
"children": [
{
"path": "/api/v1/compendium/z5p7I/data/.erc/metadata_o2r_1.json",
"name": "metadata_o2r_1.json",
"size": 5561,
"extension": ".json",
"type": "application/json"
},
{
"path": "/api/v1/compendium/z5p7I/data/.erc/metadata_raw.json",
"name": "metadata_raw.json",
"size": 7823,
"extension": ".json",
"type": "application/json"
},
{
"path": "/api/v1/compendium/z5p7I/data/.erc/package_slip.json",
"name": "package_slip.json",
"size": 409,
"extension": ".json",
"type": "application/json"
}
],
"size": 13793,
"type": "directory"
},
{
"path": "/api/v1/compendium/z5p7I/data/A question driven socio-hydrological modeling process.pdf",
"name": "A question driven socio-hydrological modeling process.pdf",
"size": 2212083,
"extension": ".pdf",
"type": "application/pdf"
},
{
"path": "/api/v1/compendium/z5p7I/data/Aquestiondrivenprocess.Rproj",
"name": "Aquestiondrivenprocess.Rproj",
"size": 205,
"extension": ".rproj"
},
{
"path": "/api/v1/compendium/z5p7I/data/DESCRIPTION",
"name": "DESCRIPTION",
"size": 84,
"extension": ""
},
{
"path": "/api/v1/compendium/z5p7I/data/LaunchModel.R",
"name": "LaunchModel.R",
"size": 266,
"extension": ".r",
"type": "script/x-R"
},
{
"path": "/api/v1/compendium/z5p7I/data/ReadMe",
"name": "ReadMe",
"size": 1029,
"extension": ""
},
{
"path": "/api/v1/compendium/z5p7I/data/SyntheticStreamflow.csv",
"name": "SyntheticStreamflow.csv",
"size": 1946,
"extension": ".csv",
"type": "text/csv"
},
{
"path": "/api/v1/compendium/z5p7I/data/Ui.R",
"name": "Ui.R",
"size": 1120,
"extension": ".r",
"type": "script/x-R"
},
{
"path": "/api/v1/compendium/z5p7I/data/display.html",
"name": "display.html",
"size": 1899095,
"extension": ".html",
"type": "text/html"
},
{
"path": "/api/v1/compendium/z5p7I/data/figure1.png",
"name": "figure1.png",
"size": 29123,
"extension": ".png",
"type": "image/png"
},
{
"path": "/api/v1/compendium/z5p7I/data/figure2.png",
"name": "figure2.png",
"size": 26915,
"extension": ".png",
"type": "image/png"
},
{
"path": "/api/v1/compendium/z5p7I/data/figure3.png",
"name": "figure3.png",
"size": 26646,
"extension": ".png",
"type": "image/png"
},
{
"path": "/api/v1/compendium/z5p7I/data/figure4.png",
"name": "figure4.png",
"size": 27377,
"extension": ".png",
"type": "image/png"
},
{
"path": "/api/v1/compendium/z5p7I/data/figure5.png",
"name": "figure5.png",
"size": 35628,
"extension": ".png",
"type": "image/png"
},
{
"path": "/api/v1/compendium/z5p7I/data/figure6.png",
"name": "figure6.png",
"size": 195229,
"extension": ".png",
"type": "image/png"
},
{
"path": "/api/v1/compendium/z5p7I/data/figure89.png",
"name": "figure89.png",
"size": 145719,
"extension": ".png",
"type": "image/png"
},
{
"path": "/api/v1/compendium/z5p7I/data/interactiveFigure1.Rmd",
"name": "interactiveFigure1.Rmd",
"size": 5596,
"extension": ".rmd"
},
{
"path": "/api/v1/compendium/z5p7I/data/main.Rmd",
"name": "main.Rmd",
"size": 71105,
"extension": ".rmd"
},
{
"path": "/api/v1/compendium/z5p7I/data/multiplot.R",
"name": "multiplot.R",
"size": 1527,
"extension": ".r",
"type": "script/x-R"
},
{
"path": "/api/v1/compendium/z5p7I/data/server.R",
"name": "server.R",
"size": 5265,
"extension": ".r",
"type": "script/x-R"
},
{
"path": "/api/v1/compendium/z5p7I/data/shFun.R",
"name": "shFun.R",
"size": 3352,
"extension": ".r",
"type": "script/x-R"
},
{
"path": "/api/v1/compendium/z5p7I/data/table1.png",
"name": "table1.png",
"size": 31246,
"extension": ".png",
"type": "image/png"
},
{
"path": "/api/v1/compendium/z5p7I/data/table2.png",
"name": "table2.png",
"size": 31323,
"extension": ".png",
"type": "image/png"
},
{
"path": "/api/v1/compendium/z5p7I/data/table3.png",
"name": "table3.png",
"size": 112860,
"extension": ".png",
"type": "image/png"
},
{
"path": "/api/v1/compendium/z5p7I/data/table4.png",
"name": "table4.png",
"size": 25783,
"extension": ".png",
"type": "image/png"
},
{
"path": "/api/v1/compendium/z5p7I/data/table5.png",
"name": "table5.png",
"size": 56177,
"extension": ".png",
"type": "image/png"
}
],
"size": 4960492,
"type": "directory"
},
"candidate": true
}

nuest commented 6 years ago

IMO the brokering should not create invalid output, at least none that the user currently cannot fix via the editor.
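
As an illustration only, one conceivable post-processing step would omit keys whose value is null, so that entries such as "version": null never reach the validator. This is a sketch and not what o2r-meta does; `drop_nulls` is a hypothetical helper, and whether the result validates depends on which properties the schema requires.

```python
def drop_nulls(node):
    """Recursively remove dict entries whose value is None.

    None items inside lists are left in place; only object properties are dropped.
    """
    if isinstance(node, dict):
        return {key: drop_nulls(value) for key, value in node.items() if value is not None}
    if isinstance(node, list):
        return [drop_nulls(item) for item in node]
    return node


# Shape taken from the depends entries in the extracted metadata above
extracted = {
    "depends": [
        {
            "version": None,
            "packageSystem": "https://cloud.r-project.org/",
            "identifier": "plyr",
        }
    ]
}
print(drop_nulls(extracted))
```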

nuest commented 6 years ago

The example "A question driven ..." can currently be published automatically (dev branch), so I am closing the issue.