o2r-project / o2r-meta

Metadata toolsuite for an extract-map-validate workflow supporting reproducible research
Apache License 2.0

Extract license metadata from Rmd header #96

nuest closed this issue 6 years ago

nuest commented 6 years ago

To streamline the upload process for workspaces, it would be good to extract license metadata from R Markdown headers. The headers are already parsed via https://github.com/o2r-project/o2r-meta/blob/5c12559803106db3ded76de661696a9414728f7d/parsers/parse_yaml.py (@7048730 correct?), so the YAML parser only needs to be extended to also extract the `licenses` field, for example:

```yaml
---
title: "Capacity of container ships in seaborne trade from 1980 to 2016 (in million dwt)*"
licenses:
    code: Apache-2.0
    data: ODbL-1.0
    text: CC0-1.0
    ui_bindings: CC0-1.0
    metadata: CC0-1.0
---
```
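For illustration, here is a minimal sketch of isolating the YAML header from an Rmd file before it is parsed into a dictionary. It assumes the header opens with a `---` line at the very top of the file and closes with the next `---` (or `...`) line, following Pandoc's YAML metadata block convention. The function name `extract_yaml_header` is hypothetical, and `parse_yaml.py` may well locate the header differently. Turning the returned text into a dictionary would be left to a YAML library.

```python
from typing import Optional


def extract_yaml_header(rmd_text: str) -> Optional[str]:
    """Return the raw YAML front matter of an R Markdown document, or None.

    Hypothetical helper: assumes the header starts with a line that is exactly
    '---' on the first line and ends at the next '---' or '...' line.
    """
    lines = rmd_text.splitlines()
    # No front matter if the document does not start with the opening delimiter.
    if not lines or lines[0].strip() != "---":
        return None
    for i, line in enumerate(lines[1:], start=1):
        # Pandoc accepts either '---' or '...' as the closing delimiter.
        if line.strip() in ("---", "..."):
            # Everything between the two delimiters is the YAML header.
            return "\n".join(lines[1:i])
    # Opening delimiter without a closing one: treat as having no header.
    return None
```

The returned string (for the example above, the `title` and `licenses` entries) can then be handed to the YAML parser, which yields the `licenses` mapping as a nested dictionary.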

The test file at https://raw.githubusercontent.com/o2r-project/o2r-meta/master/extract/tests/minimal/main.Rmd should be updated accordingly.

Related to #90

Motivation: For the corpus, we want to be able to upload and publish papers without any user interaction.

ghost commented 6 years ago

@nuest Yes. The YAML parser already gives you access to the complete header of Rmd files as a Python dictionary. You can then add a condition for the key `licenses` and copy its value into the extracted master metadata (md) dictionary. Note that some keys in the YAML header are interconnected, and their retrieval statements must not be interrupted in the YAML parser code. Since `licenses` is fairly isolated, it should be easy and safe to process it at the beginning or the end of https://github.com/o2r-project/o2r-meta/blob/5c12559803106db3ded76de661696a9414728f7d/parsers/parse_yaml.py#L39
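A minimal sketch of the suggested condition, assuming the header has already been parsed into a dictionary. The helper name `add_licenses`, its parameters, and the choice to store the value under a `licenses` key in the master metadata dictionary are all hypothetical; the actual o2r metadata schema and `parse_yaml.py` may name and structure these differently.

```python
from typing import Any, Dict


def add_licenses(yaml_header: Dict[str, Any], master_md: Dict[str, Any]) -> None:
    """Copy the 'licenses' entry of a parsed Rmd header into the metadata dict.

    Hypothetical helper: yaml_header is the Rmd header already parsed into a
    dict; master_md stands in for the extracted master metadata dictionary.
    Only the isolated 'licenses' key is read, so the retrieval of
    interconnected header keys elsewhere in the parser is left untouched.
    """
    licenses = yaml_header.get("licenses")
    # Accept only a mapping such as {"code": "Apache-2.0", "data": "ODbL-1.0"};
    # any other value (or a missing key) is ignored in this sketch.
    if isinstance(licenses, dict):
        # Shallow copy so later edits to master_md do not alter the parsed header.
        master_md["licenses"] = dict(licenses)
```

Because the function only reads one key and writes one key, it could be called either before or after the existing key-by-key retrieval without affecting it, which matches the suggestion to handle `licenses` at the beginning or the end of the parsing loop.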