o2r-project / o2r-meta

Metadata toolsuite for an extract-map-validate workflow supporting reproducible research
Apache License 2.0

keywords-attribute as array of strings #7

Closed jansule closed 7 years ago

jansule commented 7 years ago

To support multiple keywords, the keywords attribute should be an array of strings instead of a single string. See here: https://github.com/o2r-project/o2r-meta/blob/d708922e354d5e8c13292c3a69e19035eeffd585/schema/json/o2r-meta-schema.json#L82-L84
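A minimal sketch of what the proposed change could look like, written as a Python dict mirroring a JSON Schema fragment plus a standard-library check. The names `KEYWORDS_SCHEMA` and `is_valid_keywords` are hypothetical, and the actual contents of o2r-meta-schema.json may differ.

```python
# Hypothetical schema fragment (assumption): keywords as an array of strings.
# The real o2r-meta-schema.json may use different property names or constraints.
KEYWORDS_SCHEMA = {
    "type": "array",
    "items": {"type": "string"},
}


def is_valid_keywords(value):
    """Return True if value is a list whose items are all strings.

    This mirrors KEYWORDS_SCHEMA by hand, without a JSON Schema library.
    """
    return isinstance(value, list) and all(isinstance(k, str) for k in value)


print(is_valid_keywords(["land use changes", "crop monitoring"]))  # True
print(is_valid_keywords("land use changes, crop monitoring"))      # False
```

With such a constraint, a single comma-separated string would be rejected, while a list of separate keyword strings would pass.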

ghost commented 7 years ago

I agree. We might have to deal with an array of arrays as provided by journal styles, e.g.:

keywords
+ cat1
++ tag1, tag2, tag3
+ cat2
++ tag1, tag2, tag3

jansule commented 7 years ago

@7048730 could you please provide a real example of this?

MarkusKonk commented 7 years ago

I am a bit confused. Why an array of arrays? A simple array should be enough, I think. Keywords are also usually not categorized in our field.

ghost commented 7 years ago

A jstatsoft YAML header as source, for instance, might confront us with this (and I also saw similar structures in random R scripts I found on figshare):

keywords:
  formatted: [dynamic programming, MODIS time series, land use changes, crop monitoring]
  plain:     [dynamic programming, MODIS time series, land use changes, crop monitoring]

I think the extractor should also be able to handle the case where plain and formatted differ; otherwise I'd also tend to simplify this.

Edit: We should also be prepared to handle hierarchical keywords, such as taxonomies etc.

MarkusKonk commented 7 years ago

Oh, now I see what you mean by categories. Usually they should not differ, I think, and if they do, do we mind? Do we need all categories?

ghost commented 7 years ago

Well, we could just say: if plain in keywords -> take plain, else take keywords when parsing the source (for now). That should simplify things. Also, you will soon use metadata_o2r.json, which is the refined version of metadata_raw.json. I will implement it there...
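The rule above could be sketched roughly as follows. This is only an illustration, not the code that ended up in o2r-meta: the function name `select_keywords` is hypothetical, and wrapping a bare string into a list is an added assumption based on the request that keywords be an array of strings.

```python
def select_keywords(keywords):
    """Pick the keyword list from a parsed source header (hypothetical helper).

    Rule from the discussion: if "plain" is in keywords, take plain,
    otherwise take keywords as they are.
    """
    if isinstance(keywords, dict) and "plain" in keywords:
        keywords = keywords["plain"]
    # Assumption: a single bare string is wrapped so the result is a list.
    if isinstance(keywords, str):
        return [keywords]
    return keywords


# jstatsoft-style header with formatted and plain variants
header = {
    "formatted": ["dynamic programming", "MODIS time series"],
    "plain": ["dynamic programming", "MODIS time series"],
}
print(select_keywords(header))                # ['dynamic programming', 'MODIS time series']
print(select_keywords(["land use changes"]))  # ['land use changes']
```

Note that this drops the formatted variant entirely, which matches the "for now" simplification but would lose information if plain and formatted ever differ.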

MarkusKonk commented 7 years ago

That should be enough for us. I can't imagine a case where we would need both arrays, maybe in case of special characters? However, we will stay with your proposed solution for now.

ghost commented 7 years ago

Included in d5079a7153b67b2368d96a54e7cd8775c58ea12a