Closed. jansule closed this issue 7 years ago.
I agree. We might have to deal with an array of arrays, as provided by some journal styles, e.g.:
keywords
+ cat1
++ tag1, tag2, tag3
+ cat2
++ tag1, tag2, tag3
@7048730 could you please provide a real example for this?
I am a bit confused. Why an array of arrays? A simple array should be enough, I think. Keywords are also usually not categorized in our field.
A jstatsoft YAML header as source, for instance, might confront us with this (I also saw similar structures in random R scripts I found on figshare):
keywords:
  formatted: [dynamic programming, MODIS time series, land use changes, crop monitoring]
  plain: [dynamic programming, MODIS time series, land use changes, crop monitoring]
I think the extractor should also be able to handle the case where `plain` and `formatted` differ; otherwise I'd also tend to simplify this.
Edit: We should also be prepared to handle hierarchical keywords, such as taxonomies.
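A minimal sketch of how hierarchical keywords could be reduced to a simple array, assuming the nested structure arrives as Python dicts and lists after parsing. The function name `flatten_keywords` is hypothetical and not part of the extractor; category names are dropped and only the leaf strings are kept.

```python
def flatten_keywords(node):
    """Collect all leaf strings from a nested keyword structure.

    Hypothetical helper: accepts a string, a list, or a dict of
    categories, and returns a flat list without duplicates, keeping
    the order of first appearance.
    """
    def walk(item):
        if item is None:
            return
        if isinstance(item, str):
            yield item
        elif isinstance(item, dict):
            # Category names (the keys) are ignored; only values are walked.
            for value in item.values():
                yield from walk(value)
        else:
            for child in item:
                yield from walk(child)

    # dict.fromkeys removes duplicates while preserving insertion order.
    return list(dict.fromkeys(walk(node)))
```

For the `cat1`/`cat2` example earlier in this thread, both categories carry the same three tags, so the result would be a single array containing each tag once.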
Oh, now I see what you mean by categories. Usually they should not differ, I think, and if they do, do we mind? Do we need all categories?
Well, we could just say: if `plain` is in `keywords`, take `plain`; otherwise take `keywords` when parsing the source (for now). That should simplify things. Also, you will soon use metadata_o2r.json, which is the refined version of metadata_raw.json. I will implement it there...
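The rule proposed here can be sketched as follows, assuming the YAML header has already been parsed into a Python dict. The function name `select_keywords` and the `header` argument are hypothetical; this is an illustration, not the code that ended up in the extractor.

```python
def select_keywords(header):
    """Apply the 'plain if present, else keywords' rule.

    `header` is assumed to be the parsed YAML header as a dict.
    Returns None when the header has no `keywords` entry.
    """
    keywords = header.get("keywords")
    # Only a mapping can carry a `plain` entry; checking the type first
    # avoids matching a keyword that is literally the string "plain".
    if isinstance(keywords, dict) and "plain" in keywords:
        return keywords["plain"]
    return keywords
```

With the jstatsoft-style header quoted above, this would return the `plain` list and ignore `formatted`; a header with a simple `keywords` array is passed through unchanged.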
That should be enough for us. I can't imagine a case where we would need both arrays, maybe in case of special characters? However, we will stay with your proposed solution for now.
included in d5079a7153b67b2368d96a54e7cd8775c58ea12a
To provide multiple keywords, the `keyword` attribute should be an array of strings instead of just a string. See here: https://github.com/o2r-project/o2r-meta/blob/d708922e354d5e8c13292c3a69e19035eeffd585/schema/json/o2r-meta-schema.json#L82-L84
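As an illustration of that requirement, a consumer could coerce whatever it receives into an array of strings before use. This is a sketch under assumptions: the helper name `as_keyword_array` is hypothetical, and it does not reproduce the linked schema, only the "array of strings" shape described in this comment.

```python
def as_keyword_array(value):
    """Return `value` as a list of strings.

    Hypothetical helper: a single string is wrapped in a list, None
    becomes an empty list, and any non-string item raises TypeError.
    """
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    items = list(value)
    for item in items:
        if not isinstance(item, str):
            raise TypeError(f"keyword must be a string, got {type(item).__name__}")
    return items
```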