tdwg / cd

Collection Descriptions
Creative Commons Attribution 4.0 International

Specify array-item datatypes #492

Closed essvee closed 8 months ago

essvee commented 1 year ago

Part of end-stage-review QC work (ref notes from 18th May 2023).

Decide how to represent list-like ltc terms in the datatypes.csv and then do that thing. Approaches could be: separate fields for datatype and array item datatype, or leave datatypes.csv as is but use a convention like array<string>
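For illustration, here is a minimal sketch of how the second approach (keeping a single column and using a convention like array<string>) might be read back by a script. The function name `parse_datatype`, the `ParsedType` tuple and the sample cell values are hypothetical and not taken from datatypes.csv.

```python
import re
from typing import NamedTuple


class ParsedType(NamedTuple):
    """Result of reading one datatype cell (hypothetical structure)."""
    is_array: bool
    item_type: str


# Matches cells such as "array<string>" or "array<ltc:Identifier>".
_ARRAY_RE = re.compile(r"^array<\s*([^<>\s]+)\s*>$", re.IGNORECASE)


def parse_datatype(cell: str) -> ParsedType:
    """Split a datatype cell into an array flag and the item datatype."""
    text = cell.strip()
    match = _ARRAY_RE.match(text)
    if match:
        return ParsedType(True, match.group(1))
    # Anything that does not follow the convention is treated as a scalar type.
    return ParsedType(False, text)


print(parse_datatype("array<string>"))
print(parse_datatype("string"))
```

The two-column alternative would store `is_array` and `item_type` directly, so no parsing step would be needed, at the cost of an extra field in the csv.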

useful ref from reviews: https://data.naturalsciences.org/docs/0.1/specifications/data-types

essvee commented 1 year ago

@ben-norton - I've updated the datatypes file on a separate branch with a poss approach to this - just used java-type syntax to indicate the datatype of the items in the array/multi-val field. Not ideal but at least it says what they are - pls could you take a look (here) and let me know if you think there's a better way to do it, within the confines of a csv at least?

ben-norton commented 1 year ago

@essvee One suggestion and one question. 1. I would replace <> with brackets to reflect JSON notation. 2. There are two types of arrays: 1) Simple (a simple list of values), 2) Complex (one or more objects, the one-to-many relationships). It looks like you can make this distinction using the namespace prefix. If a term is prefixed with a namespace, it's a complex array. If not, then it's a simple array. Is there an instance where this breaks? Is there a simple array where the term is given a namespace prefix, or vice versa (a complex array without a namespace prefix)?

ben-norton commented 1 year ago

@essvee

essvee commented 1 year ago

Hello @ben-norton,

  1. Noted! Square brackets, d'you mean?
  2. There shouldn't be a situation where objects don't have a namespace, so long as we keep fully qualifying the term name (think I saw some TDWG docs encouraging the use of namespace alongside the termname as best practice, you're likely more aware of this than me). Would you not just use 'not one of [list of simple array types]' rather than scanning for a substring if you were parsing this file tho? Doesn't make much of a diff apart from the scenario the thing fails in I guess 😜
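
The two checks discussed here (looking for a namespace prefix versus testing against a list of simple types) could be sketched roughly as below. The contents of `SIMPLE_ITEM_TYPES` and the sample type names are assumptions made for illustration, not the agreed list of LtC datatypes.

```python
# Hypothetical allow-list of simple item types; the real list would come
# from whatever datatypes the standard settles on.
SIMPLE_ITEM_TYPES = frozenset({"string", "integer", "decimal", "boolean", "date", "uri"})


def is_complex_by_prefix(item_type: str) -> bool:
    """Treat any item type carrying a namespace prefix (e.g. 'ltc:') as complex."""
    return ":" in item_type


def is_complex_by_allow_list(item_type: str) -> bool:
    """Treat anything that is not a known simple type as complex."""
    return item_type.lower() not in SIMPLE_ITEM_TYPES


# The two checks agree on fully qualified names and on plain simple types.
# They differ only when an object type is written without its prefix.
for name in ("ltc:Identifier", "string", "Identifier"):
    print(name, is_complex_by_prefix(name), is_complex_by_allow_list(name))
```

The only disagreement in this sketch is the unqualified object name: the prefix check would treat it as simple, while the allow-list check would treat it as complex.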

ben-norton commented 1 year ago

@essvee Ya, I misspoke. Terms that are complex arrays are annotated with the prefix "has", making them easy to identify. Regardless, hold on this issue for the moment. I came up with a better way to distinguish between complex and simple arrays over the weekend. I'm not sure why it took me so long to make the connection. I need to get this approved, but here's the idea. The datatype for simple arrays (list of values) is Enumeration or Enum. The datatype for complex arrays (sets of zero or many objects) is Array. This makes sense for a lot of reasons. More soon.

essvee commented 1 year ago

Oooo I hate to be That Guy @ben-norton, but while I see where you're coming from with enum = 'a list of things', aren't a lot of people going to hear enum = 'there is one value in this field and it's one of the values on this controlled list', aka a database enum-type?

ben-norton commented 1 year ago

@essvee Yes. I need to write a grant for funds to hire the entire LtC team to provide feedback when I have "ideas". You're absolutely right. I know there's a solution, a skos:altLabel for simple arrays.

ben-norton commented 1 year ago

@essvee Lists. Simple arrays are assigned data type "List". Thoughts?

essvee commented 1 year ago

With complex arrays denoted as they are now, or something more like 'List of ltc:Foo'? List might be better term than array tbf, more familiar and a bit more generic.

I did consider having two fields in the datatypes csv: one flag to indicate that the field can be multi-value and the other to specify the type of values in the field (whether or not they were inside an iterable). But in the end I figured that might be over-baking it, esp. working on the assumption that this bit of the docs is for human use?

ben-norton commented 1 year ago

Yes. List for simple arrays: colors: red, yellow, orange

Array for complex arrays: hasColors: [{ "name": "red", "html": "#FF0000", "rgb": "255,0,0" }, { "name": "green", "html": "#00FF00", "rgb": "0,255,0" }]

I think so. I'm generating the docs from the csv files as a test run.
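As a rough illustration of the List versus Array distinction in the examples above, the sketch below labels a JSON value by the shape of its items. The function name `array_kind` and the "mixed" and "unknown" labels are hypothetical and only cover cases the thread does not address.

```python
import json


def array_kind(value: list) -> str:
    """Label a JSON array as 'List' (scalar items) or 'Array' (object items)."""
    if not isinstance(value, list):
        raise TypeError("expected a JSON array")
    if not value:
        # An empty array cannot be classified from its values alone.
        return "unknown"
    if all(isinstance(item, dict) for item in value):
        return "Array"
    if not any(isinstance(item, (dict, list)) for item in value):
        return "List"
    return "mixed"


colors = json.loads('["red", "yellow", "orange"]')
has_colors = json.loads(
    '[{"name": "red", "html": "#FF0000", "rgb": "255,0,0"},'
    ' {"name": "green", "html": "#00FF00", "rgb": "0,255,0"}]'
)

print(array_kind(colors))
print(array_kind(has_colors))
```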

essvee commented 8 months ago

Closing, agreed syntax was 'array'