tdwg / camtrap-dp

Camera Trap Data Package (Camtrap DP)
https://camtrap-dp.tdwg.org
MIT License

Unclear how to write valid "array" content in csv: update tags #62

Closed niconoe closed 3 years ago

niconoe commented 3 years ago

In GitLab by @peterdesmet on Oct 13, 2020, 13:12

The tags field currently has datatype array, but it is unclear how to write valid arrays in CSV:

tag #fail 
"tag" #fail
["tag"] #valid 
["tag1", "tag2"] # breaks csv
"[\"tag1\", \"tag2\"]" # breaks csv
"['Meerdaal West','test']" # fail
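The backslash-escaped attempt breaks the file because CSV (RFC 4180) escapes a double quote inside a quoted field by doubling it, not with a backslash. The last attempt is valid CSV but not valid JSON, since JSON strings require double quotes. A minimal sketch with Python's standard csv and json modules, assuming tags are stored as a JSON array in a single column; write_tags_row and read_tags_row are hypothetical helpers, and whether a given validator accepts this encoding for an array field is not established here:

```python
import csv
import io
import json


def write_tags_row(deployment_id, tags):
    """Hypothetical helper: serialize tags as a JSON array inside one CSV field."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    # json.dumps yields ["...", "..."]; csv.writer wraps the field in double
    # quotes and doubles every inner double quote (RFC 4180 escaping).
    writer.writerow([deployment_id, json.dumps(tags)])
    return buf.getvalue()


def read_tags_row(line):
    """Hypothetical helper: parse the CSV row, then decode the JSON array."""
    row = next(csv.reader(io.StringIO(line)))
    return row[0], json.loads(row[1])


line = write_tags_row("dep1", ["Meerdaal West", "test"])
print(line, end="")  # dep1,"[""Meerdaal West"", ""test""]"
print(read_tags_row(line))  # ('dep1', ['Meerdaal West', 'test'])
```

Doubled quotes like these are awkward to type by hand, which supports the move away from arrays in CSV discussed below.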

I wonder if it wouldn't be easier to use a string with a pattern such as:

(\S+,?\s?)*

That is: one or more non-whitespace characters, followed by an optional comma and an optional whitespace character, repeated. This matches:

tag
tag,
tag,tag
tag, tag
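The prose describes a repeated group. The sketch below, using Python's re module, contrasts the character-class form [\S+,?\s?]* with the grouped form (\S+,?\s?)*. Inside square brackets the tokens become a set of characters (non-whitespace, whitespace, "+", ",", "?"), so the bracketed form accepts any string; the grouped form follows the description. The constant names are illustrative only:

```python
import re

# Square brackets make a character class containing \S and \s,
# so this accepts any run of characters at all.
AS_CLASS = re.compile(r"[\S+,?\s?]*")
# Grouped form: "non-whitespace run, optional comma, optional single
# whitespace character", repeated zero or more times.
AS_GROUP = re.compile(r"(\S+,?\s?)*")

for s in ["tag", "tag,", "tag,tag", "tag, tag", "tag  ,  tag"]:
    print(repr(s), bool(AS_CLASS.fullmatch(s)), bool(AS_GROUP.fullmatch(s)))
```

Note that the comma is itself a non-whitespace character, so even the grouped form does not really treat it as a separator; it mainly limits whitespace to one character between values.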

niconoe commented 3 years ago

In GitLab by @kbubnicki on Oct 13, 2020, 14:44

I think it makes sense to simplify it, but then a comma cannot be used as the tag separator, since it is the CSV field separator, right? Possible solutions:

  1. Always use a comma (,) as the CSV separator and a semicolon (;) as the tag separator.
  2. Always put a comma-separated list of tags in double quotes, e.g. "tag1,tag2,tag3".

I prefer option 1.
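The two options can be compared with Python's standard csv module. A minimal sketch, assuming a two-column row (a deployment ID and the tags); to_csv_line is a hypothetical helper. With option 1 the tags field contains no comma and needs no quoting, while with option 2 the writer has to wrap the field in double quotes:

```python
import csv
import io

tags = ["tag1", "tag2", "tag3"]


def to_csv_line(fields):
    """Hypothetical helper: write one row with csv.writer's default (minimal) quoting."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(fields)
    return buf.getvalue()


# Option 1: ";" separates tags, so the field needs no quoting.
option1 = to_csv_line(["dep1", ";".join(tags)])
# Option 2: "," separates tags, so the field must be quoted.
option2 = to_csv_line(["dep1", ",".join(tags)])

print(option1, end="")  # dep1,tag1;tag2;tag3
print(option2, end="")  # dep1,"tag1,tag2,tag3"

# Both parse back to the same list once split on their own separator.
row1 = next(csv.reader(io.StringIO(option1)))
row2 = next(csv.reader(io.StringIO(option2)))
print(row1[1].split(";") == row2[1].split(",") == tags)  # True
```

Option 2 works with any conforming CSV reader, but it relies on writers quoting correctly; option 1 stays readable even when a file is edited by hand.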

niconoe commented 3 years ago

In GitLab by @peterdesmet on Oct 13, 2020, 16:15

I generally use | as separator, but before we decide on separators: do we agree to change the data type to string (not array)? Note that for some fields with an enum, array might still be the best choice.

niconoe commented 3 years ago

In GitLab by @kbubnicki on Oct 13, 2020, 21:08

I think so; at least for fields like this, we can change the data type to string.

niconoe commented 3 years ago

In GitLab by @peterdesmet on Oct 14, 2020, 11:34

There are a couple of array-typed fields in the package profile (sampling_design, organizations, ...), which is fine because that is JSON anyway.

The only array type in the CSV files is the tags field discussed above, which is better changed to string. Do we want to enforce a pattern for these?

niconoe commented 3 years ago

In GitLab by @peterdesmet on Oct 14, 2020, 11:35

changed title from Unclear how to write valid "array" content to Unclear how to write valid "array" content in csv

niconoe commented 3 years ago

In GitLab by @peterdesmet on Nov 19, 2020, 10:08

Decided to change to string and update the definition to recommend |-delimited values. This could be enforced with a pattern.
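A |-delimited string can be validated and split with a single regular expression. A minimal sketch in Python; the pattern below is an assumption for illustration and is not the pattern from the commit referenced in this thread. In a Frictionless Table Schema, a regex like this would go in the field's constraints.pattern property. PIPE_TAGS and parse_tags are hypothetical names:

```python
import re

# Assumed pattern: one or more non-empty values separated by a single "|",
# with no leading, trailing, or doubled separators.
PIPE_TAGS = re.compile(r"[^|]+(\|[^|]+)*")


def parse_tags(value):
    """Hypothetical helper: validate a |-delimited tags string and split it."""
    if not PIPE_TAGS.fullmatch(value):
        raise ValueError(f"invalid tags value: {value!r}")
    # Strip surrounding whitespace from each value.
    return [t.strip() for t in value.split("|")]


print(parse_tags("Meerdaal West|test"))  # ['Meerdaal West', 'test']
```

Because | rarely appears in tag text and is not the CSV delimiter, values like Meerdaal West need no quoting or escaping.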

niconoe commented 3 years ago

In GitLab by @peterdesmet on Nov 19, 2020, 13:09

mentioned in commit 53e22cb1df08f8683c6e99f6f49753c5cf4a8c9b

niconoe commented 3 years ago

In GitLab by @peterdesmet on Nov 19, 2020, 13:11

@kbubnicki see https://gitlab.com/oscf/camtrap-dp/-/commit/53e22cb1df08f8683c6e99f6f49753c5cf4a8c9b and close the issue if you agree. Note:

niconoe commented 3 years ago

In GitLab by @peterdesmet on Nov 19, 2020, 13:12

changed title from Unclear how to write valid "array" content in csv to Unclear how to write valid "array" content in csv: update tags