tdwg / camtrap-dp

Camera Trap Data Package (Camtrap DP)
https://camtrap-dp.tdwg.org
MIT License

Allow milliseconds for all timestamps #333

Open peterdesmet opened 1 year ago

peterdesmet commented 1 year ago

Some terms (especially the media timestamp, eventStart and eventEnd) could benefit from expressing milliseconds. Ideally we support four ways of writing timestamps (all ISO 8601):

2013-11-23T08:30:00Z         # seconds UTC
2013-11-23T06:30:00-0200     # seconds + offset
2013-11-23T08:30:00.300Z     # milliseconds + UTC
2013-11-23T06:30:00.300-0200 # milliseconds + offset

All of these seem to pass Frictionless Framework validation for format=default, but according to the specs, default implies:

An ISO8601 format string e.g. YYYY-MM-DDThh:mm:ssZ in UTC time

It looks like Frictionless Framework also allows timezone offsets (i.e. more than the specs allow). @roll is this intentional?

For Camtrap DP we therefore opted to explicitly define the format:

https://github.com/tdwg/camtrap-dp/blob/22f5309f9a29d2b202d453c4c68d52d325c21e28/observations-table-schema.json#L53

That pattern will, however, fail for timestamps with milliseconds, so we should either adapt it to accept them or always require milliseconds.

Note that %z correctly allows -0200, +0200 and Z, and correctly forbids a missing offset and an invalid one such as -0280.
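As a rough check of the behaviour described above, here is a minimal sketch using Python's own `datetime.strptime`. The format string `%Y-%m-%dT%H:%M:%S%z` is an assumption about what the linked schema line contains, `is_valid` is a hypothetical helper, and Frictionless Framework may wrap `strptime` differently, so treat this as an illustration rather than the validator itself.

```python
from datetime import datetime

# Assumed format: whole seconds plus a mandatory UTC offset (not copied from the schema).
FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def is_valid(value: str, fmt: str = FORMAT) -> bool:
    """Hypothetical helper: True if `value` parses with `fmt` under strptime."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


accepted = [
    "2013-11-23T08:30:00Z",      # UTC designator
    "2013-11-23T06:30:00-0200",  # negative offset
    "2013-11-23T06:30:00+0200",  # positive offset
]
rejected = [
    "2013-11-23T08:30:00",       # no timezone information
    "2013-11-23T06:30:00-0280",  # offset minutes out of range
    "2013-11-23T08:30:00.300Z",  # milliseconds are not covered by this format
]

for value in accepted + rejected:
    print(value, is_valid(value))
```

The last rejected value is the case this issue is about: a format without `%f` cannot accept fractional seconds.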

peterdesmet commented 1 year ago

It is not possible to both explicitly define the format (i.e. not rely on default) and keep milliseconds optional. format relies on strptime, which does not have a format code for optional milliseconds (there is a feature request for it). So we have two options:

  1. Always requiring milliseconds and using format %Y-%m-%dT%H:%M:%S.%f%z. Note that e.g. Movebank exports always have milliseconds.
  2. Relying on format = default, which allows (in Frictionless Framework) the four formats above, even though the specs say it should forbid non-UTC timestamps. It (annoyingly) also allows timestamps without timezone information.
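
For comparison, code that reads the data (as opposed to the schema, which takes a single format string) could get optional milliseconds by trying two strptime formats in turn. This is only a sketch of that workaround in plain Python: `parse_timestamp` and the two constants are hypothetical names, and nothing here is part of Camtrap DP or Frictionless Framework.

```python
from datetime import datetime

# Option 1 format from the list above, plus an assumed seconds-only variant.
WITH_MILLIS = "%Y-%m-%dT%H:%M:%S.%f%z"
WITHOUT_MILLIS = "%Y-%m-%dT%H:%M:%S%z"


def parse_timestamp(value: str) -> datetime:
    """Try the fractional-seconds format first, then fall back to whole seconds.

    Both formats end in %z, so values without timezone information are rejected.
    """
    for fmt in (WITH_MILLIS, WITHOUT_MILLIS):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"not a timestamp with timezone information: {value!r}")


for value in (
    "2013-11-23T08:30:00Z",
    "2013-11-23T06:30:00-0200",
    "2013-11-23T08:30:00.300Z",
    "2013-11-23T06:30:00.300-0200",
):
    print(value, "->", parse_timestamp(value).isoformat())
```

The fallback keeps the timezone requirement of the explicit format while accepting all four notations, but it cannot be expressed as one Table Schema format value.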

peterdesmet commented 8 months ago

Discussed with @kbubnicki, moved to 1.1. Camtrap DP 1.0 will thus not allow milliseconds, but we keep the explicit format, which flags missing timezone information (likely the more common problem). We'll have to see how this evolves at Frictionless.