neuroinformatics-unit / NeuroBlueprint

Lightweight data specification for systems neuroscience, inspired by BIDS.
http://neuroblueprint.neuroinformatics.dev/
Creative Commons Attribution 4.0 International
17 stars 1 forks source link

Consider using ISO8601 for date component #31

Closed glopesdev closed 1 year ago

glopesdev commented 1 year ago

https://github.com/neuroinformatics-unit/SWC-Blueprint/blob/4699e3c4d95e0265d7f06fcf40c75af8a4107a7c/docs/source/specification.md?plain=1#L60

It can sometimes happen that an experimenter wants to run multiple sessions of the same subject on the same day, so it would be nice to standardize an optional time component for the BIDS path.

A widely used option which would be backwards compatible with the current one is ISO 8601. The current date string format is already a valid ISO 8601 string. This has the further advantage that it can be immediately parsed with the new datetime.fromisoformat class method from python 3.11.

The proposal would be to extend date into datetime and simply allow for any valid ISO 8601 string that the fromisoformat method would also accept. Examples below adapted from their docs:

from datetime import datetime
datetime.fromisoformat("20230810")
datetime.fromisoformat("20230810T1453")
datetime.fromisoformat("20230810T145328Z")
datetime.fromisoformat("2023-08-10")
datetime.fromisoformat("2023-03-07T115528Z")

All the above strings are parsed successfully by the method (my bias, I particularly like the last format since it makes for a more human-readable datetime string which is still file-system compatible and standards-compliant).

niksirbi commented 1 year ago

Thanks a lot @glopesdev for taking the time to read the specification and to raise this issue.

I like the idea of merging the date and time fields into one. As you said, multiple sessions per day are common in systems neuroscience, unlike in neuroimaging - where the BIDS standard came from. I think it would make things simpler to have 1 field.

Regarding the specific formatting, I do love YYYY-MM-DD as it is the most human-readable and unambiguous way of writing dates. The only problem is that in BIDS (and in SWC-Blueprint), the dash - character signifies the separation between keys and values in folder/file names, i.g. key1-value1_key2_value2. This makes it very easy to parse such strings into dictionaries.

That's why we went for the YYYYMMDD version of ISO8601, even though it's less human-readable. So it's essentially a conflict between human-readability and compatibility with BIDS 🤷🏼‍♂️

The options below would not break the key-value convention:

datetime.fromisoformat("20230810")
datetime.fromisoformat("20230810T1453")
datetime.fromisoformat("20230810T145328Z")

But these are not very friendly to human eyes (especially the last two).

Let's have @JoeZiminski also weigh in, as he has implemented relevant functionality in datashuttle.

JoeZiminski commented 1 year ago

Hey @glopesdev thanks a lot for this! That's a very useful suggestion to use ISO8601 formatting for datetime, previously I have been stuck on the best way to represent this.

Currently in datashuttle there are keywords (@DATE@, @TIME@, @DATETIME@) to create a date, time or datetime key in the foldernames. An example foldername with all three key-value pair types:

sub-001_date-20230811_time-165900_datetime-202308_165900

The datetime format in datashuttle (as above) actually needs fixing, because it breaks BIDS, so ISO8601 format is perfect.

I would suggest to maintain flexibility, separate date, time, datetime keys are allowed in SWCBlueprint, but specify that the datetime value must be in ISO8601 format. Unfortunately as Niko mentioned the most readable ISO8601 formats cause some problems with BIDS key-value pair formatting. In this case for reaseachers can choose between more human-readable (sub-001_date-20230811_time-165900) vs. machine readable (sub-001_datetime-20230811T165900) . The SWCBlueprint documentation could be updated to include these date / time / datetime key-value pairs. What do you think?

glopesdev commented 1 year ago

@niksirbi @JoeZiminski thanks for the feedback, it's interesting that when I first read the BIDS Specification I only noticed _ character as a separator. I read about key-value pairs but they were always mentioned in the format key-value.

If this was the principle, then actually there wouldn't be any parsing problems in accepting values with multiple -, as long as keys are exclusively alphanumeric, since you would first parse for _, then look for the first - to separate key from value, and then it would be fine to have multiple - characters in the value label itself, as it wouldn't be ambiguous as long as _ are avoided.

However, I now realize they do say that values also must be strictly "alphanumeric" so I guess that excludes any symbols. It's interesting since further down in the specification they do mention that date time information "MUST be expressed in the following format YYYY-MM-DDThh:mm:ss[.000000][Z]" but I guess this must be for file contents since : is not a valid path character.

Anyway, I agree your proposal may be the best compromise if indeed other BIDS parsers do not understand values with multiple dashes.

JoeZiminski commented 1 year ago

Hey @glopesdev thanks a lot for this, I didn't know that about the date time information formatting for BIDS, that is quite confusing. I agree it must be specifically for the sidecar json metadata as it cannot be for folder / filenames.

I agree to err on the side of caution and avoid non-alphanumeric characters in case they case problems down the line with BIDS parsers, even though in theory it would not be an issue to have a parsing scheme. Another benefit is that it is easier to researchers to remember the key-value formatting rule if there are no exceptions.

I'll make an issue on datashuttle to fix the datetime formatting as we discussed. Thanks again for your feedback and if any other suggestions come to mind it would be great to hear!

JoeZiminski commented 1 year ago

closed with #38 Thanks @glopesdev!