Add support for cross-modality metadata

neuroinformatics-unit / NeuroBlueprint

Lightweight data specification for systems neuroscience, inspired by BIDS.

http://neuroblueprint.neuroinformatics.dev/

Creative Commons Attribution 4.0 International


Add support for cross-modality metadata #9

Closed adamltyson closed 3 months ago

adamltyson commented 1 year ago

Do we have an idea of how cross-modal metadata (e.g. timings) will be stored? Will they be dumped in an extra directory, or do we want to try and store this information in a standardised way?

JoeZiminski commented 1 year ago

It would be nice to standardise this for later analysis pipelines. For the most part, is metadata generated in relation to a specific piece of equipment (e.g. behaviour response times, video tracking times, ephys TTL triggers)? If so, we could store it in the data type subfolders.

adamltyson commented 1 year ago

I think it varies. There are timings associated with specific modalities, but also general timing signals that are used to synchronise all data types. Time for another survey?

bendichter commented 1 year ago

From my quick look through the docs, this stands out as a large potential gap in the standard. We have found that data is commonly recorded from 2, or even 4-5, independent acquisition systems that need to be aligned. These different streams can have different sampling rates and starting times, and can drift relative to one another. All three of these cases really need to be accounted for because each one comes up often. Here is how we handle this when converting to NWB. I would guess that since you are planning on storing the original raw data files you would use a different solution.
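
To illustrate the three cases above (different sampling rates, different starting times, drift), here is a minimal sketch, not the NWB conversion method referred to above. It assumes both systems record the same sync pulses (e.g. TTL), and fits a linear map from the secondary clock to the primary clock: the offset captures the differing start time and the slope captures drift. The function names and the synthetic values are hypothetical.

```python
import numpy as np


def fit_clock_mapping(sync_secondary, sync_primary):
    """Least-squares fit of: primary_time = slope * secondary_time + offset.

    Both inputs are the times of the SAME sync pulses, as seen by each system.
    """
    slope, offset = np.polyfit(sync_secondary, sync_primary, deg=1)
    return slope, offset


def to_primary_clock(t_secondary, slope, offset):
    """Map timestamps from the secondary clock onto the primary clock."""
    return slope * np.asarray(t_secondary, dtype=float) + offset


# Synthetic example (assumed values): pulses every 10 s on the primary clock.
sync_primary = np.arange(0.0, 100.0, 10.0)
true_offset, true_slope = 2.5, 1.0001  # secondary started 2.5 s late and drifts
sync_secondary = (sync_primary - true_offset) / true_slope

slope, offset = fit_clock_mapping(sync_secondary, sync_primary)

# A 30 Hz camera on the secondary system, re-expressed on the primary clock.
frame_times_secondary = np.arange(5) / 30.0
frame_times_primary = to_primary_clock(frame_times_secondary, slope, offset)
```

A single linear fit assumes the drift is constant; if it is not, interpolating between neighbouring sync pulses (e.g. with `np.interp`) is a common alternative.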

JoeZiminski commented 1 year ago

Thanks a lot @bendichter, this is definitely something we should think about sooner rather than later. That is a useful link you sent; I will pass it on to some researchers, as the alignment is always very tricky. It will be good to leverage existing tools for this as much as possible. @niksirbi @adamltyson maybe we can run a survey to see how people are storing / dealing with their TTL pulses etc. for their behavioural / ephys alignment?

adamltyson commented 1 year ago

@JoeZiminski sounds good. This isn't my area of expertise, but my feeling has always been that it will involve creating some metadata format. For downstream derivative data, I think NWB is probably the way to go, but that doesn't help if we want to support raw data formats.

bendichter commented 1 year ago

I suppose another possible solution would be to have sidecar files for each secondary stream to hold timing information. You'd need to communicate what stream is acting as the primary clock, and either a starting time wrt that clock (in the case where you only record offset) or a full timestamp array wrt that clock (e.g. as a .npy file).
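
A rough sketch of what such a sidecar could look like, assuming a JSON file next to each secondary stream. Nothing here is part of the NeuroBlueprint specification: the key names (`primary_clock`, `start_time_s`, `timestamps_file`), the `_timing.json` / `_timestamps.npy` suffixes, and the example file name are all hypothetical.

```python
import json
import tempfile
from pathlib import Path

import numpy as np


def write_timing_sidecar(data_file, primary_clock, start_time=None, timestamps=None):
    """Write a JSON sidecar describing a secondary stream's timing.

    Exactly one of `start_time` (offset only, in seconds) or `timestamps`
    (full array, in seconds) must be given, both relative to `primary_clock`.
    """
    if (start_time is None) == (timestamps is None):
        raise ValueError("Provide exactly one of start_time or timestamps")

    data_file = Path(data_file)
    sidecar = {"primary_clock": primary_clock}

    if timestamps is not None:
        # Store the full timestamp array as .npy and reference it by name.
        ts_path = data_file.with_name(data_file.stem + "_timestamps.npy")
        np.save(ts_path, np.asarray(timestamps, dtype=np.float64))
        sidecar["timestamps_file"] = ts_path.name
    else:
        sidecar["start_time_s"] = float(start_time)

    sidecar_path = data_file.with_name(data_file.stem + "_timing.json")
    sidecar_path.write_text(json.dumps(sidecar, indent=2))
    return sidecar_path


# Usage with a hypothetical camera stream aligned to an "ephys" primary clock.
with tempfile.TemporaryDirectory() as tmp:
    video = Path(tmp) / "sub-001_ses-001_camera.avi"
    sidecar_path = write_timing_sidecar(
        video, primary_clock="ephys", timestamps=[2.5, 2.533, 2.567]
    )
    loaded = json.loads(sidecar_path.read_text())
    loaded_ts = np.load(sidecar_path.with_name(loaded["timestamps_file"]))
```

Keeping the timestamps in a separate `.npy` file leaves the JSON small and human-readable, while the offset-only case needs no extra file at all.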

JoeZiminski commented 3 months ago

Will close in favour of #30, as the discussion there will encompass all forms of metadata.