neuroinformatics-unit / NeuroBlueprint

Lightweight data specification for systems neuroscience, inspired by BIDS.
http://neuroblueprint.neuroinformatics.dev/
Creative Commons Attribution 4.0 International

[Feature] Allowable file formats #36

Open bendichter opened 9 months ago

bendichter commented 9 months ago

The docs are not specific about what file formats are allowed within this standard. The formats in the example (spikeGLX and mp4) look good, but how far could this be extended to other standards? Some data formats require proprietary software or specific operating systems to read the data, some are poorly documented, and some lack sufficient metadata to be readable on their own without additional information.

I think it would be helpful to maintain a list of allowable formats for different data types, and require users to convert any non-compliant data formats to an open standard e.g. NWB or NIX.

For instance, for raw electrophysiology data, you might allow:

JoeZiminski commented 9 months ago

Thanks @bendichter. My thinking is that the current specification aims to be somewhat lightweight, to cast as broad a net as possible: essentially a relatively small subset of the more formal BIDS specifications / proposals. The hope is that as people become more familiar with BIDS-like standardisation and see its benefits, they will move closer to full BIDS compliance. My worry about restricting proprietary software formats is that researchers stuck with those formats will think this specification is not for them, and then they will not follow any of it (even the parts they could, like folder organisation). However, I will be interested to hear what @adamltyson and @niksirbi think.

I think that the docs do not do a good enough job of highlighting this: it is not clear that if people are interested in better standardisation and the benefits of open file formats etc., they should check out BIDS / existing BEPs. So I think at the very least the docs can be changed to indicate this and explain the standardization ecosystem (and this specification's place within it) better.

adamltyson commented 9 months ago

I think swc/neuro-blueprint should stay a directory and file naming standard, to ease adoption as much as possible. It can then be combined with (optional) metadata and data type standards. The main aim has always been ease of understanding and adoption by researchers for whom this type of thing isn't standard practice.

Agree with everything you say about proprietary file formats though @bendichter. Maybe we should curate a list of recommended formats?

niksirbi commented 9 months ago

Even if we don't mandate a restricted list of file formats, there is nothing wrong with having recommended ones. It might also help educate people about which formats to prefer (on the grounds of longevity, openness, adoption, etc.). So if the acquisition system can export one of the recommended formats, they should prefer that over others.

niksirbi commented 9 months ago

> I think that the docs do not do a good enough job of highlighting this: it is not clear that if people are interested in better standardisation and the benefits of open file formats etc., they should check out BIDS / existing BEPs. So I think at the very least the docs can be changed to indicate this and explain the standardization ecosystem (and this specification's place within it) better.

I completely agree with this.

bendichter commented 9 months ago

Yes, I think that's fair. I understand you are trying to achieve a middle ground where experimentalists have some guidance to create a uniform file organization while maintaining the convenience of keeping their raw data format. I agree it's a good idea to mention this explicitly in the docs and perhaps nudge users to think about the FAIRness of their data format in terms of:

In that context, it might be useful to point users to particularly good standards, more as guidance than as a constraint.