sccn / labstreaminglayer

LabStreamingLayer super repository comprising submodules for LSL and associated apps.

Standard system of units #66

Closed: dmedine closed this issue 3 years ago

dmedine commented 3 years ago

I have been dealing with a very annoying issue in a number of projects lately: how to represent units of measure properly in LSL meta-data. The LSL utilities for creating the XML meta-data handle only std::string types (https://github.com/sccn/liblsl/blob/200c1c938a4ed3fa922cff8c0208f7d768b472e5/include/lsl_cpp.h#L1691-L1730). A std::string is just a sequence of bytes with no declared encoding, so Unicode strings might not be encoded properly. Furthermore, even if they are, programming environments like Matlab and Python don't always handle Unicode strings well, so Unicode characters in XDF file meta-data may be garbled when loaded.
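To make the garbling concrete, here is a minimal Python sketch (not LSL code). It assumes the meta-data was written as UTF-8 bytes, which is what a std::string holding a Unicode unit would typically carry, and that a hypothetical reader decodes those bytes as Latin-1 instead:

```python
# Minimal sketch, not LSL code: UTF-8 bytes decoded with the wrong codec.
# Assumptions: the meta-data bytes are UTF-8, and a hypothetical reader
# decodes them as Latin-1 instead.


def roundtrip(unit: str, reader_codec: str) -> str:
    """Encode `unit` as UTF-8 bytes, then decode those bytes with `reader_codec`."""
    raw = unit.encode("utf-8")  # the bytes a std::string would carry
    return raw.decode(reader_codec)


for unit in ["\u00b5V", "k\u2126"]:  # MICRO SIGN + V, k + OHM SIGN
    print(repr(unit), "->", repr(roundtrip(unit, "utf-8")),
          "vs", repr(roundtrip(unit, "latin-1")))
```

With UTF-8 on both ends the strings survive; with the Latin-1 reader, µV comes back as ÂµV. Which codec a given loader actually uses depends on its environment, which this sketch does not establish.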

I don't have a solution to this problem, but BIDS definitely calls for Unicode characters for commonly used Greek letters like Omega (for ohm) and mu (for micro): https://bids-specification.readthedocs.io/en/latest/99-appendices/05-units.html. In this sense, neither LSL nor XDF is BIDS compliant.
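Even when the bytes decode correctly, matching units can still trip over characters that look identical but have different code points. The following Python sketch uses only the standard unicodedata module (the variable names are illustrative). It shows that OHM SIGN (U+2126) and GREEK CAPITAL LETTER OMEGA (U+03A9) compare unequal as raw strings, and how Unicode normalization (NFC or NFKC) folds some of these lookalikes together:

```python
import unicodedata

# Visually similar characters, written as escapes so the code points are explicit.
OHM_SIGN = "\u2126"     # OHM SIGN
GREEK_OMEGA = "\u03a9"  # GREEK CAPITAL LETTER OMEGA
MICRO_SIGN = "\u00b5"   # MICRO SIGN
GREEK_MU = "\u03bc"     # GREEK SMALL LETTER MU

# Several ways of writing "microvolts" that render almost identically.
MICROVOLT_VARIANTS = [
    "\u33b6",        # SQUARE MU V (a single character)
    "\U0001d6cdV",   # MATHEMATICAL BOLD SMALL MU + V
    "\U0001d707V",   # MATHEMATICAL ITALIC SMALL MU + V
    "\u03bcV",       # GREEK SMALL LETTER MU + V
    "\u00b5V",       # MICRO SIGN + V
]


def same_after(form: str, a: str, b: str) -> bool:
    """Return True if a and b are identical after Unicode normalization `form`."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


for a, b in [(OHM_SIGN, GREEK_OMEGA), (MICRO_SIGN, GREEK_MU)]:
    print(unicodedata.name(a), "vs", unicodedata.name(b))
    print("  raw equal: ", a == b)
    print("  NFC equal: ", same_after("NFC", a, b))
    print("  NFKC equal:", same_after("NFKC", a, b))

# NFKC maps every variant above to GREEK SMALL LETTER MU followed by V.
print({unicodedata.normalize("NFKC", v) for v in MICROVOLT_VARIANTS})
```

The Ohm pair is already equal under NFC (U+2126 has a canonical decomposition to U+03A9), whereas the micro pair only becomes equal under NFKC. Normalization only picks one canonical code point; it does not turn the result into ASCII, so an ASCII-only reader is no better off.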

I think the cleanest solution from an implementation point of view would be to establish a universal system of tokens, one for each scientific symbol. BIDS apps could then interpret c333 (or whatever) as U+2126 (the Unicode OHM SIGN), while Matlab, more specifically load_xdf.m, could interpret it as 'ohm'.
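A minimal Python sketch of the token idea, assuming unit strings in the meta-data carry curly-brace placeholders such as {c333}. The placeholder syntax, the token c334, and the render helper are all hypothetical and not part of LSL, XDF, or BIDS; c333 is just the placeholder from the post. One shared table drives both a Unicode rendering for BIDS-oriented apps and an ASCII rendering for loaders such as load_xdf.m:

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitToken:
    """One entry in a hypothetical shared token table."""
    unicode_symbol: str  # what a BIDS-oriented app would write
    ascii_name: str      # what an ASCII-only reader would use


# Hypothetical table: "c333" echoes the post's placeholder; "c334" is invented.
TOKENS = {
    "c333": UnitToken("\u2126", "ohm"),    # U+2126 OHM SIGN
    "c334": UnitToken("\u00b5", "micro"),  # U+00B5 MICRO SIGN
}

# Hypothetical placeholder syntax: a token name wrapped in curly braces.
_TOKEN_RE = re.compile(r"\{(\w+)\}")


def render(unit: str, ascii_only: bool) -> str:
    """Replace every {token} in `unit` with its Unicode symbol or ASCII name."""
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in TOKENS:
            raise ValueError(f"unknown unit token: {key}")
        token = TOKENS[key]
        return token.ascii_name if ascii_only else token.unicode_symbol

    return _TOKEN_RE.sub(substitute, unit)


for stored in ["k{c333}", "{c334}V"]:
    print(stored, "->", render(stored, ascii_only=False),
          "/", render(stored, ascii_only=True))
```

The design keeps the table in one place, so adding a symbol means adding one row that both kinds of reader pick up; the hard part is getting everyone to agree on that table.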

Politically, this is a very dirty solution because it entails a lot of work and also convincing BIDS to adopt the standard.

Even better would be to convince the BIDS community to ditch Unicode altogether, but I am quite sure that will not happen. As a developer, I think BIDS is flat out wrong to use non-ASCII characters in their standard because it opens a Pandora's box of programming hell. The most important thing is that the information be clear, and I don't see how 'microvolts' is any less clear than '(mu)V'.

Another thing to consider is that users in industry will definitely prefer Unicode strings in their software because they look nicer. In that case, it is OK to leave the problem to their development teams, but they will (again) appreciate having a standard representation available.

I am open to thoughts and suggestions.

tstenner commented 3 years ago

My two cents (paraphrased from the Slack channel):

Even for simple things like microvolts, there's ㎶ (yes, that's a thing in Unicode) vs 𝛍V vs 𝜇V vs 𝝁V vs 𝝻V vs 𝞵V vs μV vs µV, all of which look pretty similar. So, even though I'm a Unicode fan and don't particularly care that Matlab won't support Unicode until R2034b, I'd prefer to keep it simple. One idea I had was to separate the prefix and the unit, e.g. <unit exponent="-6">V</unit>, but some readers will just return a struct like {'unit': 'V'}, dropping the scale. My other proposal, <channel unit='V' exponent='-6'>…</channel>, wasn't too well received, even though I still like it.
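Both proposals can be sketched in Python with the standard xml.etree.ElementTree module. The channel labels, the 12.5 sample value, and the helper names are illustrative assumptions, as is the rule that a missing exponent means 10^0. naive_read mimics a reader that flattens child elements into {tag: text} and ignores attributes:

```python
import xml.etree.ElementTree as ET

# Hypothetical meta-data: one channel per proposal. Labels are invented.
META = """
<channels>
  <channel><label>Fz</label><unit exponent="-6">V</unit></channel>
  <channel unit="V" exponent="-6"><label>Cz</label></channel>
</channels>
"""


def naive_read(channel: ET.Element) -> dict:
    """Mimic a reader that turns child elements into {tag: text}, ignoring attributes."""
    return {child.tag: child.text for child in channel}


def read_unit(channel: ET.Element) -> tuple:
    """Return (unit symbol, power-of-ten exponent) for either proposal.

    Assumption: a missing exponent means 0, i.e. the plain base unit.
    """
    if channel.get("unit") is not None:  # proposal 2: attributes on <channel>
        return channel.get("unit"), int(channel.get("exponent", "0"))
    unit = channel.find("unit")          # proposal 1: <unit exponent=...>V</unit>
    if unit is None:
        raise ValueError("channel has no unit information")
    return unit.text, int(unit.get("exponent", "0"))


def to_base_units(value: float, exponent: int) -> float:
    """Convert a sample stored in scaled units (e.g. microvolts) to the base unit."""
    return value * 10.0 ** exponent


root = ET.fromstring(META.strip())
for channel in root.findall("channel"):
    symbol, exponent = read_unit(channel)
    print(naive_read(channel), "->", symbol, exponent, to_base_units(12.5, exponent))
```

For the first channel the naive reader returns {'label': 'Fz', 'unit': 'V'}, which is exactly the dropped-scale problem; for the second it loses the unit altogether, because both pieces of information live in attributes. A reader that knows about the exponent recovers ('V', -6) in both cases.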

dmedine commented 3 years ago

I'm closing this since we basically moved the conversation to the Slack channel, and this issue will very likely not result in a work item.