openPMD / openPMD-standard

:notebook: Open Standard for Particle-Mesh Data
http://www.openPMD.org
Creative Commons Attribution 4.0 International

Labels for Symbols and Units #149

Open ax3l opened 6 years ago

ax3l commented 6 years ago

Migrated from https://github.com/marcguetg/h5particle/issues/9#issuecomment-345858143

@DavidSagan proposes to define additional attributes for labeling the physical symbol of a record and the physical label of its unit.

ax3l commented 6 years ago

Proposal

Problems we need to address

Proposed Names

ax3l commented 6 years ago

As a note: it is also easily possible for a data reader of the base standard + domain specific extension to map specific symbols from record names during post-processing. The same is possible for deriving an on-the-fly unitSymbol from unitDimension in a specific domain.

This would not require standardization of labels in the base standard then.

DavidSagan commented 6 years ago

> @DavidSagan proposes to define additional attributes for labeling the physical symbol of a record and the physical label of its unit.

Actually I was just proposing a units label. I don't mind an optional physical symbol but I'm not sure how useful it will be.

> shall the label for the unit label the unit before conversion or after conversion to SI?

Needed are the units before conversion since a reading program cannot figure this out. A reading program should be able to do a decent job of figuring out what string to display for SI values.

> will a unit symbol per record (not component) be ok? (see unitDimension per record)

If the standard defines optional units labels at both the record and component levels this gives maximum flexibility at little cost in terms of complexity.

> should there be latex in it? should it be expected as latex? which environment? what kind of escapes $[...?

I would not get fancy about this. I look on this as more for informational purposes. So I propose that the standard just say that the units labels can be used for information display and if someone really wants to encode the string with latex there would be nothing in the standard that would prevent that.

> As a note: it is also easily possible for a data reader of the base standard + domain specific extension to map specific symbols from record names during post-processing. The same is possible for deriving an on-the-fly unitSymbol from unitDimension in a specific domain.

Not true. If the measurement units are "miles/hour" then a reader will not be able to reconstruct the proper measurement units.

> This would not require standardization of labels in the base standard then.

In fact, I propose that the standard should not try to standardize units labels since they are for informational purposes only.

ax3l commented 6 years ago

When I think about this, I think adding a "unit label" before conversion is not a good idea. Let me explain why.

In order to make data portable, a data writer should not impose how data is represented in a data reader, including its unit system. That means, if a data reader for a specific application wants to represent units in "miles/hour" that is very much possible with the following workflow:

  1. Read the conversion factor to the SI unit system, unitSI.
  2. The user of the post-processing/reader application chooses a unit system, say "miles-hours-g-...". The data reader then forms a conversion factor from unitSI * unitRepresentation (this can also be a formula if a more complex, per-datum transformation is needed).
  3. Now the data is read and individual data values are converted according to step 2. Of course, the reader can check whether the conversion factor is actually 1.0 and skip the conversion if the data writer already used the unit system the reader wants to represent the data in.
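
The three steps above can be sketched roughly as follows. This is only an illustration, not part of the standard or of any openPMD reader: the function names and the `MPH_IN_SI` constant are hypothetical, and `si_per_target_unit` plays the role of the inverse of "unitRepresentation" (how many SI units one unit of the reader's chosen system is worth).

```python
import math

# Assumption for illustration: one mile per hour expressed in SI (m/s).
METRES_PER_MILE = 1609.344
SECONDS_PER_HOUR = 3600.0
MPH_IN_SI = METRES_PER_MILE / SECONDS_PER_HOUR


def conversion_factor(unit_si, si_per_target_unit):
    """Combine the file's unitSI with the reader's chosen unit system.

    unit_si: factor that converts raw values to SI (read from the file).
    si_per_target_unit: value of one target unit in SI (chosen by the reader).
    """
    return unit_si / si_per_target_unit


def convert(raw_values, unit_si, si_per_target_unit):
    """Convert raw values to the reader's unit system (step 3)."""
    factor = conversion_factor(unit_si, si_per_target_unit)
    # Skip the multiplication if the writer already used the reader's units.
    if math.isclose(factor, 1.0, rel_tol=1e-12):
        return list(raw_values)
    return [value * factor for value in raw_values]


# A writer stored velocities in miles/hour, so it wrote unitSI = MPH_IN_SI.
raw = [60.0]
in_mph = convert(raw, MPH_IN_SI, MPH_IN_SI)  # reader also wants miles/hour
in_si = convert(raw, MPH_IN_SI, 1.0)         # reader wants m/s
```

In this sketch the reader never needs to know which unit the writer used; only unitSI and the reader's own choice of unit system enter the calculation.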

> If the measurement units are "miles/hour" then a reader will not be able to reconstruct the proper measurement units.

What I meant by that was: you read a data set in step 3 and parse unitDimension for it. This tells you that your record has "length / time" dimensions. If you now choose to represent data in a "miles-hours-g-..." system, the corresponding unit label can be reconstructed via a simple lookup of that system's base units, raised to the exponents in unitDimension. In SI the unit label will be m^1 s^-1 and in the other case miles^1 hours^-1 (formatted as needed).
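
A minimal sketch of that lookup, assuming unitDimension is given as the seven exponents of the base quantities in the order length, mass, time, electric current, thermodynamic temperature, amount of substance, luminous intensity. The `unit_label` function and the `MILES_HOURS_G` table are hypothetical names made up for this example.

```python
# Base-unit symbols per unit system, in the order of the unitDimension exponents.
SI_BASE = ("m", "kg", "s", "A", "K", "mol", "cd")
# Hypothetical "miles-hours-g-..." system; only the first three symbols differ.
MILES_HOURS_G = ("miles", "g", "hours", "A", "K", "mol", "cd")


def unit_label(unit_dimension, base_symbols=SI_BASE):
    """Build a plain-text unit label from unitDimension exponents."""
    parts = []
    for symbol, power in zip(base_symbols, unit_dimension):
        if power == 0:
            continue  # this base quantity does not appear in the unit
        parts.append(f"{symbol}^{power:g}")
    # A record with all-zero exponents is dimensionless.
    return " ".join(parts) if parts else "1"


velocity = (1.0, 0.0, -1.0, 0.0, 0.0, 0.0, 0.0)  # length / time
label_si = unit_label(velocity)                    # "m^1 s^-1"
label_mph = unit_label(velocity, MILES_HOURS_G)    # "miles^1 hours^-1"
```

Note that this only reconstructs a label for the unit system the reader picks; it cannot recover which units the writer originally used.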

Related: https://github.com/openPMD/openPMD-viewer/issues/64

ax3l commented 6 years ago

I still find a label for a symbol very useful, e.g. to make something latexy that would be odd/impossible as a record or record-component name.

DavidSagan commented 6 years ago

@ax3l

> In order to make data portable, a data writer should not impose how data is represented in a data reader

There is a misunderstanding here. The proposal is for a units label which represents what units the simulation program was using. This does not impose any obligations on a data reader.

The idea is that a units label is for informational purposes only. And this information can be very helpful. So, for example, if I look at the raw data in a file with HDFView and see that the data units were "eV/c", I can immediately make sense of the numbers.

DavidSagan commented 6 years ago

> I still find a label for a symbol very useful, e.g. to make something latexy that would be odd/impossible as a record or record-component name.

Since my proposal is for the units label to be for informational purposes only, if a writer wants to encode the label with LaTeX that would be fine.

ax3l commented 6 years ago

Ok, "informational purposes only" aka "manual, human reader" is indeed useful for certain workflows!

I wonder if in such a case we even need to standardize it, since any writer can write any additional attributes in openPMD :) There would be no benefit besides a reserved/unified attribute name if it is not machine-readable.

If that is important enough for us, we can definitely standardize it :)

DavidSagan commented 6 years ago

> I wonder if in such a case we even need to standardize it

It would be good to standardize this since then a reader, if it is showing the raw data, has the option to show the data units along with the raw data.

ax3l commented 6 years ago

Admittedly, our goal is to motivate people not to look at the unconverted, raw data at all anymore ;)

But I have no hard feelings on adding this, @RemiLehe any opinion on this? If we add it I would make clear that the contained information is non-standardized and should not be "relied" on in any way.

ax3l commented 6 years ago

We decided today that the optional feature is useful and should be explicitly named something like unitRawSymbol.

RemiLehe commented 6 years ago

@DavidSagan I recently had a discussion with @ax3l, and we think that it is better to postpone this issue for now, and maybe reconsider later whether we really need this in the standard.

The reason is that the main use case for this seems to be debugging/informational for one's own file. But it seems that one would rarely look at this attribute for files created by someone else. (In this case, the idea of the standard is that we trust this "someone else" to provide the right unitSI coefficient, so that we would never need to know the units in which the raw data is written, and should always consider the corresponding SI units instead.) Thus, because this attribute is, in our understanding, mainly for one's own debugging purposes, we don't see a benefit in standardizing it: you can always add your own (unstandardized) attributes to the openPMD file for debugging/informational purposes. In addition, we think that standardizing this attribute could lead to confusion among new users reading the standard, especially once issue #155 is implemented.

In any case, we would like to at least postpone this issue until a decision is made on #155, so that it does not conflict with it...

DavidSagan commented 6 years ago

@ax3l, @RemiLehe: Think of it this way. A person using, say, my Bmad software will look at data files generated via the Bmad software, but if they are not an "expert" they may be a bit fuzzy as to what the units are. If you want, this could be put in the Beam Physics extension since I don't think that anyone will want to use dimensionless units with the Beam Physics extension. Then it could be considered later whether to migrate this to the base standard. And if the wording of the standard is well crafted, I don't think there will be any confusion.