nexusformat / definitions

Definitions of the NeXus Standard File Structure and Contents
https://manual.nexusformat.org/
Other
26 stars 57 forks source link

Rewrite the NXdata scaling_factor and offset fields #1333

Closed phyy-nx closed 8 months ago

phyy-nx commented 10 months ago

Closes #1332

Also adds clarification in NXmx that these fields can be used as pedestal and gain correction fields, as well as elaborates on the possible rank options. These rank options were implied (in my opinion) in the original wording, but in NXmx I made it more explicit.

Tagging @biochem-fan for reference

phyy-nx commented 10 months ago

I purport this doesn't need a vote as it doesn't change functionality, and the change to NXmx only clarifies functionality that was already there. Feedback welcome!

biochem-fan commented 10 months ago

@phyy-nx

The elements in data are usually float values really. For efficiency reasons these are usually stored as integers after scaling with a scale factor. This value is the scale factor. It is required to get the actual physical value, when necessary.

The original description for the gain sounds like stored_integer_value = physical_value * the_scale_factor, making the definition of the gain the same as in MX (i.e. value per photon). But your new description is the other way round. Did you check the interpretation at the NIAC?

I know you confirmed the order of the gain and the offset, but I don't know if the definition of the gain was confirmed. This post is just to double check it.

phyy-nx commented 10 months ago

Ack, I think you might be right and I misinterpreted this field!

The elements in data are usually float values really. For efficiency reasons these are usually stored as integers after scaling with a scale factor. This value is the scale factor. It is required to get the actual physical value, when necessary.

So I think I now agree this is saying the stored value has already had the scaling factor applied. But that reads differently from offset:

An optional offset to apply to the values in data.

That implies the offset hasn't been implied yet!

So, @nexusformat/developers, how do you use scaling_factor and offset in NXdata, if at all?

phyy-nx commented 10 months ago

Bump! @nexusformat/developers :)

How do you use scaling_factor and offset in NXdata, if at all?

PeterC-DLS commented 9 months ago

I don't use these fields but it seems that if you want to document the stored values then stored_value = (physical_value - offset) * scaling_factor may be best for the pedestal use but other people want to use offset as a bias: stored_value = physical_value * scaling_factor + offset. So two more interpretations to choose from!

benajamin commented 9 months ago

I have always interpreted these fields as things that need to be done to the stored value in order to get to the physical value. I would do this as: physical value = stored value * scaling_factor + offset

woutdenolf commented 9 months ago

I second https://github.com/nexusformat/definitions/issues/1333#issuecomment-1864730138 by @benajamin with the exception that I always think about stored value vs. plotted value (meaning the coordinate in the plotting coordinate system) since NXdata is meant to represent "data to be plotted"

plotted value = stored value * scaling_factor + offset
phyy-nx commented 9 months ago

Ok I like switching the discussion to "stored" vs. "plotted". Based on that, just to recap without math, there are two general ways these can be interpreted:

  1. You, the person plotting the data, must apply scaling_factor and offset before you plot the data
  2. I, the data preparer, have already applied scaling_factor and offset for you, and I'm noting the values I used

I hope the answer is 1. to match @biochem-fan's use case, but it seems to line up with some of the comments above.

I do note that both @benajamin and @woutdenolf switched the order of operations to match the "bias" version that @PeterC-DLS suggested, compared to the pedestal version that I originally suggested:

a. plotted_value = physical_value * scaling_factor + offset b. plotted_value = (physical_value + offset) * scaling_factor

a. makes more intuitive sense to me as it more closely matches the equation of a line y=mx+b. But the consequence is that for our MX data that we need to store "gain-corrected" pedestals... But that's not relevant for @biochem-fan's use case (no pedestal I believe), so maybe a. is fine?

phyy-nx commented 9 months ago

I've made the change to use the "plotted" nomenclature for NXdata, but I've kept the pedestal formula for now, until we get a bit more discussion here (plotted_value = (physical_value + offset) * scaling_factor). Mainly because it's I'm coming from an MX background where pedestal is applied first in the use cases I know about. But I can be convinced otherwise! Especially if folks already have code with the other convention (plotted_value = physical_value * scaling_factor + offset)

PeterC-DLS commented 9 months ago

LGTM. For clarity's sake in your comments, I think stored_value is better than physical_value as you may be plotting the "physical" value.

phyy-nx commented 8 months ago

Feedback from Telco addressed. If there's no more comments we'll send this to a vote.

phyy-nx commented 8 months ago

Hello, please vote by providing an emoji on this comment. Thanks.

prjemian commented 8 months ago

Not happy with the term pedestal in this context. Is it the same as offset?

phyy-nx commented 8 months ago

@prjemian pedestal, used only for NXmx, does mean offset, and I assumed it was a known term in x-ray diffraction land. I could be convinced that gain and pedestal should be defined more clearly, but that could be done in a separate, not-voted on PR, since it would only be a documentation clarification.

prjemian commented 8 months ago

Since offset is the term that has been used with NXdata, let's not change it in this PR.

prjemian commented 8 months ago

image

phyy-nx commented 8 months ago

Wait it's not being changed in the PR. For NXdata it's FIELDNAME_scaling_factor and FIELDNAME_offset, and for NXmx is data_scaling_factor and data_offset.

Pedestal is just a documentation term. Does that help?

biochem-fan commented 8 months ago

Depending on the field, various terms are used: "offset", "pedestal" and "bias". For example, ThermoFisher's electron detectors use "bias".

Perhaps keeping "offset" for NXdata is a good idea but I would prefer to have other term mentioned as well in the NXmx documentation for better searchability.

PeterC-DLS commented 8 months ago

Should reserved suffixes be updated too?

prjemian commented 8 months ago

Yes

On Wed, Jan 31, 2024, 3:35 AM Peter Chang @.***> wrote:

Should reserved suffixes https://manual.nexusformat.org/datarules.html#reserved-suffixes be updated too?

— Reply to this email directly, view it on GitHub https://github.com/nexusformat/definitions/pull/1333#issuecomment-1918726950, or unsubscribe https://github.com/notifications/unsubscribe-auth/AARMUMAQXZF42W4M7HS25JLYRIF6LAVCNFSM6AAAAAA7VKHPNOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMJYG4ZDMOJVGA . You are receiving this because you were mentioned.Message ID: @.***>

phyy-nx commented 8 months ago

Happy to modify reserved suffixes in a different PR after this vote.

paulmillar commented 8 months ago

LGTM :smile:

phyy-nx commented 8 months ago

Vote did not pass (got 12 votes, needed 13 for quorum), which is fine given all the discussion. For the sake of clarity I'm closing this PR, removing it and the associated issue from the milestone, and I will make a new PR that addresses the feedback here. We'll try again then.