nexusformat / definitions

Definitions of the NeXus Standard File Structure and Contents
https://manual.nexusformat.org/

move NXcanSAS to applications #492

Closed prjemian closed 7 years ago

prjemian commented 7 years ago

refs #420: NXcanSAS was ratified at 2016NIAC

prjemian commented 7 years ago

@zjttoefs : this branch is ready for a PR - should we have canSAS look it over first?

As I look at the 2012 specification document, it shows the SASinstrument as optional.

If canSAS review comes first, we'll need to build the docs separately so they can view the current specification exactly as it exists. Otherwise, the revised docs will be built automatically by the NeXus build server and we can refer there.

prjemian commented 7 years ago

structure of current NXDL NXcanSAS-xture.txt

PDF of current HTML documentation NXcanSAS.pdf

prjemian commented 7 years ago

pull request will wait until the canSAS data formats working group has a chance to review

prjemian commented 7 years ago

Review checklist:

A discussion arose during the NIAC2016 review based on this question: Is this to be a strict minimum information necessary for reduced SAS data or an extension of the 1-D format (which included specifications of other items)? Should NXcanSAS define names for all those terms that describe raw data information? This specifies the name if the value is reported.

We can always revise this NXcanSAS standard and update its version number.

smk78 commented 7 years ago

Review checklist:

Accept that SASinstrument group is optional (which seemed to be our intent all along)

YES

Accept that we want to define SAStransmission_spectrum (which is optional)

YES Transmissions might be a bit of a 'neutron thing', but they are valuable information that does not normally get captured for posterity anywhere else. They are also a useful sanity check and diagnostic tool: for example, there have been many occasions when a look at the transmissions of some samples has enabled me to show a user that the reason their analysis was not as they expected was that they were fitting the wrong dataset to the wrong contrast!

I have absolutely no problem with SAStransmission_spectrum being optional, but we should define it. In doing so, however, we need to define how monochromatic sources should record a transmission value: a spectrum with one entry?

Accept current scope of NXcanSAS

YES

Is this to be a strict minimum information necessary for reduced SAS data or an extension of the 1-D format (which included specifications of other items)?

THE SECOND We started out defining our own nD format which recognised that in many cases analysis required more insight into the data than just the I's and Q's. Generating a NeXus class was seen as a convenient way to achieve that.

Should NXcanSAS define names for all those terms that describe raw data information?

NO We should only define those terms that a) describe the provenance of the data (eg, date, run, instrument, camera length, wavelength(s)) and, b) clarify exactly what the data we are trying to analyse actually represents (eg, title, illuminated area, thickness, transmission, event times).

prjemian commented 7 years ago

Steve asked: In doing so, however, we need to define how monochromatic sources should record a transmission value: a spectrum with one entry?

/SASentry/SASsample/transmission is the location defined by canSAS for the 1D XML standard. That has been copied into NXcanSAS.

toqduj commented 7 years ago

As for the absolute minimum dataset, I'd argue that an uncertainty estimate at least on the intensity ought to be included, as a value means nothing without an uncertainty. In SAS in particular, as you're all well aware, the magnitude of this uncertainty estimate has a large influence on the amount of information that can (or cannot) be extracted from the data. It will tell you if features are real or artifactual, can be modelled or ignored, and whether conclusions are realistic or overinterpreted.

If we want NXcanSAS to improve the way we analyse and fit data (as I hope it will), eventually adding confidence to the method itself, the uncertainty is critical information.

prjemian commented 7 years ago

That's the difference between absolute minimum vs. recommended minimum. For some analyses, such as Guinier, uncertainties are not required (absolute minimum) but certainly can be used to improve one's understanding of the result of analysis (recommendation).

The NXcanSAS definition specifies how to provide the uncertainty in both I and Q. Should the uncertainty have some complexity to it (hypothetically, if it comes from multiple, independent origins and each of these must be provided), the definition describes how to provide such complexity. A specification of how to use such complexity is beyond the scope of the NXcanSAS definition.

toqduj commented 7 years ago

I'd argue quite the opposite: if you are doing any sort of fitting, even if it is "just" Guinier, you need the uncertainties to tell you that you aren't just reading tea-leaves. When properly propagated, the uncertainties in the low-Q region show you how unreliable that data usually is (as you're subtracting a large background from a large background + tiny signal).

With XRD data, I'd say it's optional as the data is quite rich in information already. With SAS, it's essential.
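To illustrate why the uncertainties matter even for a Guinier analysis, here is a minimal sketch of a weighted straight-line fit of ln I against Q², assuming the usual Guinier form I(Q) = I0 exp(-Rg² Q² / 3) and independent Gaussian uncertainties. The function name `guinier_fit` and the sample numbers are invented for this example; nothing here is prescribed by NXcanSAS.

```python
import math

def guinier_fit(q, i, idev):
    """Weighted least-squares fit of ln(I) = ln(I0) - (Rg**2 / 3) * Q**2.

    Returns (rg, rg_dev, i0). Weights are 1/sigma**2 with
    sigma = Idev / I, the first-order uncertainty of ln(I).
    Assumes positive intensities and a negative fitted slope.
    """
    x = [qk ** 2 for qk in q]
    y = [math.log(ik) for ik in i]
    w = [(ik / dk) ** 2 for ik, dk in zip(i, idev)]

    s = sum(w)
    sx = sum(wk * xk for wk, xk in zip(w, x))
    sy = sum(wk * yk for wk, yk in zip(w, y))
    sxx = sum(wk * xk * xk for wk, xk in zip(w, x))
    sxy = sum(wk * xk * yk for wk, xk, yk in zip(w, x, y))

    delta = s * sxx - sx ** 2
    slope = (s * sxy - sx * sy) / delta
    intercept = (sxx * sy - sx * sxy) / delta
    slope_dev = math.sqrt(s / delta)

    rg = math.sqrt(-3.0 * slope)
    # Rg = sqrt(-3 * slope), so |d(Rg)/d(slope)| = 3 / (2 * Rg)
    rg_dev = 1.5 * slope_dev / rg
    return rg, rg_dev, math.exp(intercept)

# Synthetic, noise-free data with Rg = 20 and I0 = 100 (made-up values)
q = [0.01, 0.015, 0.02, 0.025, 0.03]
i = [100.0 * math.exp(-(20.0 ** 2) * qk ** 2 / 3.0) for qk in q]

rg_a, rg_dev_a, i0_a = guinier_fit(q, i, [0.02 * ik for ik in i])  # 2 % Idev
rg_b, rg_dev_b, i0_b = guinier_fit(q, i, [0.20 * ik for ik in i])  # 20 % Idev
```

Both fits return the same Rg, but the second reports a ten times larger `rg_dev`. Without Idev there is no way to tell the two situations apart.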

Speaking of uncertainties, were there any comments on our suggestion to separate the uncertainty into an (absolute) scaling uncertainty and an individual datapoint uncertainty? (i.e.: https://github.com/canSAS-org/canSAS2012/issues/10)

Cheers,

Brian.

prjemian commented 7 years ago

As for uncertainties (as discussed at the Tokai meeting cited above), NeXus has the opinion to "wait for a demonstration of that" before proceeding. They would like to understand better how it works.

prjemian commented 7 years ago

See examples provided on the NeXus wiki: http://wiki.nexusformat.org/2014_axes_and_uncertainties

prjemian commented 7 years ago

Regarding the specification of uncertainties, we should make a change in where we specify the name of the uncertainties. As proposed, there are two attributes (I_uncertainties and Q_uncertainties) on the SASdata group.

data : NXdata
  @I_axes : NX_CHAR
  @I_uncertainties : NX_CHAR
  @Mask_indices : NX_CHAR
  @Q_indices : NX_INT
  @Q_uncertainties : NX_CHAR
  @canSAS_class : NX_CHAR = SASdata
  @signal : NX_CHAR = I
  I : NX_NUMBER
  Idev : NX_NUMBER
  Q : NX_NUMBER
    @resolution
  Qdev : NX_NUMBER
  Qmean : NX_NUMBER
  ShadowFactor
  dQl : NX_NUMBER
  dQw : NX_NUMBER

To be consistent with the way these are described in the NeXus NXdata base class, these should be moved to become attributes of the I and Q fields, respectively.

data : NXdata
  @I_axes : NX_CHAR
  @Mask_indices : NX_CHAR
  @Q_indices : NX_INT
  @canSAS_class : NX_CHAR = SASdata
  @signal : NX_CHAR = I
  I : NX_NUMBER
    @uncertainties : NX_CHAR
  Idev : NX_NUMBER
  Q : NX_NUMBER
    @resolution
    @uncertainties : NX_CHAR
  Qdev : NX_NUMBER
  Qmean : NX_NUMBER
  ShadowFactor
  dQl : NX_NUMBER
  dQw : NX_NUMBER

see: http://download.nexusformat.org/doc/html/classes/base_classes/NXdata.html#nxdata and look for "@uncertainties"
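With the attribute on the field itself, a reader can resolve the uncertainties of any field the same way. The sketch below is only an illustration: it models the SASdata group as plain Python dicts (an assumption made to keep the example self-contained; a real reader would go through an HDF5 library), and the helper name `find_uncertainties` is hypothetical.

```python
def find_uncertainties(group, field_name):
    """Return the uncertainty values for `field_name`, or None if none are declared."""
    field = group["fields"][field_name]
    target = field["attrs"].get("uncertainties")
    if target is None:
        return None
    if target not in group["fields"]:
        raise KeyError(f"{field_name}@uncertainties names missing field {target!r}")
    return group["fields"][target]["data"]

# Dict stand-in for the revised structure shown above (values are made up)
sasdata = {
    "attrs": {"canSAS_class": "SASdata", "signal": "I", "I_axes": "Q", "Q_indices": 0},
    "fields": {
        "I": {"attrs": {"uncertainties": "Idev"}, "data": [10.0, 8.0, 5.0]},
        "Idev": {"attrs": {}, "data": [0.5, 0.4, 0.3]},
        "Q": {"attrs": {}, "data": [0.01, 0.02, 0.03]},
    },
}

i_dev = find_uncertainties(sasdata, "I")  # follows I@uncertainties to Idev
q_dev = find_uncertainties(sasdata, "Q")  # Q declares no uncertainties here
```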

prjemian commented 7 years ago

revised NXcanSAS structure: NXcanSAS-xture.txt

prjemian commented 7 years ago

The @uncertainties attribute could be used with any data field.

prjemian commented 7 years ago

canSAS has not specified how to encode a mask, if one is specified. The presence of a mask in a reduced data file was discussed at canSAS2012.

NeXus already has a term called pixel_mask in the NXdetector definition (and NXmx). I suggest that canSAS use the same 32-bit integer (it contains a bit field for each pixel to signal dead, blind, high, or otherwise unwanted or undesirable pixels) for NXcanSAS. See the NXdetector documentation for the specifics.

Look for "pixel_mask" on this page: http://download.nexusformat.org/doc/html/classes/base_classes/NXdetector.html?highlight=mask
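As a rough sketch of how a reader could apply such a mask: the meaning of each bit is defined on the NXdetector page and is not reproduced here. The example only assumes that a value of 0 means "nothing wrong with this pixel" and treats any set bit as a reason to exclude the point; that exclusion policy, the helper names, and the sample values are choices made for the illustration.

```python
def set_bits(value):
    """Return the positions of the bits that are set in a 32-bit mask value."""
    return [bit for bit in range(32) if (value >> bit) & 1]

def unmasked(intensity, pixel_mask):
    """Keep only the intensities whose mask value has no bit set."""
    return [i for i, m in zip(intensity, pixel_mask) if m == 0]

# Made-up data: the last two points carry mask bits and are dropped
intensity = [12.0, 11.5, 0.0, 9.8]
pixel_mask = [0, 0, 0b10, 0b100000000]

kept = unmasked(intensity, pixel_mask)
flags = [set_bits(m) for m in pixel_mask]
```

A reader that wants finer control could inspect `flags` and decide per bit, once the bit assignments have been checked against the NXdetector documentation.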

toqduj commented 7 years ago

Thanks for the info on the uncertainties. We'll have to discuss the exact location and definition of the uncertainty contributions.

I think adhering to the existing pixel_mask definitions makes sense. It's a bit overkill for our purpose, but we had better avoid redefining things if at all possible.

B.

toqduj commented 7 years ago

Coming back to this with a little more time to think about it, we were hoping to distinguish between the scaling uncertainty and the (inter-)datapoint uncertainty.

The scaling uncertainty describes the relative uncertainty of the entire axis (1 standard deviation, as a fraction of the axis value). For the intensity axis in absolute units, for example, this relates to the final uncertainty in the volume fraction. I suggest two ways of storing this:

  1. set a scaling field:
data : NXdata
  @I_axes : NX_CHAR
  @Mask_indices : NX_CHAR
  @Q_indices : NX_INT
  @canSAS_class : NX_CHAR = SASdata
  @signal : NX_CHAR = I
  I : NX_NUMBER
    @uncertainties : NX_CHAR
    @scaling_uncertainty : NX_NUMBER
  Idev : NX_NUMBER
  Q : NX_NUMBER
    @resolution
    @uncertainties : NX_CHAR
    @scaling_uncertainty : NX_NUMBER
  Qdev : NX_NUMBER
  Qmean : NX_NUMBER
  ShadowFactor
  dQl : NX_NUMBER
  dQw : NX_NUMBER
  2. use the NeXus "variable_errors" field, which maybe does what we need (but I think I'm wrong, it needs to be length n?). In this case, and if I understand correctly, we would add to the example above the following field:
data : NXdata
  @I_axes : NX_CHAR
  @Mask_indices : NX_CHAR
  @Q_indices : NX_INT
  @canSAS_class : NX_CHAR = SASdata
  @signal : NX_CHAR = I
  I : NX_NUMBER
    @uncertainties : NX_CHAR
  I_errors: NX_NUMBER
  Idev : NX_NUMBER
  Q : NX_NUMBER
    @resolution
    @uncertainties : NX_CHAR
  Qdev : NX_NUMBER
  Qmean : NX_NUMBER
  ShadowFactor
  dQl : NX_NUMBER
  dQw : NX_NUMBER

prjemian commented 7 years ago

The variable_errors field is synonymous with the field described in the uncertainties attribute. The former is the way that NeXus has described the uncertainties ("errors") for years. The attribute describes a more flexible method to associate uncertainties and is the proposal in front of the NeXus NIAC at this time. The advantage is that it gives flexibility in the naming method for the uncertainties. It will be confusing to provide both I@uncertainties="Idev" and I_errors that mean different things.

The scaling_uncertainty is the uncertainty of the scaling factor that has been (? or will be?) applied to the dataset. The NXdata base class defines a scaling_factor field that is just this term. Perhaps the approach that represents the first case you describe preserves the NeXus structure (we create the new attribute scaling_factor that could be used by any dataset):

I : NX_NUMBER
    @uncertainties="Idev" : NX_CHAR
    @scaling_factor="I_scaling" : NX_CHAR
Idev : NX_NUMBER
I_scaling : NX_NUMBER
    @uncertainties="I_scaling_dev" : NX_CHAR
I_scaling_dev : NX_NUMBER
Q : NX_NUMBER
    @uncertainties="Qdev" : NX_CHAR
    @scaling_factor="Q_scaling" : NX_CHAR
Qdev : NX_NUMBER
Q_scaling : NX_NUMBER
    @uncertainties="Q_scaling_dev" : NX_CHAR
Q_scaling_dev : NX_NUMBER

Both I_scaling and Q_scaling are scalars with unity value when I(Q) is 1-D. (This is the prevailing assumption when no such information is defined.) The related uncertainties values are also scalar and contain the interesting values.

The scaling uncertainties exert no influence on the information content of a single I(Q) or its analysis. The scaling uncertainty becomes relevant when comparing several I(Q) measurements. Defining this term separately makes clearer that uncertainty in the scaling factor is not the uncertainties term used for a single I(Q) analysis.
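When several measurements are compared, one possible way to fold the two contributions together is to add the relative uncertainties in quadrature. This is a sketch under the assumptions of independent contributions and first-order propagation for the product scaling × I; it is not something the NXcanSAS definition prescribes, and the function name and numbers are illustrative only.

```python
import math

def total_uncertainty(i, idev, scaling=1.0, scaling_dev=0.0):
    """Uncertainty of scaling * I for each point, combining both contributions.

    First-order propagation for a product of independent quantities:
    (dev / value)**2 = (Idev / I)**2 + (scaling_dev / scaling)**2
    Assumes nonzero intensities and a nonzero scaling factor.
    """
    rel_scale = scaling_dev / scaling
    return [
        abs(scaling * ik) * math.sqrt((dk / ik) ** 2 + rel_scale ** 2)
        for ik, dk in zip(i, idev)
    ]

i = [100.0, 50.0]
idev = [3.0, 2.0]

single = total_uncertainty(i, idev)               # one I(Q): scaling term left out
combined = total_uncertainty(i, idev, 1.0, 0.04)  # comparison: 4 % scaling uncertainty
```

With the scaling factor left at its default of 1 and no scaling uncertainty, the result is just Idev, which matches the point that the scaling term does not enter the analysis of a single I(Q).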

toqduj commented 7 years ago

That suggestion looks very appropriate to me. We can also require that the scaling factor values are supposed to be 1, in which case the uncertainty is (by definition) relative and normalised.

Is there a particular reason to call the scaling uncertainties "I_scaling_dev", the intensity uncertainties "Idev"? The one is using underscores, the other not. It looks like naming consistency can be attained with a minimum of changes by using an underscore in "I_dev" and "Q_dev". I understand this is arbitrary, since the naming of the uncertainties is pointed to in the "uncertainties" attribute of the corresponding dataset, but it would be good to show we care about consistent naming.

By the by, is the name of "uncertainties" (plural) defined by NeXus, or can it be the more general "uncertainty" (valid for both single as well as multiple values)?

B.

prjemian commented 7 years ago

Let's default the scaling factor values to 1, leaving open the possible use of a non-unity value.

The canSAS proposal to NeXus regarding the uncertainties was for the uncertainty attribute. Somewhere in the process, this became a plural in its NeXus implementation and is now added in a couple other places. I'll post the question to the NeXus tech teleconference for next week to see if we can change uncertainties to uncertainty.

The use of the NeXus uncertainties attribute allows, in the general case, the user to choose the name of the field (a.k.a. dataset) that holds this information. In the canSAS 1D standard, the consensus choice for this was Idev for the intensity uncertainty and Qdev for the Q uncertainty. Idev is optional but, when given, is the name to be used. This choice from canSAS predates the flexibility of the uncertainty attribute. That is,

I : NX_NUMBER
  @uncertainties="Idev" : NX_CHAR
Idev : NX_NUMBER

The Idev and Qdev terms have been carried forward to NXcanSAS as the names canSAS chooses should this information be provided. It is an inconsistent pattern but an obvious signature of a committee decision. Some analysis code already expects Idev and its change would have some impact.

As for I_scaling, I picked that name (rather than I_scaling_factor) for clarity and brevity but also to be consistent with the NeXus pattern of VARIABLE_TERM (such as I_errors) and then added _dev. I suggest that canSAS not define either name, I_scaling or I_scaling_dev, thus leaving the user that flexibility by specifying the relevant field names through the attributes.

toqduj commented 7 years ago

Alright, let's go with that then.

toqduj commented 7 years ago

Quick return: I'm programming an NXcanSAS 1D data reader for McSAS. Some of the definitions in the specifications appear to be too optional if I understand correctly.

In particular the attributes that point towards the data-containing items, are they required to be present or not? As far as I can see from http://cansas-org.github.io/canSAS2012/framework.html, they are optional, which makes programming a reader a bit bloated, as for every data item, I need to:

  1. check if there is an axes or signal attribute, if there is:
  2. follow that attribute, maybe?
  3. if there isn't, see if there is a default named item: "I", "Q", "Idev"
  4. if there isn't, crash gently

The examples in https://github.com/canSAS-org/NXcanSAS_examples/tree/master/1d_standard all seem to have a "signal" and "axes" attribute. Should those be followed?

I guess what I'm asking is: what is the official methodology for reading the data from NXcanSAS?

prjemian commented 7 years ago

Those are part of the NeXus standard. See the documentation for NXdata.

  1. signal attribute is required
  2. axes attribute is strongly recommended - for NXcanSAS, its value is required to be "I", no need to follow this for your reader.
  3. For I and Q uncertainty, the uncertainties attribute is to be used - for NXcanSAS. Your reader should follow these since we have agreed to some flexibility in how uncertainties are described. If the attributes are not present, assume no uncertainty is defined.

In all cases, the value of these attributes is the name of a field which must be present. Present an error message if it is not (and handle as gracefully as you wish; presenting a dialog for the user to correct the value might be gracious).

Pete

prjemian commented 7 years ago

correction:

  1. signal attribute is required by NeXus - for NXcanSAS, its value is required to be "I", no need to follow this for your reader.
  2. "I_axes" attribute is necessary when I(Q) may also depend on other fields such as time. See the NXcanSAS documentation for the full description. Examples 13 & 14 of https://github.com/canSAS-org/NXcanSAS_examples/tree/master/canSAS2012_examples illustrate a case where this value becomes non-trivial.

other points remain as-is

NXdata - http://download.nexusformat.org/doc/html/classes/base_classes/NXdata.html
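Putting the corrected points together with the ones that remain as-is, the read logic for the 1D case could be sketched as follows. This is an illustration, not an official reader: the SASdata group is modelled as plain dicts, the field name "Q" is assumed for the 1D case, and `NXcanSASReadError` and `read_sasdata` are hypothetical names.

```python
class NXcanSASReadError(Exception):
    """Raised when an expected or attribute-named field is not in the group."""

def read_sasdata(group):
    """Return a dict with keys 'I', 'Q', 'Idev', 'Qdev'; the last two may be None."""
    fields = group["fields"]

    def require(name, why):
        if name not in fields:
            raise NXcanSASReadError(f"{why}: field {name!r} is not present")
        return fields[name]

    def uncertainties_of(name):
        target = fields[name]["attrs"].get("uncertainties")
        if target is None:
            return None  # attribute absent: assume no uncertainty is defined
        return require(target, f"{name}@uncertainties")["data"]

    # signal is required to be "I" for NXcanSAS, so there is no need to follow it
    intensity = require("I", "NXcanSAS signal")
    q = require("Q", "1D Q axis (assumed name)")
    return {
        "I": intensity["data"],
        "Q": q["data"],
        "Idev": uncertainties_of("I"),
        "Qdev": uncertainties_of("Q"),
    }

# Made-up example group
example = {
    "attrs": {"canSAS_class": "SASdata", "signal": "I", "I_axes": "Q", "Q_indices": 0},
    "fields": {
        "I": {"attrs": {"uncertainties": "Idev"}, "data": [10.0, 8.0]},
        "Idev": {"attrs": {}, "data": [0.5, 0.4]},
        "Q": {"attrs": {}, "data": [0.01, 0.02]},
    },
}
result = read_sasdata(example)
```

How the error is surfaced (message, dialog, log entry) is left to the calling program.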

toqduj commented 7 years ago

Hi Pete,

Thanks for the quick answer. I wasn't sure whether there was a priority in the way the cases should be handled, but I think it's reasonably clear. So I_axes may or may not be there, as I understand from your answer. The "uncertainties" attribute is required in NXcanSAS when uncertainties are provided, right?

Is there a foolproof way of testing for 1D or 2D data? A simple dimensionality test won't do, as it can be a 1D time series, and just checking for the existence of a "Q" field also doesn't help, since 2D data can be "Q, Q" or "Qx, Qy"...

Cheers,

Brian.

prjemian commented 7 years ago

Assume you are differentiating between I(|Q|,t) and I(Qx,Qy)? The I_axes attribute will provide this information without ambiguity, defining each dimension of I.
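A small sketch of that test, assuming (as I read the NXcanSAS documentation) that I_axes is a comma-separated list with one name per dimension of I and that "Q" is repeated once for each Q dimension. The helper name and the example strings, including "Time,Q", are illustrative and should be checked against the definition.

```python
def describe_axes(i_axes):
    """Split an I_axes value into axis names and count the Q dimensions."""
    names = [name.strip() for name in i_axes.split(",")]
    q_dims = sum(1 for name in names if name == "Q")
    return names, q_dims

one_d = describe_axes("Q")             # I(|Q|)
time_series = describe_axes("Time,Q")  # I(t, |Q|): two dimensions, one of them Q
two_d = describe_axes("Q,Q")           # I(Qx, Qy)
```

The number of names gives the rank of I, while the number of "Q" entries separates a 1D time series from genuinely 2D data.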

prjemian commented 7 years ago

Posted a note to https://groups.google.com/forum/#!topic/cansas-dfwg/YaWRwPm_zdY advising this will be presented as a pull request on/about Dec 1 unless there are any significant and unresolved objections.

butlerpd commented 7 years ago

Sorry for coming late to the discussion, but reading through the chain I do not see any discussion about Q resolution. I may have missed it, but how is resolution information provided? And do we just allow the width of a Gaussian, or are we allowing for non-Gaussian resolution functions to be described, as discussed I believe by Charles Dewhurst?

toqduj commented 7 years ago

The Q resolution's not something I had thought about at that point in time (I have since, see http://www.lookingatnothing.com/index.php/archives/2258 ).

To keep the Q uncertainty model-agnostic, however, we could specify it not to be the width of a Gaussian, but one standard deviation of any function used to approximate the resolution function. Fortunately, this equates to the width of the Gaussian distribution if a Gaussian is chosen. An attribute may be provided to specify which function is used for the approximation of the resolution function.

prjemian commented 7 years ago

See this comment: https://github.com/nexusformat/definitions/issues/492#issuecomment-257639247

Here is the definition:

<attribute name="resolution" type="NX_CHAR" >
  <doc>
    (optional) 
    Generally, this is the principal resolution of each :math:`Q`.
    Names the data object (in this SASdata group) that provides the 
    :math:`Q` resolution to be used for data analysis.  Such as::

        @resolution="Qres"

    The name of the dataset containing the :math:`Q` resolution
    is flexible.  The name must be unique in the *SASdata* group.

    .. comment
       see: https://github.com/canSAS-org/canSAS2012/issues/7

       There may also be a subdirectory (optional) with constituent 
       components, similar to the handling of complex uncertainties.

       This pattern will demonstrate how to introduce further as-yet 
       unanticipated terms related to the data.

  </doc>
</attribute>

Interpretation of the resolution attribute is not prescribed at this time. It is possible to introduce a term (or terms) to assist this, such as the ad hoc attribute resolution_description="Gaussian". The NeXus application definitions are now individually versioned. This can be added now if we can agree on the implementation, or in a later version if it requires some discussion. The only realistic NeXus limitations are not to use a name already in use and not to do differently than other implementations of the same thing.
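For a reader, following the resolution attribute could look like the sketch below. The group is again modelled as plain dicts, `find_resolution` is a hypothetical helper, and `resolution_description` is only the ad hoc attribute floated in this comment, not a defined NXcanSAS term.

```python
def find_resolution(group):
    """Return (values, description) for the Q resolution, or (None, None)."""
    q = group["fields"]["Q"]
    target = q["attrs"].get("resolution")
    if target is None:
        return None, None
    if target not in group["fields"]:
        raise KeyError(f"Q@resolution names missing field {target!r}")
    # ad hoc attribute; None means the interpretation is not stated
    description = q["attrs"].get("resolution_description")
    return group["fields"][target]["data"], description

# Made-up example using the flexible field name from the definition above
sasdata = {
    "attrs": {"canSAS_class": "SASdata", "signal": "I"},
    "fields": {
        "I": {"attrs": {}, "data": [10.0, 8.0]},
        "Q": {
            "attrs": {"resolution": "Qres", "resolution_description": "Gaussian"},
            "data": [0.01, 0.02],
        },
        "Qres": {"attrs": {}, "data": [0.001, 0.001]},
    },
}
q_res, q_res_kind = find_resolution(sasdata)
```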

butlerpd commented 7 years ago

Humm... interesting point about the shape of the probability function for the uncertainties, which is particularly relevant for counting statistics at very low counts. However, that is a separate issue from the resolution question: an uncertainty standard deviation represents the range of values that encompasses the true value at a 68% certainty level, while the resolution represents 100% certainty that the measured value includes 100% known fractions of contribution from several true values. Both can exist, though in SANS we generally only treat resolution. The uncertainties are propagated through all the computations according to the well-known standard propagation of uncertainties, and they affect the uncertainty of the parameters derived from a fit: larger uncertainties on the data give larger uncertainties in the derived parameters. Resolution is of course not propagated at all, but is normally treated by convolving the distribution function with a model when fitting, and should not affect the uncertainty in the derived parameters (in fact, fitting the 100% correct model to data with zero uncertainties in I but a broad resolution should yield parameters with zero uncertainties, I believe ... assuming of course zero correlation between any of the parameters :-)

There are two approaches to handling this. The current approach is for the software writer to have to assume when a data set associated with a set of values is an uncertainty which needs to be statistically treated, and when it is not that and should instead be treated as a resolution distribution. The other possible approach, of course, is to define clearly exactly what is meant by each set of data associated with a value (is it resolution, uncertainty, or something else). I would actually argue that there is uncertainty in Q calibrations, which may usually be small but should in fact be treated properly in the uncertainty estimates of any derived parameter, independently of resolution smearing -- but that of course opens a whole other can of worms (what do you mean you don't know your Q? :-)

Anyway seems that if we are trying to define everything including the kitchen sink we could define this as well and thus ahem remove the uncertainty when guessing what we are dealing with :-)

That said, if we are not including resolution separately from uncertainty at this point, I would argue that the discussion should be moved to a future enhancement rather than hold up the first standard from being published. So is that a fair statement? We are not currently specifying resolution separately?

prjemian commented 7 years ago

messages have crossed in preparation:

Paul wrote:

if we are not including resolution separately from uncertainty 

But, we are allowing resolution to be specified separate from uncertainty. Details about either can be added in an additional attribute or a subgroup if the details become complicated.

toqduj commented 7 years ago

Yes, sorry, I was mixing the two again. That's because they're inherently linked. So yes, according to the definition we can point to @uncertainty in the Q dataset. This is separate from the broadening, pointed to by the @resolution attribute.

The resolution definition can be specified, I'd stick initially with one standard deviation (which defines the Gaussian width if the resolution function is Gaussian, but can also be used with an alternative single-parameter resolution function).

toqduj commented 7 years ago

P.S. this way, we'd consistently use standard deviations throughout the definition.

butlerpd commented 7 years ago

yep ... see that. sounds good. Was confused I think by the fact that the term "uncertainty" seems to have been used for both uncertainty and resolution in all the discussions - sorry about that.

butlerpd commented 7 years ago

Thanks for clarification Brian -- will read your post more carefully :-) I agree that we should stick to 1 std dev as the starting point. The fact that we can in principle describe more complex functions already in this definition (if I understand Pete correctly) is a bonus

Thanks both for straightening me out

Paul

prjemian commented 7 years ago

The da16b6a commit adds the resolution_description attribute, describing the assumption that "Gaussian" is the meaning of resolution. In cases of alternate meanings, more detail might supplement this resolution_description attribute by addition of datasets or an NXnote subgroup to include that detail.

ajj commented 7 years ago

I'm now a bit lost here.

Whilst Idev is an uncertainty (true error bar, 1 standard deviation as default), Qdev, dQl and dQw are not uncertainties; they are values that provide parameters to a function that must be applied to get a model function to match the data.

At the meeting in Tokai we had noted that @uncertainties is not a good name if we are folding in resolutions - hence I note that as per canSAS-org/canSAS2012#7 @resolution now exists, but the documentation then also needs updating so that the examples don't show Qdev, dQl and dQw being used with @uncertainties.

I also note that in the documentation we have @I_uncertainties and @Q_uncertainties defined as attributes of SASData. This seems redundant - shouldn't we be using the @uncertainties attribute on I and Q respectively? In fact the documentation is self-contradictory on this - in the @Q_uncertainties description it refers to pointing to Qdev, whereas the description of Qdev states that it must have been pointed to by the @uncertainties attribute on Q.

If you have a true uncertainty in Q (calibration error for instance) that should be recorded using the @uncertainties attribute. Whether that should be applied before or after resolution effects is an interesting question, but is it one for the data format?

We had discussed how we go about representing multi-component uncertainties or resolution. For example, in Uppsala we proposed:

   I : float[nI]
      @uncertainty="dI"
   dI : float[nI]
      @components="I_uncertainties"
   I_uncertainties:
      electronic : float[nI]
         @basis="Johnson noise"
      counting_statistics: float[nI]
         @basis="shot noise"
      secondary_standard: float[nI]
         @basis="esd"

Is this still possible? One could imagine defining components for resolution also and providing their basis. Can we handle complex resolutions (as per Dewhurst) in future without breaking backward compatibility?
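
As a rough illustration of how a reader might use such a components group, the sketch below collapses the named components into a single dI array. The thread does not say how components combine; adding them in quadrature is an assumption made here, valid only if the components are independent. The function name `combine_in_quadrature` and the plain-dict stand-in for the I_uncertainties group are hypothetical.

```python
import math


def combine_in_quadrature(components):
    """Combine per-point uncertainty components into one array.

    `components` maps a component name (e.g. "electronic") to a list of
    per-point values. Assumes the components are statistically independent,
    so they add in quadrature; this rule is not taken from the specification.
    """
    names = sorted(components)
    lengths = {len(components[name]) for name in names}
    if len(lengths) != 1:
        raise ValueError("all components must have the same number of points")
    n_points = lengths.pop()
    return [
        math.sqrt(sum(components[name][i] ** 2 for name in names))
        for i in range(n_points)
    ]


# Stand-in for the I_uncertainties group sketched above.
i_uncertainties = {
    "electronic": [3.0, 0.0],
    "counting_statistics": [4.0, 1.0],
    "secondary_standard": [0.0, 0.0],
}
dI = combine_in_quadrature(i_uncertainties)
```

A resolution equivalent would need its own combination rule, which would depend on the basis recorded for each component.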

ajj commented 7 years ago

As per my email to the DFWG, it looks to me as if the nxdl definition of NXcanSAS in the repo does not match the examples in the cansas2012 repo.

The examples use @uncertainties attributes on I and Q, but the nxdl does not define @uncertainties as an attribute of I and Q - it defines @I_uncertainties and @Q_uncertainties on SASData.

Surely we want the former, not the latter?

Edit: Is @uncertainties inherited from Nexus? If so then we can just remove the @I_uncertainties and @Q_uncertainties definitions.

ajj commented 7 years ago

Looking back through the mists of time ... In Uppsala we were happy with using @uncertainties to include references to resolution function parameters. This was because we were using a dictionary of terms that were judged to be meaningful to the community (Qdev, dQw, dQl) in part because they were defined for the 1D xml format.

However, since then, and particularly at the Tokai meeting, it became clear that 'uncertainties' holds the potential for confusion when resolution is also involved. Hence the introduction of @resolution to separate resolution components from true uncertainties (calibration errors for example).

Thus, since @resolution exists we should now define it as the correct place to refer to Qdev, dQw and dQl. The nxdl should be updated to reflect this, as should the examples. Otherwise, readers will have to check in both @uncertainties and @resolution for the special names (Qdev etc) in order to find the resolution components.

Edit: The alternative is to very carefully define the terms and leave Qdev, dQw and dQl pointed to by @uncertainties. However, we then need to document Qdev, dQw and dQl clearly to indicate that they are resolution terms. @resolution is then reserved for "non-standard" complex resolution terms that aren't "pinhole" or "slit" smearing of monochromatic data (or TOF/polychromatic data where an approximation to pinhole or slit smearing is deemed acceptable). In essence with dQw and dQl we are providing input parameters to "well known"/standard algorithms for resolution smearing.

The disadvantage here is a bit more complexity for readers and the vocabulary being a bit messier.
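
The reader-side cost of allowing both attributes can be sketched as follows. This is an illustration only: the attribute names `resolution` and `uncertainties` follow the usage in this thread, the comma-separated value format is an assumption, and `find_resolution_names` is a hypothetical helper working on a plain dict that stands in for the attributes of the Q dataset.

```python
# Special names that are understood to be resolution terms (from the 1D XML format).
RESOLUTION_NAMES = {"Qdev", "dQw", "dQl"}


def find_resolution_names(q_attrs):
    """Return the dataset names that describe Q resolution.

    Anything listed under "resolution" is accepted. Names listed under
    "uncertainties" are accepted only if they are one of the special
    resolution names, which is the extra check a reader would need if
    both attributes were allowed to carry them.
    """
    found = []
    for attr in ("resolution", "uncertainties"):
        value = q_attrs.get(attr, "")
        for name in value.split(","):  # comma-separated list is an assumption
            name = name.strip()
            if not name:
                continue
            if attr == "resolution" or name in RESOLUTION_NAMES:
                if name not in found:
                    found.append(name)
    return found
```

If Qdev, dQw and dQl are only ever referenced from @resolution, the second pass over `uncertainties` and the `RESOLUTION_NAMES` lookup both disappear.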

prjemian commented 7 years ago

Discussion about Q resolution v. Q uncertainty has continued on the Google Group page: https://groups.google.com/forum/#!topic/cansas-dfwg/YaWRwPm_zdY

Summary is that Qdev, dQw, and dQl are intended to be resolution (not uncertainty).

Also, it was noticed that the PDF needs to be updated to reflect the current version of the NXcanSAS.nxdl.xml file.

prjemian commented 7 years ago

Updated documentation (accompanying change set will be committed next): NXcanSAS.pdf

prjemian commented 7 years ago

updated example files:

canSAS1d/1.1 XML files converted to NXcanSAS: https://github.com/canSAS-org/NXcanSAS_examples/tree/master/1d_standard

canSAS2012 examples written in NXcanSAS: https://github.com/canSAS-org/NXcanSAS_examples/tree/master/canSAS2012_examples

prjemian commented 7 years ago

Commit d79a8ea removes SASdata/Q/@uncertainties from the specification per today's discussion

updated documentation: NXcanSAS.pdf

prjemian commented 7 years ago

pull request submitted: #516