mantidproject / mantid

Main repository for Mantid code
https://www.mantidproject.org
GNU General Public License v3.0

New YAML output format for SaveReflectometryASCII #31073

Closed gemmaguest closed 11 months ago

gemmaguest commented 3 years ago

Original reporter: Max at ISIS

ISIS Reflectometry have a new YAML-based ASCII format they would like to be able to save reduced data to. We already support multiple ASCII formats using the SaveReflectometryASCII algorithm, so this would be added as another option to that algorithm, and exposed on the Save ASCII tab on the ISIS Reflectometry interface.

The format is simply something like the following (see also https://www.reflectometry.org/projects/file_formats/examples):

# creator:
#   name        : Jochen Stahn
#   affiliation : PSI
#   time        : 2020/04/06/13:21:18
# data source:
#   experiment:
#     probe              : neutrons
#     sample:
#       name             : Ni1000
# reduction:
#   software: eos.py
#     call : eos.py -Y 2020 -n 1925-1927 -y 9,55 ni1000 -O -0.2 -r 1064 -s 1 -i -a 0.005 -e
# data:
#   column 1 : Qz / Aa^-1
#   column 2 : RQz
#   column 3 : sigma RQz , standard deviation
#            1               2               3
1.03563296e-02  3.88100068e+00  4.33909068e+00
1.06717294e-02  1.16430511e+01  8.89252719e+00

The new file format would have extension .ort. This file format is still at an early stage of adoption - the idea of implementing it in Mantid is to start getting people to use it, but it's likely to be a while before it becomes mainstream.
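To make the layout of the example above concrete, here is a minimal Python sketch of a writer that emits a `#`-commented YAML-style header followed by whitespace-separated data columns. The function name `format_ort` and its argument shapes are hypothetical and are not part of Mantid or orsopy; the number formatting simply mimics the sample rows shown above.

```python
import io


def format_ort(header, columns, rows):
    """Return file text: '#'-commented header, column descriptions, data rows.

    header  : list of pre-indented YAML lines (hypothetical input shape)
    columns : list of column descriptions, e.g. "Qz / Aa^-1"
    rows    : iterable of numeric tuples, one value per column
    """
    out = io.StringIO()
    for line in header:
        out.write(f"# {line}\n")
    out.write("# data:\n")
    for i, desc in enumerate(columns, start=1):
        out.write(f"#   column {i} : {desc}\n")
    for row in rows:
        if len(row) != len(columns):
            raise ValueError("row length does not match column count")
        # Eight decimal places in scientific notation, as in the sample rows
        out.write("  ".join(f"{value:.8e}" for value in row) + "\n")
    return out.getvalue()


text = format_ort(
    header=["creator:", "  name        : Jochen Stahn", "  affiliation : PSI"],
    columns=["Qz / Aa^-1", "RQz", "sigma RQz , standard deviation"],
    rows=[
        (1.03563296e-02, 3.88100068e00, 4.33909068e00),
        (1.06717294e-02, 1.16430511e01, 8.89252719e00),
    ],
)
print(text)
```

This only illustrates the text layout; a real implementation would need to populate the header from workspace metadata.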

From Max:

Notes

DavidFair commented 3 years ago

@gemmaguest I've untagged this from 6.2, but I thought I'd ping you if you want it to target 6.3?

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had activity in 6 months. It will be closed in 7 days if no further activity occurs. Allowing issues to close as stale helps us filter out issues which can wait for future development time. All issues closed by stale bot act like normal issues; they can be searched for, commented on or reopened at any point. If you'd like a closed stale issue to be considered, feel free to either re-open the issue directly or contact a developer. To extend the lifetime of an issue please comment below, it helps us see that this is still affecting you and you want it fixed in the near-future. Extending the lifetime of an issue may cause the development team to prioritise it over other issues, which may be closed as stale instead.

stale[bot] commented 2 years ago

This issue has been closed automatically. If this still affects you please re-open this issue with a comment or contact us so we can look into resolving it.

gemmaguest commented 1 year ago

It would be good if we could add support for at least a minimal YAML output format as soon as possible, so that ISIS can shape the development of these output formats.

stale[bot] commented 1 year ago

This issue has been closed automatically. If this still affects you please re-open this issue with a comment or contact us so we can look into resolving it.

jfkcooper commented 1 year ago

After working on this a little and discussing with Rachel we found that there are several blockers to implementing the full standard. A file saving algorithm (in my view at least) should only need to have access to the workspace which it is trying to save, which means that any metadata which is required by the standard should somehow be in the workspace. There are a reasonable number of entries for which this is true and easily accessible:

There are however a number of required entries which are not so easy to extract from the workspace (though do in principle exist in them). These are all present in the workspace history, but this is not a good place to be getting the values from as it is an unintended use case and would make the algorithm very fragile to changes in the format of the history:

In addition, there are required entries which (again as far as I can tell) do not exist in the mantid workspaces, but do exist in the nexus files from which the data was loaded:

Finally, there are the required entries which do not exist anywhere. These are not saved in the nexus files (and really should be), with the treatment of the subsequent files done solely on what is "known". Note that these require changes on the IBEX side:

After the discussion with Rachel, we believe that the best course of action is to try to proceed without any of the properties listed above except the resolution, which is required by the analysis programs. This would be supplied as an additional input to the algorithm (in addition to the workspace name and output filepath). Since the algorithm is unlikely to be used by itself, we would also try to work on getting the reflectometry GUI to be able to auto-populate this when saving datafiles.
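As a rough illustration of supplying the resolution as an extra input, the sketch below derives a resolution column from the Q values. It assumes the resolution is given as a constant fractional value (dQ/Q), so that each entry is simply resolution × Qz; the function name `resolution_column` is hypothetical and is not an existing Mantid algorithm or property.

```python
import math


def resolution_column(q_values, resolution):
    """Return a per-point Q resolution, assuming constant fractional dQ/Q.

    q_values   : sequence of Qz values
    resolution : fractional resolution dQ/Q (assumed input, e.g. 0.04)
    """
    if resolution <= 0:
        raise ValueError("resolution must be positive")
    return [resolution * q for q in q_values]


q = [4.7711570593988312e-03, 1.03563296e-02]
sigma_q = resolution_column(q, 0.04)
print(sigma_q)
```

If the resolution actually varies with Q on a given instrument, this constant-fraction assumption would not hold and the column would need to be computed differently.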

This would put the algorithm in a state where it technically does not conform to the standard, but is still considerably better than not being able to save into the ORSO format at all, and is preferable to having a large amount of hacky code which would need replacing at a later date once the logging issues detailed above have been sorted.

Lastly, there is also the issue of if and how to include orsopy as a dependency. There is some work which may happen on their side to create a conda-forge distribution in addition to their PyPI version, which may help here, though this is not confirmed and still leaves the question of whether it should be included.

jfkcooper commented 1 year ago

Attachments: SaveReflectometryORSO.txt, SaveReflectometryORSOTest.txt, test_ort_file.txt

I attach the first version of the writer, an example output, and unit tests for this. The writer is annotated with pieces which are or may be ISIS specific.

rbauststfc commented 1 year ago

For reference when we pick this up, Andrew has flagged this and this section of the NIST NCNR team's bespoke reduction software where they have implemented the ORSO NeXus format. Note this is not the ASCII format, as is the subject of this issue, however it may still be interesting to see what they've done. Andrew is happy to refer any questions about the code to his contact at NIST.

rbauststfc commented 1 year ago

Background info for POLREF from Andrew:

On POLREF, for specular reflectivity (for off-specular the output is a bit different) we basically only save out a 3-column ASCII with no header...the 3 columns are Q, R, dR. We have each individual spin state as a different file.

In future, we (POLREF) would like to go a similar way to the NIST NCNR orso file, where we include all spin states in one file - we would be looking to have 8 columns: Q, R, dR, dQ, lambda, dlambda, incident theta, d incident theta. For POLREF at least, initially we would work with unstitched data, but we might append the stitched data to the end of the file - I haven't worked this out in my head yet.

The various options/data sets are why Reflectometry are so interested in the ORSO Nexus file.

jfkcooper commented 1 year ago

As it stands, I believe only the ascii data format has been accepted by the ORSO organisation, and the HDF/Nexus style is still subject to potentially a reasonable amount of change (depending on if ORSO chooses to go for a Nexus standard, or just using the HDF5 format). This may be relevant in the future, but I would stay away from it for now in mantid.

rbauststfc commented 1 year ago

> As it stands, I believe only the ascii data format has been accepted by the ORSO organisation, and the HDF/Nexus style is still subject to potentially a reasonable amount of change (depending on if ORSO chooses to go for a Nexus standard, or just using the HDF5 format). This may be relevant in the future, but I would stay away from it for now in mantid.

Thanks @jfkcooper, this is just capturing information for the future.

maxskoda commented 1 year ago

Hi Rachel,

As discussed, here are some comments regarding the test .ort file you have shared:

  1. We should really change the first line in the header, but it seems this is hard coded in orsopy.fileio.orso:

         ORSO_VERSION = "1.0"
         ORSO_DESIGNATE = (
             f"# ORSO reflectivity data file | {ORSO_VERSION} standard "
             "| YAML encoding | https://www.reflectometry.org/"
         )

     I will talk to the orsopy coders about this. In the meantime we could just add a comment in the save_orso method. I think this will show up just below the first line. It could read something like: "Mantid@ISIS output may not be fully ORSO compliant"

  2. start_date issue: I'm not sure what the timestamp refers to - the beginning of the whole experiment, the beginning of the data set or what. Since the .ort file will most likely contain stitched and normalised data, the start date/time doesn't exactly make sense here. Again, I will check with ORSO.

  3. data_files: I believe this should be a list of the individual angle files, in full format, as already noted

  4. reduction: it would be good to use the "corrections" parameter in orsopy.fileio.reduction.Reduction to indicate at the very least the transmission, calibration and flood files that have been used for the reduction. Ideally one could also include information such as "normalised by monitor integral"

  5. a) reduction - timestamp: Yes, some approximate timestamp of when the reduced workspace was created (as opposed to when the .ort file was created).
     b) creator: orsopy says: "creator (Optional[Person]) – The person or routine who created the reduced file." (see https://orsopy.readthedocs.io/en/latest/orsopy.fileio.base.html#orsopy.fileio.base.Person). I think this could be one or more of: [Mantid->RROA or similar algorithm name, local contact, entire experimental team]

  6. sQz - as noted before, this needs to be (slit resolution) * (Qz value), e.g.: 0.04 * 4.7711570593988312e-03

maxskoda commented 1 year ago

> As it stands, I believe only the ascii data format has been accepted by the ORSO organisation, and the HDF/Nexus style is still subject to potentially a reasonable amount of change (depending on if ORSO chooses to go for a Nexus standard, or just using the HDF5 format). This may be relevant in the future, but I would stay away from it for now in mantid.

While I agree that the binary/HDF5 .orb format is still in a state of flux, I think that this is the time to start thinking about how all the necessary information can be collected. This could inform the structure of the .orb file and avoid the situation that we find ourselves in now with the .ort format, where we have the orsopy library, but we cannot fully use it due to missing meta-information.

rbauststfc commented 1 year ago

Summary of how we expect this to develop in the future:

rbauststfc commented 1 year ago

Information about resolution calculation: