pace-neutrons / Horace

Horace is a suite of programs for the visualization and analysis of large datasets from time-of-flight neutron inelastic scattering spectrometers.
https://pace-neutrons.github.io/Horace/stable/
GNU General Public License v3.0

Unreliable algorithm to identify unique runids #1572

Status: Open. abuts opened this issue 7 months ago

abuts commented 7 months ago

Run IDs (runids) are written to PixelData and identify the unique experiment (run) that contributed to each pixel. These data are used in resolution calculations.

It is convenient to define runids as experiment numbers (ISIS run numbers): if there are issues with a particular experimental run, run_inspector will directly show the image of that run together with the number of the run that has the issue.

Unfortunately, run numbers are not currently written to nxspe data. To identify the experimental data, Horace currently uses a complex procedure that extracts experiment numbers from file names. If this procedure fails, the run_ids are assumed to be undefined, which should trigger a procedure that renumbers the unique experiments (runs) according to the order of the contributing files.
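The extract-then-fall-back logic described above can be sketched as follows. This is a Python illustration, not Horace's MATLAB code: the function names are hypothetical, the rule "the first group of four or more digits in the base name is the run number" is an assumed file-naming convention, and also renumbering when two files yield the same number is an added assumption (it covers the replicate case mentioned below).

```python
import os
import re
from typing import List, Optional

# Assumed naming convention: the run number is the first group of four or more
# consecutive digits in the file's base name, e.g. "MER12345_Ei30meV.nxspe" -> 12345.
_RUN_NUMBER = re.compile(r"(\d{4,})")


def run_number_from_filename(path: str) -> Optional[int]:
    """Return the run number parsed from the file name, or None if none is found."""
    match = _RUN_NUMBER.search(os.path.basename(path))
    return int(match.group(1)) if match else None


def assign_run_ids(paths: List[str]) -> List[int]:
    """Use parsed run numbers if every file yields a distinct one;
    otherwise renumber the runs 1..N in the order of the contributing files."""
    numbers = [run_number_from_filename(p) for p in paths]
    if None in numbers or len(set(numbers)) != len(numbers):
        # Extraction failed for at least one file (or numbers clash): renumber.
        return list(range(1, len(paths) + 1))
    return [int(n) for n in numbers]
```

For example, `assign_run_ids(["MER12345_Ei30meV.nxspe", "MER12346_Ei30meV.nxspe"])` keeps the run numbers, while a list containing a file such as `sample_a.nxspe` (no digits) falls back to `[1, 2, ...]`. The fragility the issue describes is visible here: the result depends entirely on how files happen to be named.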

A separate procedure triggers renumbering of the runs if the replicate option is provided to gen_sqw, which adds further complexity to the algorithm.

The procedure is complex and unreliable, and parts of the algorithm are scattered through the gen_sqw and write_nsqw_to_sqw code.

The purpose of this ticket is to simplify the run_id calculation, collect it in a single place, and write unit tests for it.

The ideal solution would be to write run numbers externally into the nxspe file and, if the number is not there, always trigger the renumbering procedure. A single place for renumbering, with unit tests for it, is necessary in any case.
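A single renumbering entry point with its own unit tests could look roughly like the Python sketch below. It assumes the run number stored in each nxspe file has already been read (or is `None` when absent); `renumber_runs` and the test names are hypothetical. Renumbering on duplicate run numbers is an added assumption so that replicated files still receive distinct ids.

```python
import unittest
from typing import List, Optional, Sequence


def renumber_runs(run_numbers: Sequence[Optional[int]]) -> List[int]:
    """Hypothetical single entry point for run_id assignment.

    run_numbers holds the run number stored in each contributing nxspe file,
    or None if the file does not contain one. If any number is missing (or,
    as an assumption, duplicated), all runs are renumbered 1..N in file order.
    """
    missing = any(n is None for n in run_numbers)
    duplicated = len(set(run_numbers)) != len(run_numbers)
    if missing or duplicated:
        return list(range(1, len(run_numbers) + 1))
    return [int(n) for n in run_numbers]


class TestRenumberRuns(unittest.TestCase):
    def test_keeps_stored_run_numbers(self):
        self.assertEqual(renumber_runs([101, 205, 150]), [101, 205, 150])

    def test_missing_number_triggers_renumbering(self):
        self.assertEqual(renumber_runs([101, None, 150]), [1, 2, 3])

    def test_duplicates_trigger_renumbering(self):
        # e.g. one file contributed several times via gen_sqw's replicate option
        self.assertEqual(renumber_runs([101, 101, 101]), [1, 2, 3])

    def test_empty_input(self):
        self.assertEqual(renumber_runs([]), [])
```

The tests can be run with `python -m unittest <module>`. Keeping the decision in one small pure function is what makes it testable in isolation from gen_sqw and write_nsqw_to_sqw.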

tgperring commented 7 months ago

Some thoughts about runid that should clarify the design of a solution:

The runid should not be the same as the run number. The purpose of the runid is to identify a unique instance of the combination of data source and experimental information, to be assigned in gen_sqw. In the ideal scenario, all the information will be in the .nxspe file, but it can come from one or more other sources: the data from the .nxspe file, the orientation from the rotation angle psi entered as one of the input arguments to gen_sqw (and in principle from the other goniometer angles gl, gs), and instrument and detector information also given explicitly as input arguments to gen_sqw. It is essential that the runid is truly unique for this data/sample/sample orientation/detector array/instrument_setup combination.

It is important not to conflate the runid with one specific useful piece of information (namely the run number). So long as the runid uniquely identifies the data/sample/sample orientation/detector array/instrument_setup combination, the information needed by applications such as the Horace run_inspector can always be extracted. This includes, for example, getting the name of the .nxspe file for error messages, or getting information for filtering/masking data by the value of psi or by the run number (if the run number can be extracted from the nxspe file or from the file name). However, we cannot rely on the ISIS run number being a unique identifier. The case of gen_sqw used with the replicate option is a good example: the run numbers will be the same for all data sources, but by construction the sample angle psi will be different, and hence the runids should all be different.
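One way to express this design is to key the runid on the whole configuration rather than on the run number. The Python sketch below is illustrative only: `RunConfiguration` and `RunIdRegistry` are hypothetical names, the instrument and detector are reduced to placeholder strings, and the run number is carried as informative metadata that is deliberately excluded from equality and hashing.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class RunConfiguration:
    """Hypothetical record of everything that makes a contributing run unique."""
    data_file: str            # source .nxspe file
    psi: float                # sample rotation angle
    gl: float = 0.0           # other goniometer angles
    gs: float = 0.0
    instrument: str = ""      # placeholder for the instrument setup
    detector: str = ""        # placeholder for the detector array description
    # Useful to extract, but NOT part of the identity (compare=False also
    # removes it from the generated __hash__).
    run_number: Optional[int] = field(default=None, compare=False)


class RunIdRegistry:
    """Assigns runids 1, 2, ... to distinct configurations in order of first appearance."""

    def __init__(self) -> None:
        self._ids: Dict[RunConfiguration, int] = {}
        self._configs: List[RunConfiguration] = []

    def runid(self, config: RunConfiguration) -> int:
        if config not in self._ids:
            self._configs.append(config)
            self._ids[config] = len(self._configs)
        return self._ids[config]

    def configuration(self, runid: int) -> RunConfiguration:
        """Recover file name, psi, run number, etc. from a runid."""
        if not 1 <= runid <= len(self._configs):
            raise KeyError(runid)
        return self._configs[runid - 1]


# Replicate-style use: same file and run number, different psi -> distinct runids.
registry = RunIdRegistry()
ids = [registry.runid(RunConfiguration("MER12345.nxspe", psi, run_number=12345))
       for psi in (0.0, 2.0, 4.0)]
print(ids)  # [1, 2, 3]
```

Because the run number is only metadata, the replicate case gets three runids even though all three runs share run number 12345, and `registry.configuration(2)` still returns the file name, psi and run number for an error message or a psi filter. A real implementation would likely need to round or tolerance-match floating-point angles before using them as part of a key.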

The name ‘runid’ is perhaps misleading. It is not a synonym for ‘run_number’; it really means ‘unique_data_and_experimental_configuration_identifier’ for the entirety of the data/sample/sample orientation/instrument/detector combination. Often the run number serves this purpose, but that certainly cannot be guaranteed.

abuts commented 1 month ago

Part of this is an epic; part of it has been done through Re #1728.