mctools / mcpl

Monte Carlo Particle Lists
https://mctools.github.io/mcpl/

`mcpltool --merge` failure output can be confusing #77

Open g5t opened 9 months ago

g5t commented 9 months ago

An improvement in https://github.com/McStasMcXtrace/McCode/issues/1505 has made me aware that mcpltool --merge will, sensibly, refuse to combine files which have different 'header info'.

For two such files, part_0.mcpl.gz and part_1.mcpl.gz, a user sees

$ mcpltool --merge combined.mcpl.gz part_0.mcpl.gz part_1.mcpl.gz 
ERROR: Requested files are incompatible for merge as they have different header info.

Run with -h or --help for usage information

Following that advice prints 52 lines of help information:

$ mcpltool -h
Tool for inspecting or modifying Monte Carlo Particle List (.mcpl) files.

The default behaviour is to display the contents of the FILE in human readable
format (see Dump Options below for how to modify what is displayed).

This installation supports direct reading of gzipped files (.mcpl.gz).

Usage:
  mcpltool [dump-options] FILE
  mcpltool --merge [merge-options] FILE1 FILE2
  mcpltool --extract [extract-options] FILE1 FILE2
  mcpltool --repair FILE
  mcpltool --version
  mcpltool --help

Dump options:
  By default include the info in the FILE header plus the first ten contained
  particles. Modify with the following options:
  -j, --justhead  : Dump just header info and no particle info.
  -n, --nohead    : Dump just particle info and no header info.
  -lN             : Dump up to N particles from the file (default 10). You
                    can specify -l0 to disable this limit.
  -sN             : Skip past the first N particles in the file (default 0).
  -bKEY           : Dump binary blob stored under KEY to standard output.

Merge options:
  -m, --merge FILEOUT FILE1 FILE2 ... FILEN
                    Creates new FILEOUT with combined particle contents from
                    specified list of N existing and compatible files.
  -m, --merge --inplace FILE1 FILE2 ... FILEN
                    Appends the particle contents in FILE2 ... FILEN into
                    FILE1. Note that this action modifies FILE1!
  --forcemerge [--keepuserflags] FILEOUT FILE1 FILE2 ... FILEN
               Like --merge but works with incompatible files as well, at the
               heavy price of discarding most metadata like comments and blobs.
               Userflags will be discarded unless --keepuserflags is specified.

Extract options:
  -e, --extract FILE1 FILE2
                    Extracts particles from FILE1 into a new FILE2.
  -lN, -sN        : Select range of particles in FILE1 (as above).
  -pPDGCODE       : select particles of type given by PDGCODE.

Other options:
  -r, --repair FILE
                    Attempt to repair FILE which was not properly closed, by up-
                    dating the file header with the correct number of particles.
  -t, --text MCPLFILE OUTFILE
                    Read particle contents of MCPLFILE and write into OUTFILE
                    using a simple ASCII-based format.
  -v, --version   : Display version of MCPL installation.
  -h, --help      : Display this usage information (ignores all other options).

After a careful perusal, a user who is not comfortable using --forcemerge without understanding what differs between the headers of their files may decide to inspect each file using --justhead. The header information for the first file looks like

$ mcpltool --justhead part_0.mcpl.gz 
Opened MCPL file part_0.mcpl.gz:

  Basic info
    Format             : MCPL-3
    No. of particles   : 50079
    Header storage     : 314 bytes
    Data storage       : 1602528 bytes

  Custom meta data
    Source             : "MCSTAS 4 {McStas instrument name}"
    Number of comments : 1
          -> comment 0 : "Output by COMPONENT: {MCPL_output component name}"
    Number of blobs    : 1
          -> 179 bytes of data with key "mccode_cmd_line"

  Particle data format
    User flags         : no
    Polarisation info  : no
    Fixed part. type   : yes (pdgcode 2112)
    Fixed part. weight : no
    FP precision       : single
    Endianness         : little
    Storage            : 32 bytes/particle

And the second (elided) is nearly identical.

The user must then realize that the difference likely resides in the "mccode_cmd_line" blob, and refer back to the mcpltool -h output to discover how to access the data it contains.

Finally, a user sees something like

$ mcpltool -bmccode_cmd_line part_0.mcpl.gz 
/{path}/{Instr_name}.out par1=0.05 par2=0.1 mcpl_filename=/{absolute_path}/part_0
$ mcpltool -bmccode_cmd_line part_1.mcpl.gz 
/{path}/{Instr_name}.out par1=0.05 par2=0.1 mcpl_filename=/{absolute_path}/part_1
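
The manual comparison above boils down to finding the blob keys whose contents differ. A minimal Python sketch of that check follows, assuming each file's blobs have already been read into a plain dict mapping key to bytes. `differing_blob_keys` is a hypothetical helper, not part of MCPL or mcpltool.

```python
# Hypothetical helper, not part of MCPL: blobs are modelled as plain
# dicts mapping a blob key to its raw bytes.
def differing_blob_keys(blobs_a, blobs_b):
    """Return the sorted keys whose blob is missing from one file or differs."""
    all_keys = set(blobs_a) | set(blobs_b)
    return sorted(k for k in all_keys if blobs_a.get(k) != blobs_b.get(k))


# Values copied from the two commands above, placeholders left untouched.
part_0 = {
    "mccode_cmd_line": b"/{path}/{Instr_name}.out par1=0.05 par2=0.1 "
                       b"mcpl_filename=/{absolute_path}/part_0",
}
part_1 = {
    "mccode_cmd_line": b"/{path}/{Instr_name}.out par1=0.05 par2=0.1 "
                       b"mcpl_filename=/{absolute_path}/part_1",
}

print(differing_blob_keys(part_0, part_1))  # ['mccode_cmd_line']
```

Here the only difference is the output filename embedded in the recorded command line, which is exactly what blocks the merge.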

Possible improvements

More useful error output from mcpltool --merge

The error message produced by mcpltool --merge could be more explicit. If it instead looked like

$ mcpltool --merge combined.mcpl.gz part_0.mcpl.gz part_1.mcpl.gz 
ERROR: Requested files (part_0.mcpl.gz, part_1.mcpl.gz) are incompatible for merge as they have different header blobs "mccode_cmd_line".

Run mcpltool -b"mccode_cmd_line" part_0.mcpl.gz to inspect stored binary blob in part_0.mcpl.gz
Run mcpltool -h or mcpltool --help for full usage information

That is, it lists the files involved, the header blob that differs, and the command needed to inspect that blob.

It would go a long way toward helping a user understand what is different between their files.
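
To make the proposal concrete, here is a rough Python sketch that assembles such a message from the file names and the differing blob keys. `format_merge_error` is a hypothetical name used only for illustration; any real implementation would have to live inside mcpltool itself.

```python
# Hypothetical sketch of the proposed message; not mcpltool's actual code.
def format_merge_error(file_a, file_b, blob_keys):
    """Build a merge error that names both files and each differing blob key."""
    quoted = ", ".join('"%s"' % k for k in blob_keys)
    lines = [
        "ERROR: Requested files (%s, %s) are incompatible for merge "
        "as they have different header blobs %s." % (file_a, file_b, quoted),
        "",
    ]
    # One inspection hint per differing key, using the existing -bKEY option.
    for key in blob_keys:
        lines.append(
            'Run mcpltool -b"%s" %s to inspect stored binary blob in %s'
            % (key, file_a, file_a)
        )
    lines.append("Run mcpltool -h or mcpltool --help for full usage information")
    return "\n".join(lines)


print(format_merge_error("part_0.mcpl.gz", "part_1.mcpl.gz", ["mccode_cmd_line"]))
```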

More easily parsed help information

52 lines of help information is too much to display at once. It would be nice to have separate help information for each sub-command, e.g.,

$ mcpltool --help
Tool for inspecting or modifying Monte Carlo Particle List (.mcpl) files.

The default behaviour is to display the contents of the FILE in human readable
format (see Dump Options below for how to modify what is displayed).

This installation supports direct reading of gzipped files (.mcpl.gz).

Usage:
  mcpltool [dump-options] FILE
  mcpltool --merge [merge-options] FILE1 FILE2
  mcpltool --extract [extract-options] FILE1 FILE2
  mcpltool --repair FILE
  mcpltool --version
  mcpltool --help TOOL
$ mcpltool --help merge
Create a new file, FILEOUT, with combined particles from specified
list of N existing and compatible files.
Non-compatible files cause an error; see {???} for the definition of compatibility.

Usage:
  mcpltool --merge [options] FILEOUT FILE1 FILE2 ... FILEN

Options:
  --inplace FILE1 FILE2 ... FILEN
                    Appends the particle contents in FILE2 ... FILEN into
                    FILE1. Note that this action modifies FILE1!
  --forcemerge [--keepuserflags] FILEOUT FILE1 FILE2 ... FILEN
               Like --merge but works with incompatible files as well, at the
               heavy price of discarding most metadata like comments and blobs.
               Userflags will be discarded unless --keepuserflags is specified.

tkittel commented 9 months ago

Thanks for the report! Yes, indeed, this could be much more clearly handled.

A few stray thoughts:

One issue is, of course, that any fancy code on my side that makes mcpltool produce more informative output would have to be implemented in C, with all the hassle that entails.

At the very least, it would be helpful to indicate which key in the header data is causing the incompatibility. And perhaps one might also include some sort of blob checksum value in the mcpltool printouts.
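
For illustration, a blob checksum in the printout could look roughly like the following Python sketch. The `blob_line` helper and the choice of CRC32 are assumptions made for this example, not a description of what mcpltool does or will do.

```python
import zlib


# Hypothetical formatter: mimics the "-> N bytes of data with key ..." line
# from the header dump and appends a CRC32 of the blob contents.
def blob_line(key, data):
    crc = zlib.crc32(data) & 0xFFFFFFFF  # force an unsigned 32-bit value
    return '-> %d bytes of data with key "%s" (crc32 %08x)' % (len(data), key, crc)


# Two blobs of equal length that differ only in the final character.
line_0 = blob_line("mccode_cmd_line", b"mcpl_filename=/{absolute_path}/part_0")
line_1 = blob_line("mccode_cmd_line", b"mcpl_filename=/{absolute_path}/part_1")

print(line_0)
print(line_1)
print(line_0 != line_1)  # True: same key and size, different checksum
```

With the checksum shown next to each blob, two --justhead dumps that otherwise look identical would reveal at a glance which blob differs.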

But nonetheless, I will keep this issue in mind when I next get around to an overhaul of MCPL. It might also be that simply linking to a dedicated page on a website would go some way.

For the revamp of --help: Hmm... maybe. But if I add too many "submenus", then people would complain that info was hidden away in some submenu.

ebknudsen commented 9 months ago

I might add a follow-up comment to @g5t's nicely detailed suggestion: assuming a user now has an idea of which field is blocking the merge, or knew this beforehand, would an --ignore-header-field option make sense? I.e., selectively ignore differences in some header field when merging. I have no idea how difficult it would be to program though, so it might be a horrible idea.
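
A rough Python sketch of what selectively ignoring fields could mean, assuming each header is flattened into a dict of field name to value. Both `can_merge` and the field names below are hypothetical, and the --ignore-header-field option is only a proposal in this thread.

```python
# Hypothetical compatibility check; headers are modelled as flat dicts.
def can_merge(hdr_a, hdr_b, ignore=()):
    """True if the two headers agree on every field not listed in `ignore`."""
    keys = (set(hdr_a) | set(hdr_b)) - set(ignore)
    return all(hdr_a.get(k) == hdr_b.get(k) for k in keys)


# Field names are made up for this example; values follow the dumps above.
hdr_0 = {
    "source": "MCSTAS 4 {McStas instrument name}",
    "blob:mccode_cmd_line": b"mcpl_filename=/{absolute_path}/part_0",
}
hdr_1 = {
    "source": "MCSTAS 4 {McStas instrument name}",
    "blob:mccode_cmd_line": b"mcpl_filename=/{absolute_path}/part_1",
}

print(can_merge(hdr_0, hdr_1))                                   # False
print(can_merge(hdr_0, hdr_1, ignore=["blob:mccode_cmd_line"]))  # True
```

An open design question is what the merged file should store for an ignored field, for example the value from the first input file or nothing at all.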

tkittel commented 9 months ago

Sounds like a useful idea @ebknudsen !

Like all things MCPL, it might take some time before I actually do anything. However, I do plan to get back to MCPL development in the near future :-)