Add CLI for subhkl tools

marshallmcdonnell commented 4 months ago

Plan is to make 3 CLI sub-groups:

Find Peaks CLI
- input: tiff filename
- output: file with x, y in pixel space
Prepare Peaks CLI
- inputs:
  - file with x, y in pixel space
  - detector info (shape of detector, distance, height, orientation w/ default = 0)
- output: HDF5 file with "peaks file" [only peaks as scattering angles (theta and phi)]
Index Peaks CLI
- input: HDF5 file with "peaks file" [includes additional sample information like a, b, c, alpha, beta, gamma, centering; instrument information like wavelength bandwidth and goniometer matrix R; along with the peaks from Prepare Peaks above]
- output: number of peaks, hkl, and resolved wavelength
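
A minimal sketch of how the three sub-groups could hang off one entry point, using only the standard-library argparse module. The program name `subhkl`, the sub-command names, and every argument and option name below are assumptions for illustration, not the final interface; the detector shape option is left out because its format is not settled here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one sub-command per planned CLI group (all names hypothetical)."""
    parser = argparse.ArgumentParser(prog="subhkl")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # Find peaks: TIFF image in, file with pixel-space (x, y) peaks out.
    find = subparsers.add_parser("find-peaks")
    find.add_argument("tiff_filename")
    find.add_argument("output_filename")

    # Prepare peaks: pixel-space peaks plus detector info in, HDF5 peaks file out.
    prepare = subparsers.add_parser("prepare-peaks")
    prepare.add_argument("peaks_filename")
    prepare.add_argument("output_filename")
    prepare.add_argument("--distance", type=float, required=True)
    prepare.add_argument("--height", type=float, required=True)
    prepare.add_argument("--orientation", type=float, default=0.0)

    # Index peaks: HDF5 peaks file in; number of peaks, hkl, and wavelength out.
    index = subparsers.add_parser("index-peaks")
    index.add_argument("peaks_h5_filename")

    return parser
```

Calling `build_parser().parse_args()` from a console-script entry point would then dispatch on `args.command`.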

Whiteboard with idea of breakdown:

SmithRWORNL commented 3 weeks ago

@marshallmcdonnell Do we have example input for the indexer? I'm trying a goniometer file containing "1,2,3", wavelength_min = 1.5, wavelength_max = 3.5, and everything else set to 0, but this produces a singular matrix, which stops the calculation. The .tiff file I used for the earlier steps does have one peak on it, so I don't think that's the issue.

@zjmorgan I'll tag you as well. Do you have an example of a, b, c, alpha, beta, gamma, wavelength_min, wavelength_max, and sample_centering values, plus a goniometer file in CSV format, that should work for peak indexing?
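
One possible reason for the singular matrix, sketched under the assumption that the indexer builds a matrix from the lattice parameters: with a, b, c and all three angles set to 0, the unit cell has zero volume, so a matrix derived from it cannot be inverted. The helper name `cell_volume` and the example cell are illustrative only and are not part of subhkl.

```python
import math


def cell_volume(a, b, c, alpha, beta, gamma):
    """Unit-cell volume from lattice lengths and angles (angles in degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    # Standard triclinic volume formula; clamp the radicand at 0 to absorb rounding.
    radicand = 1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg
    return a * b * c * math.sqrt(max(radicand, 0.0))


# All-zero inputs describe a degenerate cell with no volume.
degenerate = cell_volume(0, 0, 0, 0, 0, 0)

# A made-up orthorhombic cell, for comparison only.
valid = cell_volume(5.0, 6.0, 7.0, 90.0, 90.0, 90.0)
```

A check like `cell_volume(...) > 0` before indexing would turn the singular-matrix failure into a clearer input error.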

marshallmcdonnell commented 1 week ago

So we don't really have that goniometer file yet; we need to pull it out of some of the test data HDF5 files (like sucrose.h5).

Sorry for the delay but here is a plan we can do going forward that I think will get us to a testable state for the CLI:

- [ ] First, let's run the indexer_using_file CLI tool on the sucrose_mandi.h5 file found in the test data; sucrose_mandi.h5 is the filename input of that CLI tool
- [ ] Then, we need to prepare the inputs for the indexer CLI tool. This mainly means inspecting the sucrose_mandi.h5 file above (using something like hdfview), pulling out the test data, and putting it in the correct file format for each of the inputs below:
  - [ ] peaks_csv_filename: a CSV file with the contents of peaks/scattering and peaks/azimuthal from the sucrose HDF5 file
  - [ ] goniometer_filename: a CSV file with the contents of goniometer/R from the sucrose HDF5 file
  - [ ] a, b, and c: sample a, b, c that come from sample/a, sample/b, sample/c in the sucrose HDF5 file
  - [ ] alpha, beta, and gamma: sample alpha, beta, gamma that come from sample/alpha, sample/beta, sample/gamma in the sucrose HDF5 file
  - [ ] wavelength_min will be 2
  - [ ] wavelength_max will be 4
  - [ ] sample_centering: sample centering input that comes from sample/centering in the sucrose HDF5 file
- [ ] After we have all the inputs above, we can run the indexer CLI tool and compare its output to the indexer_using_file CLI output from the first step; they should be identical
- [ ] Upload the test files from step 2 so we can write a test for this CLI tool going forward in another MR (or this one if you want)

Then we should be good!
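
The extraction in step 2 could be sketched roughly as follows. The dataset paths (peaks/scattering, peaks/azimuthal, goniometer/R, sample/...) are the ones named in the checklist; the function names, the CSV layouts (one peak per row, a 3x3 block for R), and the assumption that the peak datasets are 1-D and the sample values are scalars are guesses that should be checked against the file.

```python
import csv


def write_rows_csv(path, rows):
    """Write an iterable of rows to a CSV file, one row per line."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)


def export_indexer_inputs(h5_path, peaks_csv, goniometer_csv):
    """Write peaks and goniometer CSV files; return the scalar sample inputs.

    Hypothetical helper: the CSV layouts are assumptions, not a documented format.
    """
    import h5py  # third-party; imported here so the CSV helper works without it

    with h5py.File(h5_path, "r") as f:
        scattering = f["peaks/scattering"][()].tolist()
        azimuthal = f["peaks/azimuthal"][()].tolist()
        # One (scattering, azimuthal) pair per row.
        write_rows_csv(peaks_csv, zip(scattering, azimuthal))
        # One matrix row per CSV row.
        write_rows_csv(goniometer_csv, f["goniometer/R"][()].tolist())

        names = ("a", "b", "c", "alpha", "beta", "gamma")
        sample = {name: float(f[f"sample/{name}"][()]) for name in names}
        centering = f["sample/centering"][()]
        # HDF5 strings often come back as bytes; decode if so.
        sample["centering"] = (
            centering.decode() if isinstance(centering, bytes) else str(centering)
        )
    return sample
```

The returned dictionary would supply the a, b, c, alpha, beta, gamma, and sample_centering inputs, with wavelength_min = 2 and wavelength_max = 4 passed separately.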

SmithRWORNL commented 1 week ago

@marshallmcdonnell Question about sample/centering. What are its valid values to list as options in the tool? It's commented as a ReflectionConditions in your code, but ReflectionConditions is never defined. Based on the example files, values include P and F.

marshallmcdonnell commented 6 days ago

@SmithRWORNL

I probably forgot to add that enum-type class, thanks for catching that!

Based on searching the code base (https://github.com/search?q=repo%3Azjmorgan%2Fsubhkl%20centering&type=code), I see the potential values being:

Options: A, B, C, I, F, R_obv, R

I did not see P but that is one as well!

These are the "lattice types" (so maybe we change from ReflectionConditions -> LatticeTypes for the class name I have?): https://en.wikipedia.org/wiki/Hermann%E2%80%93Mauguin_notation#Lattice_types

Detail: I figured this out since I looked at how the generate_test_data.py was getting the centering value:

Using Mantid, there is a CrystalStructure object created here
Then there is a call to getHMSymbol() (which is for Hermann–Mauguin Symbol based on that wikipedia page above) and the first character of the symbol is being grabbed to create centering value here; I'm pretty sure that is going to be the lattice types from the wiki page
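
As a minimal illustration of that last step (plain Python, no Mantid; the function name is hypothetical), taking the first character of a Hermann-Mauguin symbol looks like this. Note that a single character can only ever yield "R", never "R_obv", so that value must be produced some other way.

```python
def centering_from_hm_symbol(hm_symbol: str) -> str:
    """Return the lattice letter, i.e. the first character of the symbol."""
    symbol = hm_symbol.strip()
    if not symbol:
        raise ValueError("empty Hermann-Mauguin symbol")
    return symbol[0]


# Example symbols written with spaces between the parts.
face_centered = centering_from_hm_symbol("F m -3 m")
primitive = centering_from_hm_symbol("P 21")
```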

So... I think we should make an enum class maybe like the following (verbose method):

from enum import Enum

class LatticeTypes(Enum):
    """Lattice centering types, keyed by descriptive name."""
    PRIMITIVE = "P"
    BODY_CENTERED = "I"
    FACE_CENTERED = "F"
    BASE_CENTERED_A = "A"
    BASE_CENTERED_B = "B"
    BASE_CENTERED_C = "C"
    RHOMBOHEDRAL = "R"
    RHOMBOHEDRAL_OBVERSE = "R_obv"  # "obv" is the obverse setting

Or this (less verbose but easier to type):

from enum import Enum

class LatticeTypes(Enum):
    """Lattice centering types, keyed by their one-letter symbol."""
    P = "P"
    I = "I"
    F = "F"
    A = "A"
    B = "B"
    C = "C"
    R = "R"
    R_OBV = "R_obv"

And then we can use it like:

...
elif self.centering == LatticeTypes.RHOMBOHEDRAL_OBVERSE:
    <do stuff>
...

or with the 2nd class:

...
elif self.centering == LatticeTypes.R_OBV:
    <do stuff>
...

Why not just use Mantid and its classes?!? I'm really trying to keep Mantid out as a main dependency since it is such a "heavy" package to add as a dependency. We still need it for the generate test data script but the main package hopefully can avoid using it. That is why I opt for us to make our own simple enum class for this.
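
For the CLI side, a small sketch of how such an enum could supply the list of valid options and validate user input. It uses the short member names; `parse_centering` and `CENTERING_CHOICES` are hypothetical names, not existing subhkl code.

```python
from enum import Enum


class LatticeTypes(Enum):
    """Lattice centering types (values as found in the code base search)."""
    P = "P"
    I = "I"
    F = "F"
    A = "A"
    B = "B"
    C = "C"
    R = "R"
    R_OBV = "R_obv"


# Values that a CLI option could list as its valid choices.
CENTERING_CHOICES = [member.value for member in LatticeTypes]


def parse_centering(text: str) -> LatticeTypes:
    """Convert a user-supplied string to a LatticeTypes member, with a readable error."""
    try:
        return LatticeTypes(text)  # Enum lookup by value
    except ValueError:
        valid = ", ".join(CENTERING_CHOICES)
        raise ValueError(f"unknown centering {text!r}; expected one of: {valid}") from None
```

`CENTERING_CHOICES` could be passed as the `choices` of an argparse option, so the help text and the validation stay in sync with the enum.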

NOTE: I think there is a bug here in the if-else clauses since there are two R_obv cases; maybe one is just "R" and the other "R_obv"? We probably need @zjmorgan to help us here.
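
If those if-else clauses implement the standard reflection conditions for each centering (an assumption, since that code is not shown in this thread), a lookup table keeps one entry per value side by side, which makes a repeated case easier to spot in review. The function name is hypothetical, and the plain "R" value is left out on purpose until its meaning is settled.

```python
def is_allowed(centering: str, h: int, k: int, l: int) -> bool:
    """Return True if reflection (h, k, l) is allowed for the given centering."""
    conditions = {
        "P": lambda: True,                        # primitive: no restriction
        "I": lambda: (h + k + l) % 2 == 0,        # body centered
        "F": lambda: h % 2 == k % 2 == l % 2,     # face centered: all even or all odd
        "A": lambda: (k + l) % 2 == 0,            # A-face centered
        "B": lambda: (h + l) % 2 == 0,            # B-face centered
        "C": lambda: (h + k) % 2 == 0,            # C-face centered
        "R_obv": lambda: (-h + k + l) % 3 == 0,   # rhombohedral, obverse setting
    }
    if centering not in conditions:
        # Includes plain "R", which is deliberately not defined in this sketch.
        raise ValueError(f"no reflection condition defined for {centering!r}")
    return conditions[centering]()
```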

Thoughts?

SmithRWORNL commented 6 days ago

@marshallmcdonnell Thanks. I'll go ahead with the enum and tool for both R and R_obv so that they're ready once that bug is corrected.

marshallmcdonnell commented 6 days ago

And you might already be past this, @SmithRWORNL, but this is the code I used, via the h5py package, to extract the values from the sucrose test file:

>>> import h5py
>>> with h5py.File("tests/sucrose_mandi.h5") as f: print(f["sample/U"][:])
... 
[[ 0.11038608 -0.80648698  0.58085597]
 [-0.86923427  0.20504091  0.44987777]
 [-0.48191981 -0.55456016 -0.67839246]]

>>> with h5py.File("tests/sucrose_mandi.h5") as f: print(f["goniometer/R"][:])
... 
[[ 0.09545993 -0.19184567  0.97677154]
 [ 0.80682613 -0.55980837 -0.18880195]
 [ 0.58302572  0.80610782  0.10134683]]
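
A quick sanity check for a goniometer file, sketched with the standard library only: the R printed above behaves like a proper rotation (orthonormal rows, determinant +1, within rounding). Assuming the indexer expects a 3x3 matrix of this kind, a single-row file such as "1,2,3" would fail the check. The helper names are hypothetical.

```python
# goniometer/R as printed above from tests/sucrose_mandi.h5
R = [
    [0.09545993, -0.19184567, 0.97677154],
    [0.80682613, -0.55980837, -0.18880195],
    [0.58302572, 0.80610782, 0.10134683],
]


def determinant3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def is_rotation(m, tol=1e-6):
    """True if m is 3x3, its rows are orthonormal, and its determinant is +1."""
    if len(m) != 3 or any(len(row) != 3 for row in m):
        return False
    for i in range(3):
        for j in range(3):
            dot = sum(m[i][k] * m[j][k] for k in range(3))
            expected = 1.0 if i == j else 0.0
            if abs(dot - expected) > tol:
                return False
    return abs(determinant3(m) - 1.0) <= tol


r_is_rotation = is_rotation(R)
```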

zjmorgan / subhkl

Add CLI for subhkl tools #3