pyxem / orix

Analysing crystal orientations and symmetry in Python
https://orix.readthedocs.io
GNU General Public License v3.0

Read orientations and quality metrics from h5ebsd format files #47

Closed hakonanes closed 4 years ago

hakonanes commented 4 years ago

Thought it would be better to discuss this here rather than in issue #46.

EMsoft's dot product files contain all their Euler angles and quality metrics in an HDF5 file in the h5ebsd format introduced by Jackson et al., 2015. EDAX TSL, Bruker and Oxford can all write angles/metrics to this format as well. Briefly, the format requires a structure in which the top HDF group contains at least two string data sets, Manufacturer and Version, and one group, e.g. Scan 1 (EMsoft/TSL) or Scan 0 (Bruker). Within these scan groups, the formats differ somewhat between vendors, but we could first determine the vendor from Manufacturer, and then read the scans appropriately.
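To make the structure check concrete, here is a rough sketch of how the top group could be inspected. The function names (`read_string`, `inspect_h5ebsd`) are made up for illustration, and matching the data set names case-insensitively is an assumption about how much vendors differ, not something the format definition states. It accepts an open `h5py.File` or any nested mapping with the same layout.

```python
def read_string(dataset):
    """Return the value of a string data set as a stripped str."""
    # h5py data sets are read with [()]; plain values pass through unchanged.
    value = dataset[()] if hasattr(dataset, "shape") else dataset
    # Some writers may store the string as a one-element array or list.
    while not isinstance(value, (str, bytes)) and hasattr(value, "__len__"):
        value = value[0]
    if isinstance(value, bytes):
        value = value.decode("utf-8", errors="replace")
    return str(value).strip()


def inspect_h5ebsd(top_group):
    """Return (manufacturer, version, scan_names) for a top-level group.

    Raises ValueError if Manufacturer, Version or a scan group is missing.
    """
    required = ("manufacturer", "version")
    # Assumption: compare names case-insensitively, ignoring stray spaces.
    names = {name.strip().lower(): name for name in top_group.keys()}
    missing = [key for key in required if key not in names]
    if missing:
        raise ValueError(f"Not an h5ebsd file, missing data sets: {missing}")
    manufacturer = read_string(top_group[names["manufacturer"]])
    version = read_string(top_group[names["version"]])
    # Groups expose keys(); data sets do not.
    scans = [
        name
        for key, name in names.items()
        if key not in required and hasattr(top_group[name], "keys")
    ]
    if not scans:
        raise ValueError("Not an h5ebsd file, no scan group found")
    return manufacturer, version, scans


# Usage with a real file (file name is hypothetical):
# import h5py
# with h5py.File("scan.h5", "r") as f:
#     manufacturer, version, scans = inspect_h5ebsd(f)
```

Keeping the check independent of the vendor-specific scan layout means it only needs h5py and can run before any scan data is read.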

Examples of these files:

There is also the question of how to handle quality metric maps, or any maps in general. So far, load_ang and load_ctf return Rotation objects. In the short term, I guess we could also return a dictionary with all the quality maps keyed by name?

Should we create a load_h5ebsd function (located in io/__init__.py)? Or should we create separate load_emsoft, load_h5ebsd_edax, load_h5ebsd_bruker etc.?

Since I have a setup for reading patterns from EDAX's and Bruker's h5ebsd files in KikuchiPy (https://github.com/kikuchipy/kikuchipy/blob/master/kikuchipy/io/plugins/h5ebsd.py), I can take responsibility for at least starting on this (or these) functions. Will try to not add any additional dependencies apart from h5py.

As soon as we make a decision on one/multiple readers, I will make a work-in-progress PR here.

dnjohnstone commented 4 years ago

Thanks for this.

If they all have the same file extension, i.e. h5ebsd, which is what it sounds like to me, then I think it would be nicer to have just one reader so that people don't have to think about which vendor they used. If this makes it much harder to implement though, separate readers are OK.

In terms of where things should go - I think we need to take this opportunity to first implement something akin to the CrystallographicMap class in pyxem....

I say this because we need:

  1. A good way to deal with multiple phases.
  2. The structure to have a dictionary of metrics stored with the rotations.

So I think we really need to make the CrystallographicMap class first?

In terms of how we store the metrics - I quite like the approach we took in pyxem with the dictionary of metrics and then a "get_metric_map" method.

One of our primary motivators was to be able to call the metrics what they are, particularly when they are fundamentally different depending on how the orientation map was obtained, i.e. how indexation was done.
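A minimal sketch of the dictionary-of-metrics idea, to show what is meant. This is not pyxem's actual implementation: the class name `MetricStore`, its constructor and the metric names are hypothetical, and only the method name get_metric_map is taken from the comment above (its signature here is assumed). Plain lists are used so the sketch has no dependencies.

```python
class MetricStore:
    """Hypothetical holder for per-pixel metrics, stored flat and keyed by name."""

    def __init__(self, shape, metrics):
        rows, columns = shape
        size = rows * columns
        # Every metric must have exactly one value per map pixel.
        for name, values in metrics.items():
            if len(values) != size:
                raise ValueError(
                    f"Metric {name!r} has {len(values)} values, expected {size}"
                )
        self.shape = (rows, columns)
        self.metrics = {name: list(values) for name, values in metrics.items()}

    def get_metric_map(self, name):
        """Return the named metric as a list of rows (row-major order)."""
        rows, columns = self.shape
        values = self.metrics[name]
        return [values[r * columns:(r + 1) * columns] for r in range(rows)]


# The metric keeps whatever name the indexing method gives it,
# e.g. a dot product for dictionary indexing (illustrative values).
store = MetricStore((2, 3), {"dot_product": [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]})
dot_product_map = store.get_metric_map("dot_product")
```

Because the keys are free-form strings, a reader can store each metric under the name used in the file rather than mapping everything onto one generic "quality" field.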

hakonanes commented 4 years ago

I've only seen h5ebsd files with extension h5 in the wild. To check if it is a valid h5ebsd file we can check for the manufacturer/version data sets in the top group. I agree that it is best to have one h5ebsd reader, which determines the "manufacturer" and thus the proper way to read the file. If it doesn't take too long to implement a simple CrystallographicMap class in orix, I can wait with creating the load_h5ebsd function. I've created a new issue, #48, to discuss its implementation.
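The single-reader idea could be sketched as a small dispatch table, assuming the manufacturer string has already been read from the top group. Everything named here is hypothetical: the vendor readers are placeholders that only tag the scan group with the layout they would parse, and the keys "emsoft", "edax" and "bruker" are assumed substrings of the Manufacturer value, not confirmed against real files.

```python
def read_emsoft_scan(scan):
    # Placeholder: a real reader would parse the EMsoft scan layout here.
    return ("emsoft", scan)


def read_edax_scan(scan):
    # Placeholder: a real reader would parse the EDAX TSL scan layout here.
    return ("edax", scan)


def read_bruker_scan(scan):
    # Placeholder: a real reader would parse the Bruker scan layout here.
    return ("bruker", scan)


# Assumed substrings of the Manufacturer string, mapped to vendor readers.
READERS = {
    "emsoft": read_emsoft_scan,
    "edax": read_edax_scan,
    "bruker": read_bruker_scan,
}


def select_reader(manufacturer):
    """Return the vendor-specific scan reader for a manufacturer string."""
    name = manufacturer.strip().lower()
    for key, reader in READERS.items():
        if key in name:
            return reader
    raise NotImplementedError(f"No h5ebsd reader for manufacturer {manufacturer!r}")


def load_scans(manufacturer, scans):
    """Read every scan group in a mapping of scan name to group."""
    reader = select_reader(manufacturer)
    return {name: reader(group) for name, group in scans.items()}
```

With this shape, users call one entry point regardless of vendor, and supporting another vendor means adding one function and one table entry.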

dnjohnstone commented 4 years ago

ok - it's a shame they didn't include a useful file extension in their definitions. Do you think there's any chance of getting this lot to agree on one h5ebsd standard? Or even fit it in with nxs format conventions...

Anyway - short/medium term --> yes 1 reader for all this

hakonanes commented 4 years ago

Hehe, no.

OK, that sounds good; this can get done when the CrystalMap class (#48) is underway (or done).

pc494 commented 4 years ago

Sorry, back clearing out stuff. This one can be closed because #48 is in and #59 is a more detailed description of what's going on?

hakonanes commented 4 years ago

Yup!