pyxem / kikuchipy

Toolbox for analysis of electron backscatter diffraction (EBSD) patterns
https://kikuchipy.org
GNU General Public License v3.0

Hough indexing from PyEBSDIndex #458

Closed hakonanes closed 1 year ago

hakonanes commented 2 years ago

After a positive experience of contributing to @drowenhorst-nrl's PyEBSDIndex package, I think it is worth working towards interfacing with the Hough indexing capabilities in that package in kikuchipy. ~Before doing that though, that package must become available on PyPI (and hopefully Anaconda), so there is some work left.~

Note that so far only indexing of FCC and BCC materials is possible.

A rough sketch of functionality we could support:

I think the focus should be on providing top-level convenience methods in kikuchipy, while more fine-grained access to the functions used in indexing by PyEBSDIndex should be accessed from that package. If this work is realized, we must take care not to duplicate efforts, and should always consider in which package stuff should go.

~See also https://github.com/drowenhorst-nrl/PyEBSDIndex/issues/1.~ Not available anymore since the repository was moved to https://github.com/USNavalResearchLaboratory/PyEBSDIndex without keeping issues.

EDIT: I've added use of PyEBSDIndex in some of the tutorials in the docs:

hakonanes commented 2 years ago

Here's an example notebook where I've tried out both PC optimization and Hough indexing in PyEBSDIndex on simulated and experimental patterns.

My plan is to start a branch in this repo where we work towards supporting this functionality alongside improving PyEBSDIndex.

IMBalENce commented 2 years ago

This sounds like a really interesting idea! It could be very powerful if kikuchipy could use a "hybrid" indexing method: Hough indexing for a fast first pass, then re-indexing of the map points with low confidence using dictionary indexing. This could massively improve indexing efficiency.

hakonanes commented 2 years ago

Yes, that would be very cool. For that to work, we need some "logistical tools", like supporting a navigation mask (the "bad" map points) in EBSD.dictionary_indexing(). Another thing we should be careful about is that misorientations between points indexed with different methods will differ because the methods have different "angular resolution". This drawback could be solved by refining all points after Hough and dictionary indexing by matching to simulated patterns (like EBSD.refine_orientation()).
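The navigation mask idea can be pictured with plain NumPy. The sketch below is not kikuchipy or PyEBSDIndex code: the confidence values, the threshold of 0.5, and the helper name `low_confidence_mask` are all hypothetical. It only shows how a boolean map selecting the "bad" points from a fast first pass could be built.

```python
import numpy as np


def low_confidence_mask(confidence, threshold):
    """Return a boolean map that is True where a point should be re-indexed.

    `confidence` holds one value per map point (a hypothetical metric where
    higher is better). Points below `threshold` are flagged.
    """
    confidence = np.asarray(confidence, dtype=float)
    return confidence < threshold


# Hypothetical 3 x 4 map of confidence values from a fast first indexing pass
confidence_map = np.array(
    [
        [0.9, 0.8, 0.2, 0.7],
        [0.1, 0.95, 0.85, 0.3],
        [0.6, 0.4, 0.9, 0.99],
    ]
)
mask = low_confidence_mask(confidence_map, threshold=0.5)

# Flat indices of the map points to hand over to the slower second pass
reindex_idx = np.flatnonzero(mask)
print(mask.sum(), "of", mask.size, "points flagged:", reindex_idx)
```

Whether True should mean "index this point" or "skip this point" depends on the function that consumes the mask, so that convention would need to be checked before passing such an array on.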

hakonanes commented 2 years ago

PyEBSDIndex is now available on PyPI as a release candidate v0.1rc1, so we can start to depend on it for Hough indexing and PC optimization of cubic materials and NLPAR!

hakonanes commented 2 years ago

In the short term, I think adding a notebook explaining how to use PyEBSDIndex with kikuchipy is best. We can then work towards making workflows simpler via convenience functions in kikuchipy and in PyEBSDIndex.

jwestraadt commented 2 years ago

> PyEBSDIndex is now available on PyPI as a release candidate v0.1rc1, so we can start to depend on it for Hough indexing and PC optimization of cubic materials and NLPAR!

Firstly, thank you very much for all the hard work going into the kikuchipy package! I have a question regarding the application of NLPAR from the PyEBSDIndex package on datasets saved in data formats other than the EDAX 'up1' or 'up2' formats. Is it currently possible to run this on other formats without changing code on the PyEBSDIndex side? I tried following your short tutorial in the documentation using the sample dataset from EMsoft's DITutorial ('EDAX-Ni.h5') but it has some trouble reading it.

Alternatively, would you be able to make the "scan2v3.up1" data file available to run the NLPAR example on and to see how the data is saved in this format?

Thanks, Johan

drowenhorst-nrl commented 2 years ago

@jwestraadt what kind of files are you trying to process? I have been working out some bugs with NLPAR within the PyEBSDIndex package, and while up1 and up2 (version 3) files appear to be well supported, other files are not as well tested. On my personal repo (https://github.com/drowenhorst-nrl/PyEBSDIndex), I have pushed some edits that allow for up1/2 version 1 files, but have yet to document how to use the changes. I have also tested a bit on HDF5 files. I want to make some further improvements for those, but I am not sure when I will get to it. Right now, I make a full copy of the HDF5 file and put the NLPAR parameters at the end of the filename. I want to (1) include the parameters within the HDF5 file under a header directory, and (2) add the option to save the result as a new data group/dataset within the HDF5 file. Unfortunately, the vendors' insistence on using different naming conventions for everything in their HDF5 files makes this a bit more difficult.

Re: releasing the scan2V3.up1 ... I have to look into that. That is part of a larger dataset that I need to get permissions to release.

drowenhorst-nrl commented 2 years ago

@jwestraadt I should also note, I have NOT written the overall NLPAR implementation to work with an array of patterns in memory. It is assumed that the original file is the input, and that the result will be a new file. With most EBSD scans with patterns taking up 3-30GB, and knowing that a lot of people don't have 100s of GB of RAM, I made the decision to process the pattern files in chunks, and assume that there is no guarantee that even the results could be held in RAM. If you are interfacing with kikuchipy, I realize your patterns may well be in RAM, and thus this may be leading to the confusion?
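The chunked, file-to-file strategy described above can be illustrated generically. This sketch is not the NLPAR implementation in PyEBSDIndex: it uses `numpy.memmap` on raw temporary files, a hypothetical chunk size, and a trivial stand-in operation (subtracting each pattern's mean intensity) in place of the real averaging. It only shows that neither the input nor the result has to be held in RAM all at once.

```python
import os
import tempfile

import numpy as np

n_patterns, n_rows, n_cols = 50, 6, 6  # tiny stand-in for a real scan
shape = (n_patterns, n_rows, n_cols)
chunk_size = 16  # hypothetical number of patterns held in memory at once

tmp_dir = tempfile.mkdtemp()
in_path = os.path.join(tmp_dir, "patterns_in.dat")
out_path = os.path.join(tmp_dir, "patterns_out.dat")

# Write some synthetic input patterns to disk
rng = np.random.default_rng(0)
patterns_in = np.memmap(in_path, dtype=np.float32, mode="w+", shape=shape)
patterns_in[:] = rng.random(shape, dtype=np.float32)
patterns_in.flush()

# Re-open the input read-only and create the output file
patterns_in = np.memmap(in_path, dtype=np.float32, mode="r", shape=shape)
patterns_out = np.memmap(out_path, dtype=np.float32, mode="w+", shape=shape)

for start in range(0, n_patterns, chunk_size):
    stop = min(start + chunk_size, n_patterns)
    chunk = np.array(patterns_in[start:stop])  # copy this chunk into memory
    # Stand-in processing step: remove the mean intensity of each pattern
    chunk -= chunk.mean(axis=(1, 2), keepdims=True)
    patterns_out[start:stop] = chunk  # write the processed chunk to the new file

patterns_out.flush()
```

A real implementation would also need each chunk to overlap its neighbours, since non-local averaging looks at patterns from surrounding map points; that detail is left out here.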

jwestraadt commented 2 years ago

@drowenhorst-nrl I managed to get the Hough indexing to run using PyEBSDIndex on an Oxford dataset by first reading it with kikuchipy then saving it in .h5 format. Then I followed the example explained in the notebook, where the vendor was set to 'KIKUCHIPY' in the indexer. I could extract the pattern data from the .h5 file.

Is it possible to follow a similar approach for NLPAR? It would be useful to have a class KIKUCHIPY(HDF5PatFile) that is able to read the patterns from a dataset saved in the kikuchipy format, create a copy, and replace them with the averaged patterns. At the moment it recognizes neither the kikuchipy .h5 format nor the EMsoft DITutorial .h5 example. I might be doing something wrong as well...

I guess the plan, judging from the opening post, is to eventually build NLPAR into kikuchipy.

drowenhorst-nrl commented 2 years ago

@jwestraadt assuming the "patterns.h5" in the kikuchipy data repo acts like a reasonable template for the kikuchipy HDF5 format, I should have something that might work for you on my personal repo. Should be as easy as:

    from pyebsdindex import nlpar

    nlobj = nlpar.NLPAR('path/to/kikuchipy.h5')
    nlobj.opt_lambda()
    nlobj.calcnlpar(searchradius=5)

Obviously, this runs using all the defaults, other than setting the search radius. The biggest issue is if you need background subtraction, which I need to improve ... the current implementation is too simple for how NLPAR is processed (long story for another time).

hakonanes commented 2 years ago

Hi Johan (and Dave), thank you for the kind words!

I haven't used NLPAR in PyEBSDIndex myself, so Dave has to help out there (as he has done).

What I have used from PyEBSDIndex is pyebsdindex.pcopt.optimize() and pyebsdindex.ebsd_index.EBSDIndexer.index_pats(), which take in NumPy arrays or file paths. I've only passed small NumPy arrays which fit in memory. The kikuchipy HDF5 format was initially written to be read by EMsoft, where the only requirement is that patterns are stored in an array with shape (n patterns, n detector rows, n detector columns). It has not changed much since 2019, and I don't anticipate that it will. This format supports writing a (kikuchipy) EBSD signal to an existing file (see the end of the "h5ebsd" section in the docs).
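As a minimal NumPy sketch of that array layout, assuming the patterns start out as a 4D array ordered (map rows, map columns, detector rows, detector columns), the two map axes can be flattened into a single pattern axis. The sizes and variable names below are made up for illustration.

```python
import numpy as np

# Hypothetical scan: 3 x 4 map points, each with a 5 x 6 pixel pattern
n_map_rows, n_map_cols = 3, 4
n_det_rows, n_det_cols = 5, 6
patterns_4d = np.arange(
    n_map_rows * n_map_cols * n_det_rows * n_det_cols, dtype=np.uint16
).reshape(n_map_rows, n_map_cols, n_det_rows, n_det_cols)

# Flatten the two map axes into one pattern axis:
# (n patterns, n detector rows, n detector columns)
patterns_3d = patterns_4d.reshape(-1, n_det_rows, n_det_cols)
print(patterns_3d.shape)  # (12, 5, 6)

# Pattern i in the flat stack corresponds to map point (row, col)
i = 7
row, col = np.unravel_index(i, (n_map_rows, n_map_cols))
```

Because NumPy reshapes in row-major order by default, pattern 7 in the flat stack is the pattern at map row 1, column 3.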

What I'd like to test is to pass a larger-than-memory Dask array to index_pats() to see how it handles that... (in short, Dask mirrors NumPy but operates on array chunks once compute() is called). EBSD patterns are "read" as a Dask array from file when load(..., lazy=True) is used in kikuchipy.

> I guess the plan, judging from the opening post, is to eventually build NLPAR into kikuchipy.

I would very much like that, but since PyEBSDIndex is in active development, it would require much refactoring in kikuchipy if we were to add PyEBSDIndex as a dependency. And the available developer time is very limited... I think a good approach is to try to make them work nicely together by making some convenience methods and classes where we see the need, such as the kikuchipy HDF5 file reader in PyEBSDIndex!

jwestraadt commented 2 years ago

Hi Dave and Håkon,

Thanks, it is working quite nicely now. I could do the following:

  1. Read the Oxford .ebsp file (6GB) into kikuchipy and then save it in .h5 format.

  2. Perform NLPAR on the dataset and save it. Initially it could not find/read the .h5 file, but after some troubleshooting I found that it was related to HDF5 file locking. For some reason, I had to disable file locking in order to read this larger file:

         import os
         os.environ["HDF5_USE_FILE_LOCKING"] = "FALSE"

     The smaller dataset 'patterns.h5' in the kikuchipy repo worked without any issues.

  3. Perform Hough indexing (HI) by passing the patterns into the PyEBSDIndex indexer. I used refined PC values from an earlier dictionary indexing run.

  4. The HI after NLPAR is qualitatively better for this plastically deformed polycrystalline diamond sample, especially around the grain boundaries where the indexing success rate of the HI performed during data acquisition was very low.

drowenhorst-nrl commented 2 years ago

That is interesting about the "use_file_locking". What size is your HDF5 file? Did you happen to also have it open in something like HDFView at the same time?
Really glad to hear that it is giving good results. I have a summer undergraduate student who is looking at how sensitive NLPAR is to small orientation gradients, which might be relevant for the deformed areas.

hakonanes commented 2 years ago

Glad to hear that the Oxford .ebsp file reader is useful and that you got kikuchipy and PyEBSDIndex to work nicely (?) together!

> Did you happen to also have it open in something like HDFView at the same time?

If I open the patterns.h5 HDF5 file in HDFView and then try to open the file in Python with mode="r+" I get this error:

>>> from h5py import File
>>> file = File("patterns.h5", mode="r+")
Traceback (most recent call last):
  File "/home/hakon/miniconda3/envs/kp-dev/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3398, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-10-90063f0a65b1>", line 4, in <cell line: 4>
    file = File(
  File "/home/hakon/miniconda3/envs/kp-dev/lib/python3.9/site-packages/h5py/_hl/files.py", line 507, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, fcpl, swmr=swmr)
  File "/home/hakon/miniconda3/envs/kp-dev/lib/python3.9/site-packages/h5py/_hl/files.py", line 222, in make_fid
    fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 106, in h5py.h5f.open
BlockingIOError: [Errno 11] Unable to open file (unable to lock file, errno = 11, error message = 'Ressursen midlertidig utilgjengelig')

I did not get this error when the file was not opened in HDFView, as expected. (The error message is Norwegian for "Resource temporarily unavailable".) Was this what you experienced, @jwestraadt?

hakonanes commented 2 years ago

Now that PyEBSDIndex is more generally available, I've added use of PyEBSDIndex to some tutorials in the docs (see links in https://github.com/pyxem/kikuchipy/issues/458#issue-1037158134).

hakonanes commented 1 year ago

#590 adds Hough indexing in kikuchipy via EBSD.hough_indexing(). It is a very thin wrapper around PyEBSDIndex. The new functionality will be released in version 0.8, which I hope we can release within the next few weeks (some outstanding additions remain).

New issues should be raised if we decide to wrap non-local averaging (NLPAR) and a Hough transform (not indexing) from PyEBSDIndex in the future.