pyxem / kikuchipy

Toolbox for analysis of electron backscatter diffraction (EBSD) patterns
https://kikuchipy.org
GNU General Public License v3.0

Hough indexing of lazy EBSD patterns errors with PyEBSDIndex 0.3 #675

Closed hakonanes closed 1 month ago

hakonanes commented 1 month ago

Hough indexing of lazy EBSD patterns, where the patterns are in a Dask array, works fine with PyEBSDIndex 0.2. This is because every EBSD pattern is converted to a NumPy array by PyEBSDIndex (as a side-effect!) before being passed to Numba for further processing.

Numba requires NumPy arrays. In PyEBSDIndex 0.3, our Dask array is not converted before being passed to Numba, resulting in an error.

@drowenhorst-nrl: Do you have any suggestions for how we can solve this? It was very nice to be able to pass the entire Dask array to PyEBSDIndex so that you could operate on it chunk by chunk, without us having to load it into memory.

A minimal working example:

>>> import kikuchipy as kp
>>> s = kp.data.nickel_ebsd_large(lazy=True)
>>> pl = s.xmap.phases
>>> indexer = s.detector.get_indexer(pl)
>>> xmap = s.hough_indexing(pl, indexer)
{
    "name": "TypingError",
    "message": "Failed in nopython mode pipeline (step: nopython frontend)
non-precise type pyobject
During: typing of argument at /opt/homebrew/Caskroom/miniforge/base/envs/kp/lib/python3.10/site-packages/pyebsdindex/radon_fast.py (272)

File \"../../../../opt/homebrew/Caskroom/miniforge/base/envs/kp/lib/python3.10/site-packages/pyebsdindex/radon_fast.py\", line 272:
  def radon_faster(self,imageIn,padding = np.array([0,0]), fixArtifacts = False, background = None, normalization=True):
      <source elided>

  @staticmethod
  ^ 

This error may have been caused by the following argument(s):
- argument 0: Cannot determine Numba type of <class 'dask.array.core.Array'>
",
    "stack": "---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
File /opt/homebrew/Caskroom/miniforge/base/envs/kp/lib/python3.10/site-packages/pyebsdindex/opencl/band_detect_cl.py:104, in BandDetect.find_bands(self, patternsIn, verbose, clparams, chunksize, useCPU, **kwargs)
    103 #rdnNorm, clparams, rdnNorm_gpu = self.calc_rdn(patterns[chnk[0]:chnk[1],:,:], clparams, use_gpu=self.CLOps[0])
--> 104 rdnNorm, clparams = self.radon_fasterCL(patterns[chnk[0]:chnk[1],:,:], padding=self.padding,
    105                                                                fixArtifacts=False, background=self.backgroundsub,
    106                                                                returnBuff=True, clparams=clparams)
    108 #rdnNorm, clparams = self.rdn_mask(rdnNorm, clparams=clparams, returnBuff=False)
    109 
    110 #if (self.EDAXIQ == True): # I think EDAX actually uses the convolved radon for IQ
   (...)
    114   #rdnNorm_nocov = np.zeros((nRp,nTp,nImCL),dtype=np.float32)
    115   #cl.enqueue_copy(clparams.queue,rdnNorm_nocov,rdnNorm,is_blocking=True)

File /opt/homebrew/Caskroom/miniforge/base/envs/kp/lib/python3.10/site-packages/pyebsdindex/opencl/band_detect_cl.py:270, in BandDetect.radon_fasterCL(self, image, padding, fixArtifacts, background, returnBuff, clparams)
    268 #radon_gpu = cl.Buffer(ctx,mf.READ_WRITE,size=radon.nbytes)
    269 #radon_gpu = cl.Buffer(ctx,mf.READ_WRITE | mf.COPY_HOST_PTR,hostbuf=radon)
--> 270 image_gpu = cl.Buffer(ctx,mf.READ_ONLY | mf.COPY_HOST_PTR,hostbuf=image)
    271 imstep = np.uint64(np.product(shapeIm[-2:]))

TypeError: a bytes-like object is required, not 'Array'

During handling of the above exception, another exception occurred:

TypingError                               Traceback (most recent call last)
File /Users/hakon/kode/scratches/kp_scratch.py:3
      1 #%%
      2 indexer = s.detector.get_indexer(pl)
----> 3 xmap = s.hough_indexing(pl, indexer)

File ~/kode/personal/kikuchipy/kikuchipy/signals/ebsd.py:1687, in EBSD.hough_indexing(self, phase_list, indexer, chunksize, verbose, return_index_data, return_band_data)
   1684 if self._lazy:  # pragma: no cover
   1685     patterns = patterns.rechunk({0: chunksize, 1: -1, 2: -1})
-> 1687 xmap, index_data, band_data = _hough_indexing(
   1688     patterns=patterns,
   1689     phase_list=phase_list,
   1690     nav_shape=nav_shape,
   1691     step_sizes=step_sizes,
   1692     indexer=indexer,
   1693     chunksize=chunksize,
   1694     verbose=verbose,
   1695 )
   1697 xmap.scan_unit = _get_navigation_axes_unit(am)
   1699 if return_index_data and return_band_data:

File ~/kode/personal/kikuchipy/kikuchipy/indexing/_hough_indexing.py:235, in _hough_indexing(patterns, phase_list, nav_shape, step_sizes, indexer, chunksize, verbose)
    232 print(info_message)
    234 tic = time()
--> 235 index_data, band_data, _, _ = indexer.index_pats(
    236     patsin=patterns, verbose=verbose, chunksize=chunksize
    237 )
    238 toc = time()
    239 patterns_per_second = n_patterns / (toc - tic)

File /opt/homebrew/Caskroom/miniforge/base/envs/kp/lib/python3.10/site-packages/pyebsdindex/_ebsd_index_single.py:524, in EBSDIndexer.index_pats(self, patsin, patstart, npats, xyloc, clparams, PC, verbose, chunksize)
    521 if npats == -1:
    522     npats = npoints
--> 524 banddata, bandnorm = self._detectbands(pats, PC, xyloc=xyloc, clparams=clparams, verbose=verbose,
    525                                        chunksize=chunksize)
    526 tic = timer()
    528 indxData, banddata = self._indexbandsphase(banddata, bandnorm, verbose=verbose)

File /opt/homebrew/Caskroom/miniforge/base/envs/kp/lib/python3.10/site-packages/pyebsdindex/_ebsd_index_single.py:618, in EBSDIndexer._detectbands(self, pats, PC, xyloc, clparams, verbose, chunksize)
    617 def _detectbands(self, pats, PC, xyloc=None, clparams=None, verbose=0, chunksize=528):
--> 618     banddata = self.bandDetectPlan.find_bands(
    619         pats, clparams=clparams, verbose=verbose, chunksize=chunksize
    620     )
    621     #  shpBandDat = banddata.shape
    622     if PC is None:

File /opt/homebrew/Caskroom/miniforge/base/envs/kp/lib/python3.10/site-packages/pyebsdindex/opencl/band_detect_cl.py:220, in BandDetect.find_bands(self, patternsIn, verbose, clparams, chunksize, useCPU, **kwargs)
    218 except Exception as e: # something went wrong - try the CPU
    219   print(e)
--> 220   bandData = band_detect.BandDetect.find_bands(self, patternsIn, verbose=verbose, chunksize=-1, **kwargs)
    222 return bandData

File /opt/homebrew/Caskroom/miniforge/base/envs/kp/lib/python3.10/site-packages/pyebsdindex/band_detect.py:411, in BandDetect.find_bands(self, patternsIn, verbose, chunksize, **kwargs)
    409 for chnk in chunk_start_end:
    410   tic1 = timer()
--> 411   rdnNorm = self.radonPlan.radon_faster(patterns[chnk[0]:chnk[1],:,:], self.padding, fixArtifacts=False, background=self.backgroundsub)
    412   rdntime += timer() - tic1
    413   tic1 = timer()

File /opt/homebrew/Caskroom/miniforge/base/envs/kp/lib/python3.10/site-packages/pyebsdindex/radon_fast.py:259, in Radon.radon_faster(self, imageIn, padding, fixArtifacts, background, normalization)
    256 radon = np.full([self.nRho + 2 * padding[0],self.nTheta + 2 * padding[1], nIm],self.missingval,dtype=np.float32)
    257 shp = radon.shape
--> 259 counter = self.rdn_loops(image,self.indexPlan,nIm,nPx,indxDim,radon,
    260                          np.asarray(padding), np.float32(normalization > 0))
    262 if (fixArtifacts == True):
    263   radon[:,padding[1],:] = radon[:,padding[1]+1,:]

File /opt/homebrew/Caskroom/miniforge/base/envs/kp/lib/python3.10/site-packages/numba/core/dispatcher.py:468, in _DispatcherBase._compile_for_args(self, *args, **kws)
    464         msg = (f\"{str(e).rstrip()} \
\
This error may have been caused \"
    465                f\"by the following argument(s):\
{args_str}\
\")
    466         e.patch_message(msg)
--> 468     error_rewrite(e, 'typing')
    469 except errors.UnsupportedError as e:
    470     # Something unsupported is present in the user code, add help info
    471     error_rewrite(e, 'unsupported_error')

File /opt/homebrew/Caskroom/miniforge/base/envs/kp/lib/python3.10/site-packages/numba/core/dispatcher.py:409, in _DispatcherBase._compile_for_args.<locals>.error_rewrite(e, issue_type)
    407     raise e
    408 else:
--> 409     raise e.with_traceback(None)

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
non-precise type pyobject
During: typing of argument at /opt/homebrew/Caskroom/miniforge/base/envs/kp/lib/python3.10/site-packages/pyebsdindex/radon_fast.py (272)

File \"../../../../opt/homebrew/Caskroom/miniforge/base/envs/kp/lib/python3.10/site-packages/pyebsdindex/radon_fast.py\", line 272:
  def radon_faster(self,imageIn,padding = np.array([0,0]), fixArtifacts = False, background = None, normalization=True):
      <source elided>

  @staticmethod
  ^ 

This error may have been caused by the following argument(s):
- argument 0: Cannot determine Numba type of <class 'dask.array.core.Array'>
"
}
hakonanes commented 1 month ago

I believe they are related to these changes in band_detect_cl.py: https://github.com/USNavalResearchLaboratory/PyEBSDIndex/commit/88136e60b8bc6a6c53c507bfc4dda1a39163e864#diff-b1a7457abbba36e579c00423396bc962373e6fcb044b4c43a3c2ad0384cca8be

drowenhorst-nrl commented 1 month ago

What is the data type of your dask array?

hakonanes commented 1 month ago

In this example it is uint8.

drowenhorst-nrl commented 1 month ago

Trying to figure out exactly what is up ... I see a Numba error, but it seems like it is failing on making an OpenCL buffer?

drowenhorst-nrl commented 1 month ago

Ah - I see what is up ... I think. When it fails to run the OpenCL version, it attempts the numba version. Both are failing for similar reasons. Let me look at what I can do here.
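The double traceback in the report can be reproduced in miniature. The sketch below is not PyEBSDIndex code: `LazyPatterns`, `gpu_path`, `cpu_path` and `find_bands` are hypothetical stand-ins. `memoryview` plays the role of the bytes-like requirement of `cl.Buffer(..., hostbuf=image)`, and an `isinstance` check plays the role of Numba's argument typing.

```python
import numpy as np


class LazyPatterns:
    """Hypothetical stand-in for a Dask array: wraps data but is not a buffer."""

    def __init__(self, data):
        self._data = data


def gpu_path(image):
    # Like an OpenCL host buffer, memoryview needs a bytes-like object.
    return memoryview(image).nbytes


def cpu_path(image):
    # Stand-in for Numba typing, which here only accepts real NumPy arrays.
    if not isinstance(image, np.ndarray):
        raise TypeError(f"cannot determine type of {type(image).__name__}")
    return image.nbytes


def find_bands(image):
    try:
        return gpu_path(image)
    except Exception:
        # Something went wrong on the GPU path: try the CPU path instead.
        return cpu_path(image)


lazy = LazyPatterns(np.zeros((4, 60, 60), dtype=np.uint8))
try:
    find_bands(lazy)
except TypeError as err:
    first_error = err.__context__  # the GPU-path failure
    second_error = err  # the CPU-path failure raised while handling it
```

Because the fallback runs inside the `except` block, Python chains the two failures, which is why the report shows "During handling of the above exception, another exception occurred".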

hakonanes commented 1 month ago

Thank you for having a look, really appreciate it! Any information you have about the problem is useful.

drowenhorst-nrl commented 1 month ago

If only all the fixes were this easy (at least I hope it is this easy). Maybe not the most elegant solution, but it looks like throwing an image = np.asarray(image) at the top of the radon functions should work. This will make a copy of the chunk of patterns read in, but I think I was doing that before anyway. While I have made many improvements on making PyEBSDIndex faster, I have not worried much about the memory efficiency.

Some insight:

Thus, I stopped converting to float on the CPU (which would have made a copy in this case), and tried sending a version as is to the GPU. OpenCL and Numba will work with NumPy, but not Dask arrays ... thus the issue on both pipelines.

I will send in a PR for you to test.
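A minimal sketch of the `np.asarray` idea described above, not the actual PyEBSDIndex patch. `LazyChunk` and `radon_like` are hypothetical names. `LazyChunk` mimics a Dask array only in that it exposes the `__array__` hook, which Dask arrays also implement and which `np.asarray` calls to materialize the data.

```python
import numpy as np


class LazyChunk:
    """Hypothetical stand-in for a slice of a Dask array."""

    def __init__(self, data):
        self._data = data

    def __array__(self, dtype=None, copy=None):
        # np.asarray() calls this hook to materialize the data in memory.
        return np.array(self._data, dtype=dtype)


def radon_like(image):
    # Convert whatever array-like came in to a NumPy array first.
    image = np.asarray(image)
    # Placeholder for the compiled kernel: sum the pixels of each pattern.
    return image.reshape(image.shape[0], -1).sum(axis=1)


patterns = np.ones((3, 4, 4), dtype=np.uint8)
result_lazy = radon_like(LazyChunk(patterns))
result_eager = radon_like(patterns)

# For an input that is already an ndarray, np.asarray returns it unchanged,
# so the extra line adds no copy on the in-memory path.
same_object = np.asarray(patterns) is patterns
```

Only the lazy input pays for a copy, and only for the chunk passed in, so the chunk-by-chunk processing of a large Dask array is preserved.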

drowenhorst-nrl commented 1 month ago

BTW - old version on my machine: 780 pats/s, new 1180 pats/s

hakonanes commented 1 month ago

Your changes in https://github.com/USNavalResearchLaboratory/PyEBSDIndex/pull/60 fix my issue here, thanks!

drowenhorst-nrl commented 1 month ago

PyEBSDIndex 0.3.2 is now out on PyPI and conda-forge, and should be the fix for this issue.

hakonanes commented 1 month ago

Thank you so much for the quick handling of my issue here. Have a good weekend!