hyperspy / rosettasciio

Python library for reading and writing scientific data format
https://hyperspy.org/rosettasciio
GNU General Public License v3.0

Not-implemented features of Bruker's format(s) #17

Open sem-geologist opened 6 years ago

sem-geologist commented 6 years ago

This issue is for discussion, polling priorities, and tracking progress on additional implementation of Bruker formats, as well as the eventual reorganization of the Bruker file format plugins. (P.S. It is going to be edited and updated.)

Non bcf, but with reusable part of codebase already implemented in bcf plugin or usable for bcf:

BCF features which are not implemented (missing):

  1. Pixel times array (some bcf files have an array full of 0, but some OEM implementations have sensible data, which marks the dwell time(?) per pixel). Update: the pixel times can theoretically be retrieved from the zero energy peak. (#1355)
  2. More images saved as overview (Currently only the first one is returned)
    • [x] 6. Stage data (currently returned in original_metadata, but the new HyperSpy metadata specification has a stage metadata definition). Open question: Esprit recognizes only one tilt, while HyperSpy has 2 tilt fields; it should probably map to tilt_alpha?

major bcf extension which requires new signal_type's to be defined:

ericpre commented 6 years ago

For record, from @sem-geologist in https://github.com/hyperspy/hyperspy/pull/1839#issuecomment-366528691:

The file does not contain zeroes (I mean it is not saved as zeros, it is saved as empty pixels; an empty line takes exactly 32 bits). It looks a bit weird, but this is an OEM-specific way of acquiring a mapping for a small selected rectangle. There are two ways known to me:

  1. Esprit takes a separate overview image, and during mapping it registers microscope images (of the same kind as in the overview) and saves them. In version 2 bcf it puts a rectangle marker on the overview image marking where the smaller mapping (and images) were acquired. (I am well aware of this type of behavior, as our Zeiss-Bruker microscope systems behave like that.)
  2. Esprit takes an overview image, but during mapping it registers no microscope image(s) for the selected region. It does not attach the rectangle marker to the overview image. Instead it saves an array of the same size as the original overview image, with empty pixels outside of the rectangle. (As I don't have a system with such behavior, it is not obvious to me how it should map into hyperspy, as I don't know how it is represented in Esprit.)

[...]

Maybe for the 2nd way it could be possible to emulate the 1st behaviour. I think I will need to review this format and decide what to implement.

Emulating the first behaviour would be nice and it should reduce memory storage (maybe not at loading time, though).
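To make the idea concrete, here is a minimal sketch of how the 2nd kind of file could be made to look like the 1st: crop the full-size array to the bounding box of the non-empty pixels. It uses plain NumPy only; the function name `crop_to_mapped_region` and the boolean `empty_mask` input are hypothetical and not part of the bcf reader, which would have to provide the empty-pixel information itself.

```python
import numpy as np


def crop_to_mapped_region(hypermap, empty_mask):
    """Crop a (y, x, energy) array to the bounding box of non-empty pixels.

    `empty_mask` is a hypothetical (y, x) boolean array, True where the
    file stored an empty pixel. Returns the cropped array and the (y, x)
    offset of the crop inside the original overview-sized array.
    """
    filled = ~empty_mask
    rows = np.flatnonzero(filled.any(axis=1))
    cols = np.flatnonzero(filled.any(axis=0))
    if rows.size == 0:
        raise ValueError("no mapped pixels found")
    y0, y1 = int(rows[0]), int(rows[-1]) + 1
    x0, x1 = int(cols[0]), int(cols[-1]) + 1
    return hypermap[y0:y1, x0:x1], (y0, x0)


# Toy data: a 6 x 8 overview-sized map where only a 3 x 4 rectangle was mapped.
full = np.zeros((6, 8, 4), dtype=np.uint16)
empty = np.ones((6, 8), dtype=bool)
full[2:5, 3:7] = 1
empty[2:5, 3:7] = False

cropped, offset = crop_to_mapped_region(full, empty)
print(cropped.shape, offset)
```

The returned offset plays the role of the rectangle marker that Esprit attaches to the overview image in the 1st case. Note that the crop is a view of the full array, so by itself it does not release any memory.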

ericpre commented 6 years ago

+1 on reading single spectrum .spx files! +1 on better metadata.

sem-geologist commented 6 years ago

I guess these features should go to RELEASE_next_minor. I think I can start to work on implementing .spx as soon as the latest bcf changes in RELEASE_next_patch are merged into RELEASE_next_minor.

ericpre commented 6 years ago

RELEASE_next_minor is ready to go!

sem-geologist commented 6 years ago

For record, from @sem-geologist in hyperspy/hyperspy#1839 (comment):

The file does not contain zeroes (I mean it is not saved as zeros, it is saved as empty pixels; an empty line takes exactly 32 bits). It looks a bit weird, but this is an OEM-specific way of acquiring a mapping for a small selected rectangle. There are two ways known to me:

  1. Esprit takes a separate overview image, and during mapping it registers microscope images (of the same kind as in the overview) and saves them. In version 2 bcf it puts a rectangle marker on the overview image marking where the smaller mapping (and images) were acquired. (I am well aware of this type of behaviour, as our Zeiss-Bruker microscope systems behave like that.)
  2. Esprit takes an overview image, but during mapping it registers no microscope image(s) for the selected region. It does not attach the rectangle marker to the overview image. Instead it saves an array of the same size as the original overview image, with empty pixels outside of the rectangle. (As I don't have a system with such behavior, it is not obvious to me how it should map into hyperspy, as I don't know how it is represented in Esprit.)

    [...]

Maybe for the 2nd way it could be possible to emulate the 1st behaviour. I think I will need to review this format and decide what to implement.

Emulating the first behaviour would be nice and it should reduce memory storage (maybe not at loading time, though).

Lately I got some tasks with EBSD and got to know that part of Esprit better. I found that on our Esprit 1.9.x the 1st way described above is used when Esprit mapping (hypermapping, under the "objects" tab) is used, and the 2nd way when mapping is done during EBSD (EBSD tab). Would emulating the first behaviour save memory? Actually not, as the bcf reader first creates np.zeros(shape), which, unlike np.ones(), does not use memory until the bcf reader fills the pixels that are not empty. Only those pixels get into memory. That is a really nice behaviour of numpy. Finally, I experienced how the multi-ROI tool of Esprit works: it works like a mask. So the 2nd type of bcf should actually use numpy.ma. The most important question then is: can numpy.ma work with dask for lazy loading?
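As an illustration of the mask idea, a minimal sketch with toy data of how a multi-ROI map could be represented with numpy.ma. The shapes, ROI positions and variable names are made up for the example and are not taken from the bcf reader; the dask question is left open here.

```python
import numpy as np

ny, nx, n_channels = 6, 8, 4

# Hypothetical multi-ROI selection: True where a pixel was actually mapped.
roi = np.zeros((ny, nx), dtype=bool)
roi[1:3, 1:4] = True
roi[4:6, 5:8] = True

# Start from zeros, as the bcf reader does, and fill only the mapped pixels.
counts = np.zeros((ny, nx, n_channels), dtype=np.uint16)
counts[roi] = 2

# numpy.ma uses True for "invalid", so the mask is the inverted ROI,
# broadcast along the energy axis and copied to get a writable array.
mask = np.broadcast_to(~roi[..., np.newaxis], counts.shape).copy()
masked_map = np.ma.masked_array(counts, mask=mask)

# Reductions skip the empty pixels instead of treating them as zero counts.
sum_spectrum = masked_map.reshape(-1, n_channels).sum(axis=0)
print(sum_spectrum.tolist(), masked_map.count())
```

With a masked array, an empty pixel stays distinguishable from a measured pixel that happens to contain zero counts, which a plain zero-filled array cannot express.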

imikejackson commented 4 years ago

I have been able to parse the EBSD files that are stored in the .bcf file. There is a mix of plain text, XML and raw binary. If anyone is interested in the details please contact me. Would be great to see HyperSpy be able to pull out the EBSD data.

hakonanes commented 4 years ago

Hi @imikejackson (and others)!

I am very interested in the details.

So we've recently (@tinabe and I) created a package extending HyperSpy's Signal2D class for EBSD data with some common pattern processing methods + EDAX/Bruker h5ebsd readers and NORDIF binary reader/writer:

I am also at the moment attempting to read patterns and metadata stored in a BCF file into our EBSD class, based upon the great work behind the BCF reader in HyperSpy. I am therefore very interested in the details. Will contact you by mail.

francisco-dlp commented 4 years ago

Sounds great. Could you keep @sem-geologist (the author of the BCF reader) in the loop, unless he wishes otherwise?

imikejackson commented 4 years ago

I've sent Hakon all the information that I have on how to parse the 2D EBSD data files found within the .bcf file.

sem-geologist commented 4 years ago

@imikejackson and @hakonanes, do you mean you are able to pull the files out of the bcf? Or are you one step further: have you reverse engineered the binary files inside the container (aside from the numerous xml files) which hold the array of 2D images (per pixel) of Kikuchi bands? The first is simple, as bcf is basically sfs (SingleFileSystem developed by AidAim), and the reverse-engineered sfs reader is inside the bcf plugin (and I believe you used it to extract the files).

I would really like to see the bcf plugin extended to do EBSD. I had a preliminary look a year ago and saw that the binary part is easy to reverse engineer. However, I am currently out of free time to implement that. I think it would be a good idea to open a new issue (for implementing EBSD reading as a 2D signal in bcf), and I could give some support. In particular, to get that into hyperspy the structure of bruker_reader will need to be reorganized a bit. As the plugin file is getting a bit oversized for my liking, I am tempted to split it into a few parts (it would be logical to move the SFSReader parts into a separate file and move it into one of the misc directories).

hakonanes commented 4 years ago

I believe I will be able to read the necessary metadata and patterns from a BCF file; however, I haven't fully started on it yet (hopefully in the next couple of weeks). When I do, I will base it on your BCF plugin tools.

As noted above (https://github.com/hyperspy/rosettasciio/issues/17), @tinabe and I are extending HyperSpy for EBSD in this package: https://github.com/kikuchipy/kikuchipy. I therefore plan to open a WIP PR there when I start with the plugin, separate from the plugin in HyperSpy (https://github.com/hyperspy/hyperspy/blob/RELEASE_next_minor/hyperspy/io_plugins/bruker.py). I realise this is not what you have in mind, so do you have any thoughts on this?

sem-geologist commented 4 years ago

Hmm, initially I thought that kikuchipy imports data with the hyperspy io_plugins, and I had the idea that it would be better to add that reader on the hyperspy side, from where it would later migrate to the planned RossettaIO (or "whatever" the name for the HyperSpy split will be). An EBSD bcf from Bruker can also contain a simultaneously acquired EDX hypermap. This will need to be addressed somehow. However, I see kikuchipy has its own readers, so it is probably going to be easier to do that on the kikuchipy side. I see heavy usage of DictionaryTreeBrowser all over the place in the kikuchipy readers. Ouch. That makes those io plugins useless without hyperspy.

hakonanes commented 4 years ago

Aha, I've read about RosettaScIO and know that we will have to restructure in the future (when hopefully some deprecation warnings start to pop up?). However, I haven't looked into the PR (https://github.com/hyperspy/hyperspy/pull/2174) in detail or the discussion linked therein...

Yes, obtaining EBSD/EDS from the same scan point is powerful. It should in some way be possible to use both of these signals in analysis, e.g. stacking both in the samples/variables matrix for decomposition.
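A rough sketch of what such stacking could look like, using random toy arrays in place of real EDS spectra and EBSD patterns. The shapes, the per-column standardization and the use of a plain SVD are all assumptions made for illustration, not an existing HyperSpy or kikuchipy workflow.

```python
import numpy as np

rng = np.random.default_rng(0)
ny, nx = 4, 5

# Toy stand-ins acquired on the same scan grid:
# EDS as (y, x, energy), EBSD as (y, x, pattern_y, pattern_x).
eds = rng.poisson(3.0, size=(ny, nx, 16)).astype(float)
ebsd = rng.random((ny, nx, 6, 6))


def standardize(features):
    """Scale each column to zero mean and unit variance (assumed scaling)."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (features - mean) / std


# One row per scan point (sample), one column per variable.
eds_2d = eds.reshape(ny * nx, -1)
ebsd_2d = ebsd.reshape(ny * nx, -1)
stacked = np.hstack([standardize(eds_2d), standardize(ebsd_2d)])

# Plain SVD as a stand-in for whichever decomposition is preferred.
u, s, vt = np.linalg.svd(stacked, full_matrices=False)
loadings = u.reshape(ny, nx, -1)  # back to the scan grid
print(stacked.shape, loadings.shape)
```

Some relative scaling of the two blocks is needed because counts and pattern intensities live on very different scales; which scaling is appropriate is an open choice.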

I see that the plan is to not depend on the DictionaryTreeBrowser class in RosettaScIO, thanks for pointing this out, @sem-geologist. Will work to remove this in our plugins.

So, I guess, until RosettaScIO is released, I suggest that I try to implement the BCF EBSD reader in KikuchiPy.

imikejackson commented 4 years ago

@sem-geologist Sorry for the late reply, but yes, I am able to fully parse the binary files within the .bcf files to extract both patterns and other scalar/vector data. This is the data that would normally be printed in Bruker's exported .ctf file.

sem-geologist commented 4 years ago

@imikejackson , I was asking about saved binary images of kikuchi patterns, not vectorised stuff which is basically xml.