hyperspy / rosettasciio

Python library for reading and writing scientific data formats
https://hyperspy.org/rosettasciio
GNU General Public License v3.0

File Reader for Oxford Instruments #97

Open k8macarthur opened 4 years ago

k8macarthur commented 4 years ago

I have been discussing with the guys at OI for some time about whether they want to help us read from their files. Unfortunately, I don't seem to be making much progress. Is there more interest than just me in the community? Has anyone attempted this yet? I think it would be a useful addition to HyperSpy.

ericpre commented 4 years ago

I have been discussing with the guys at OI for some time about whether they want to help us read from their files.

Which file format exactly are you talking about? AZtec "oip"?

Unfortunately I don't seem to be making much progress.

Do you mean that you have no information about the binary format? If so, it would be a bit difficult to do anything, unless you managed to get some of @sem-geologist's brain to reverse engineer the binary format... it may need to be provided with an AZtec license too! ;)

On a similar topic, from discussion with @ppinard last year, recent versions of Oxford Instruments software should be able to export to an h5EBSD format with some differences specific to Oxford Instruments. It is designed to support EDS too, but I haven't seen such a file yet. There is an example of such a reader at https://github.com/kikuchipy/kikuchipy/blob/master/kikuchipy/io/plugins/h5ebsd.py. Exporting EDS data using this h5EBSD format would be significantly better than the current raw/rpl situation.

sem-geologist commented 4 years ago

@k8macarthur, @ericpre, my brain is not reverse-engineerable :P, you should not clone, disassemble or tinker with it.

Jokes aside, no AZtec license would be needed. Reverse engineering achieves exactly the opposite: it lets you stop worrying about licenses. While I have little time lately (I am the father of a newborn) and can't promise to RE it completely, I could mentor in the process. From what I have seen in screenshots of AZtec, I can guess that the software has some quality, and thus the binary files are probably also tidy. I am still actively reverse engineering Cameca files, which are the worst I have ever seen: full of template structures and junk, junk, junk (of dynamic length, of course, just to make my life miserable). Jeol files, on the other hand, are in my experience clean and tidy. The Bruker files, which I successfully reverse engineered, are quite tidy but complex; most importantly they are logical, which made it possible to bite through the few layers of complexity.

Now where to begin with:

  1. We need a few binary files with different dimensions (best to start with the smallest dimensions, like 16x16 pixels or the smallest possible).
  2. Some readable note of the metadata (hand-written in Notepad, Word, or a screenshot).
  3. Ideally, we would get raw/rpl versions of the same binary files...
  4. ...but if that is not possible, then some spectra from single pixels, either in EMSA or plain text, xls, or another option.

Then, if there are any leaks of information about what language/s was/were used to program AZtec, that will speed up the RE, as it will be clearer what kind of dynamic structures to expect in the file.
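A first concrete step for point 1 above can be sketched in Python: diff two acquisitions that differ only in map dimensions, to locate the header bytes that encode those dimensions. The blobs below are fabricated stand-ins for real files, and the assumed layout (dimensions as little-endian uint16 at the start of the header) is purely hypothetical:

```python
def diff_headers(data_a, data_b, window=256):
    """Return byte offsets (within the first `window` bytes) where two
    binary blobs differ -- candidate locations for dimension fields."""
    return [i for i, (a, b) in enumerate(zip(data_a[:window], data_b[:window]))
            if a != b]

# Two fabricated "headers": a 16x16 and a 32x32 map, with the dimensions
# stored as little-endian uint16 at offsets 0 and 2 (hypothetical layout).
blob_16x16 = (16).to_bytes(2, "little") * 2 + b"\x00" * 60
blob_32x32 = (32).to_bytes(2, "little") * 2 + b"\x00" * 60
print(diff_headers(blob_16x16, blob_32x32))  # -> [0, 2]
```

With real files, the offsets reported this way narrow down where to look before trying to interpret the surrounding bytes.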

jat255 commented 4 years ago

While I don't have tons to add, I would like to voice my support (and skepticism that it's possible) for being able to read .oip files.

This is second-hand hearsay (and I don't remember who I was talking to about it), but at a recent conference I got into discussing the Oxford format with someone who was into data extraction, and they mentioned that project files are actually encrypted such that they can only be opened by AZtec (again, not sure if this is actually true, but it might save some effort if we can get confirmation from OI). The person I was talking to was trying to recover data from a multi-day acquisition that crashed near the end, and apparently Oxford told them it was not possible, since the last thing that happens during the run is the writing of the encryption key (and without that key the data was worthless).

hakonanes commented 4 years ago

@ppinard put out this nice specification of Oxford's H5EBSD format: https://github.com/oinanoanalysis/h5oina. We plan to implement a reader for the EBSD part in kikuchipy's h5ebsd reader, as referenced by @ericpre above. (With time, this should then be moved to RosettaSciIO.) Note that the h5ebsd format was introduced in Jackson et al. (https://link.springer.com/article/10.1186/2193-9772-3-4).

I haven't tested it and don't have access to files in the format, but EMsoft has a reader for Oxford's binary format (.ebsp?) for EBSD patterns: https://github.com/EMsoft-org/EMsoft/blob/0754da2eec10225166b83795a9c32b441d8440eb/Source/EMsoftHDFLib/patternmod.f90#L540

sem-geologist commented 4 years ago

@jat255, that is indeed very interesting and sounds challenging. Can an AZtec file be transported to another machine and opened in another AZtec installation? I guess it can, and so the encryption key should be in the same file. BTW, the technology behind bcf (AidAim Software's single file system) has a similar capability of being encrypted (and the same design, with the key kept in the file). Fortunately, Bruker did not use that anti-feature. I have, however, explored it a bit. This is starting to sound like a challenge for my rusty brain.

ppinard commented 4 years ago

Disclaimer: I work for Oxford Instruments

The .oip file and its associated data files (in the data folder) change a lot from one version of AZtec to the next. We add new features, change the compression, etc. This is why we provide export formats: regardless of our internal changes, these always stay the same.

I completely agree with @ericpre that the raw/rpl format is not ideal, but a few years ago this was the format decided on and used by the community. The same applies to the EMSA format. HDF5-based formats are now becoming the norm, so we introduced the H5OINA format, as @hakonanes mentioned. It is under development, and the plan is to export more and more types of data in the next releases. I actually have to update the online specs tomorrow, as we added the export of electron images and other EBSD maps. Adding export of the whole EDS hypercube and EBSPs is in the pipeline. @k8macarthur I assume the EDS hypercube is what you are looking for?

Comments/clarifications:

jat255 commented 4 years ago

Thank you for your input, @ppinard, and your clarification about the .oip format. I'm optimistic that the H5OINA format may provide the sort of data access people are clamoring for.

Has there been any discussion internally (that you would be able to share) about publishing any sort of specification for the .oip format, at the very least to access the collection metadata and the raw data contained within? This is something that HyperSpy has received from certain vendors in the past, and the community has not minded implementing the readers and keeping them up to date (see the supported formats documentation for details). I know it makes people very pleased to know they have full access to the data that they collected, to process as they desire.

As I am building automated data and metadata harvesting tools in my work at NIST (in support of the FAIR data principles), I would have a strong interest in accessing the data as it rests rather than in an exported format, since we cannot rely on users remembering to export into a certain format. I should add that open access to collected data is becoming an important consideration in procurement analyses for a number of researchers I've spoken with as well.

k8macarthur commented 4 years ago

@sem-geologist Congrats! Having officially finished the last of my parental leave today, I promise you do get more sleep... eventually...

I appreciate the enthusiasm, guys! Like I said, it is something notably missing from HyperSpy. I currently have a very large Oxford Instruments box sat at the bottom of our staircase, waiting to be installed in the next couple of weeks. Hence my keenness to get things working as soon as I have data. I've really been making a lot of use of PCA and NMF recently.

@ppinard we would want the EDX data cube, but also some key metadata, including pixel dimensions and the energy scale and offset. Also, knowing the live time (not the real time or set dwell time), in case these are different, is particularly useful for my work. In the case of other readers, we are also able to extract things like the take-off angle (used for absorption correction), the accelerating voltage, the microscope acquired on, and the list of elements selected during the experiment. Largely, as @jat255 suggests, as much experimental metadata as possible to allow for FAIR information.

ppinard commented 4 years ago

@jat255

Has there been any discussion internally (that you would be able to share) about publishing any sort of specification for the .oip format, at the very least to access collection metadata and the raw data contained within?

Yes. At the time, it was decided that this would create an additional amount of work to document the .oip format and support third parties trying to read it.

I know it makes people very pleased to know they have full access to the data that they collected to process as they desire.

I am not quite sure how an export format doesn't address this desire to have full access to the data they collected.

@k8macarthur All the metadata you mentioned are already in the H5OINA.

The format also includes a lot more metadata. Actually, it should have all the metadata shown in our software.

k8macarthur commented 4 years ago

@ppinard Do you know when the new H5OINA will actually be available?

k8macarthur commented 4 years ago

So I received the updated AZtec software during training yesterday, which finally has the H5OINA export option. The only problem is I can't open it even in my HDF5 reader, so I'm not sure how we even begin to create a reader for HyperSpy. Any suggestions or help really appreciated.

k8macarthur commented 4 years ago

Application Training STO DSO Site 9 Map Data 2.zip

ericpre commented 4 years ago

You can use h5py to read the file: https://docs.h5py.org/en/stable/quick.html. The reader will most likely use h5py and parse the data, metadata, etc. into a dictionary.
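A minimal sketch of that approach, assuming h5py and numpy are installed; the file built here is a toy stand-in that only loosely mimics the H5OINA layout:

```python
import h5py
import numpy as np

def hdf5_to_dict(group):
    """Recursively convert an h5py Group into a nested dict of in-memory values."""
    out = {}
    for key, item in group.items():
        if isinstance(item, h5py.Group):
            out[key] = hdf5_to_dict(item)
        else:  # h5py.Dataset: read it fully into memory
            out[key] = item[()]
    return out

# Build a tiny file loosely mimicking the H5OINA layout, then parse it back.
with h5py.File("demo.h5", "w") as f:
    f.create_dataset("1/EDS/Header/Beam Voltage",
                     data=np.array([200.0], dtype="float32"))

with h5py.File("demo.h5", "r") as f:
    tree = hdf5_to_dict(f)
print(tree["1"]["EDS"]["Header"]["Beam Voltage"])  # -> [200.]
```

For large maps a real reader would likely keep datasets lazy rather than reading everything into memory, but a dict like this is enough to explore the metadata.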

jat255 commented 4 years ago

Taking just a quick glance...

In [3]: f = h5py.File('Application Training STO  DSO Site 9 Map Data 2.h5oina', 'r')
In [12]: f['1']['EDS']['Header'].keys()
Out[12]: <KeysViewHDF5 ['Acquisition Date', 'Analysis Label', 'Analysis Unique Identifier', 'Beam Voltage', 'Binning', 'Channel Width', 'Detector Azimuth', 'Detector Elevation', 'Detector Serial Number', 'Detector Type Id', 'Drift Correction', 'Energy Range', 'Magnification', 'Number Channels', 'Number Frames', 'Process Time', 'Processor Type', 'Project File', 'Project Label', 'Project Notes', 'Site Label', 'Site Notes', 'Specimen Label', 'Specimen Notes', 'Stage Position', 'Start Channel', 'Strobe Area', 'Strobe FWHM', 'Tilt Angle', 'Window Type', 'Working Distance', 'X Cells', 'X Step', 'Y Cells', 'Y Step']>
In [16]: f['1']['EDS']['Header']['Beam Voltage'][:]
Out[16]: array([200.], dtype=float32)
In [56]: plt.imshow(f['1']['Electron Image']['Data']['SE']['Electron Image 8'][:].reshape((1024,1024)))

image

(that reshaping might not be right, but it looks image-ish to me)
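Instead of hard-coding the reshape, the map dimensions can be pulled from the header's 'X Cells'/'Y Cells' entries listed above. A sketch, where the dataset paths follow the session above and the file built below is a tiny toy stand-in (4x8 instead of 1024x1024):

```python
import h5py
import numpy as np

def read_se_image(path, slice_id="1", image_name="Electron Image 8"):
    """Reshape a flat electron image using 'X Cells'/'Y Cells' from the header."""
    with h5py.File(path, "r") as f:
        header = f[slice_id]["EDS"]["Header"]
        ny = int(header["Y Cells"][0])
        nx = int(header["X Cells"][0])
        flat = f[slice_id]["Electron Image"]["Data"]["SE"][image_name][:]
        return flat.reshape((ny, nx))

# Toy file with the same layout, 4x8 pixels:
with h5py.File("demo_map.h5", "w") as f:
    f.create_dataset("1/EDS/Header/X Cells", data=[8])
    f.create_dataset("1/EDS/Header/Y Cells", data=[4])
    f.create_dataset("1/Electron Image/Data/SE/Electron Image 8",
                     data=np.arange(32))
print(read_se_image("demo_map.h5").shape)  # -> (4, 8)
```

Whether rows correspond to 'Y Cells' (C order) is itself an assumption that would need checking against a non-square acquisition.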

jat255 commented 4 years ago

@ppinard I would again like to reiterate my request that there be some way for this format to be produced by default. Many multi-user facilities are working towards automated data harvesting systems that allow users to search their data across many different instruments via web portals, with full metadata search, etc. This depends on being able to reliably extract metadata out of individual files. Knowing users (or at least our users), asking them to manually export every data file they collect is never going to happen, and I know the work of your team and ours will probably be for naught, at least in our use case, since we cannot rely on user behavior.

Has your team considered an "auto-export" option, where every dataset can be exported in this format automatically, alongside the OIP? That would solve our problem, and making it an opt-in configurable option would save people who don't care from wasting their disk storage on the exported files.

ppinard commented 4 years ago

Sorry for the late reply. Much needed holiday after M&M.

@k8macarthur Glad to see that you were able to acquire some data (hopefully this means the system is operational...). One warning: H5OINA does not contain the EDS data cube yet. It is scheduled for the next release. As soon as we have a prototype, I will post an example in this issue.

@jat255 The "auto-export" is a very good idea; I will add it to our development backlog. Right now, the best way to export all acquisitions to H5OINA is to right-click on the project root and select "Export to H5OINA", as shown in the gif below.

export_to_h5oina

jat255 commented 1 year ago

I (and/or some colleagues) might have some bandwidth to work on this in the coming months, since this is something we're aiming to support in NexusLIMS.

hakonanes commented 1 year ago

FYI, in terms of OI's EBSD formats, kikuchipy has a reader for uncompressed binary .ebsp files (doc, source) and a bare-minimum reader of patterns from their H5OINA files (doc, source). I plan to upstream most or all of our plugins to RosettaSciIO when I have time...

k8macarthur commented 1 year ago

Great! I’m simultaneously working on the EDS data cube export from within OI too. I’m aiming for the next main release after this one, 6.2.

jeinsle commented 1 year ago

Our group here has started a push on getting this functionalised, as we cannot afford enough AZtec licences. However, we are running into some strange behaviour where we are not getting consistent HDF5 data structures from the same session. Anyway, I would be happy to start contributing our efforts to a branch here. That said, I am still a total GitHub barbarian.

jeinsle commented 1 year ago

I guess the first thing for me to know is: is there a branch / issue / pull request that I should start putting stuff in against?

ppinard commented 1 year ago

we are running into some strange behaviour where we are not getting consistent HDF5 data structures from the same session

@jeinsle Could you give more details of what's different in the files between sessions?

jlaehne commented 1 year ago

I guess the first thing for me to know is: is there a branch / issue / pull request that I should start putting stuff in against?

New pull requests are always branched off the main branch.

jeinsle commented 1 year ago

we are running into some strange behaviour where we are not getting consistent HDF5 data structures from the same session

@jeinsle Could you give more details of what's different in the files between sessions?

I have two maps made in the same session, but I get two different names for the element maps, as seen in the picture below.

image

While I really only care about the raw spectra, we have a variety of users who still want to make their own element maps. As such, we are writing rather elaborate loops to figure out whether we are working with 'Peak Area' or with 'Window Integral'. This is frustrating, as the data was collected in the same microscope session, etc.
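Detecting which convention a file uses doesn't need an elaborate loop; a small sketch (the two key strings come from the screenshot above, while the function and the example map names are made up):

```python
def find_map_kind(map_names):
    """Return which naming convention ('Peak Area' or 'Window Integral')
    a list of element-map dataset names uses, or None if neither appears."""
    for kind in ("Peak Area", "Window Integral"):
        if any(kind in name for name in map_names):
            return kind
    return None

print(find_map_kind(["Ti K series Peak Area", "O K series Peak Area"]))
# -> Peak Area
```

A reader built this way keeps working regardless of whether a given map was acquired as Map or TruMap.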

jeinsle commented 1 year ago

@ppinard additionally, the 'spectrum' key seems to be completely missing, so in order for us to recreate a spectral map for use in HyperSpy I still need to export as raw and then recombine it with the metadata from the H5OINA file.

It is not clear whether this is some kind of licence issue, AZtec version, etc.

k8macarthur commented 1 year ago

@jeinsle Hi Josh! So the spectrum information (i.e. the data cube) is still not yet released. I was trying to get it out in my first year at OI but didn't quite succeed. It's all lined up to be available from 6.2, which should come out towards the end of this year. I've written some small imports into Python using an HDF5 reader and it seems to be working well. It looks like your other issue (the different names from the same session) is likely occurring because you selected Map or TruMap? Basically, 'Window Integral' and 'Peak Area' refer to whether or not we applied curve fitting for the mapping. If you change your map selection, then the name will change to reflect that. Obviously, if you're not changing your map selection, please let me know and I can look into it.

jeinsle commented 1 year ago

@k8macarthur Howdy Kate, great to hear from you. a) Data cube: that helps, though it is confusing, as the H5OINA documentation makes it sound like it is already released. b) I did not run the sessions, so there is only so much troubleshooting I can do. It sounds like the OIP file represents a combination of both TruMaps (manual run) and un-TruMaps lol (automated overnight job). In both cases these are in the same session, but the users collected some data manually and then set up a big montage job for overnight. So by the file setup these all look like they should be the same, and this catches us by surprise. Is there a key in the HDF5 structure that lets you know whether a TruMap was collected or not? Also, should I assume that 'Peak Area' is a TruMap?

k8macarthur commented 1 year ago

@jeinsle
a) Yeah, spectrum export is possible for feature and individual spectra, but not for the full 3D data cube, which is why the documentation is not fully clear.

b) Window Integral = Mapping (red maps in AZtec); Peak Area = TruMap (yellow maps in AZtec); Composition = QuantMap (green maps in AZtec).

If you want all maps to be TruMaps by default, this can be set in the User Preferences. Alternatively, if you acquire a TruMap manual acquisition and then set up the LAM, it should run with matching settings. You should be able to reprocess all the automated data by re-editing one field and then copying those processing parameters to all the other fields.

Yuji-Tan commented 4 months ago

I guess the file structure depends on the version of AZtec, but the .oip file seems to be in SQLite format. Some data files have the same file size as the image dimensions (X x Y x 4 bytes). These should be image data, but I failed to open them when assuming an integer or float data type. A micrograph-like image appeared, but it was not reasonable. The data must be converted somehow to interpret it.
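The SQLite guess is easy to verify from the first 16 bytes of a file, and the candidate data types can be tried systematically. A sketch, where the blob is fabricated and the set of candidate dtypes is an assumption:

```python
import numpy as np

SQLITE_MAGIC = b"SQLite format 3\x00"  # standard SQLite 3 file header

def is_sqlite(blob):
    """True if the blob starts with the SQLite 3 magic string."""
    return blob[:16] == SQLITE_MAGIC

def try_dtypes(blob, shape):
    """Yield (dtype, array) for each candidate 4-byte dtype that fits
    the blob as an image of the given shape."""
    for dtype in ("<u4", "<i4", "<f4"):
        arr = np.frombuffer(blob, dtype=dtype)
        if arr.size == shape[0] * shape[1]:
            yield dtype, arr.reshape(shape)

blob = np.arange(16, dtype="<u4").tobytes()  # stand-in for a data file
print(is_sqlite(blob))  # -> False (raw pixel data, not an SQLite database)
for dtype, img in try_dtypes(blob, (4, 4)):
    print(dtype, img.shape)
```

If none of the plain interpretations produce a sensible micrograph, the pixel stream may be compressed or interleaved, which this kind of brute-force dtype check cannot uncover on its own.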