Some feedback #1 (uellue/opixtem)

magnunor commented 7 years ago

Some thoughts in semi-random order, while reading https://github.com/uellue/opixtem/wiki/Requirements

File formats

HDF5 is the obvious solution here, since it is a mature, widely used, and well-supported file format. I suggest using the NCEM EMD-style "internal" HDF5 structure, since it is currently the most widely used.

HyperSpy has support for reading and writing EMD, and it is very easy to extend or tweak. A major advantage of HDF5 is that it can be read lazily, for example using dask (https://dask.pydata.org/), so the data can be processed without loading the full file into memory. This is especially important for pixelated STEM data, since the datasets can be very large. HDF5 also has native compression support, which helps keep file sizes manageable.
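A minimal sketch of what lazy, out-of-core processing means in practice. It assumes h5py and NumPy are installed and that the data is a 4D array (scan y, scan x, detector y, detector x) stored in a dataset named `data`. That layout is a hypothetical simplification, not the EMD structure. The sketch writes a gzip-compressed file and then computes the mean diffraction pattern while holding only one scan row in memory at a time:

```python
import numpy as np
import h5py


def write_4d_stem(path, data, chunks):
    """Write a 4D (scan_y, scan_x, det_y, det_x) array with gzip compression."""
    with h5py.File(path, "w") as f:
        f.create_dataset("data", data=data, chunks=chunks, compression="gzip")


def mean_diffraction_pattern(path, name="data"):
    """Average all diffraction patterns, reading one scan row at a time.

    Only a single row of frames is held in memory, so the file can be
    much larger than the available RAM.
    """
    with h5py.File(path, "r") as f:
        dset = f[name]
        n_y, n_x, det_y, det_x = dset.shape
        total = np.zeros((det_y, det_x), dtype=np.float64)
        for iy in range(n_y):
            row = dset[iy]  # reads (and decompresses) only this scan row
            total += row.sum(axis=0, dtype=np.float64)
    return total / (n_y * n_x)
```

dask can handle this bookkeeping for you. Wrapping the dataset with `dask.array.from_array(f["data"], chunks=f["data"].chunks)` and calling `.mean(axis=(0, 1)).compute()` should schedule the same chunk-by-chunk reads automatically.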

Another important factor is the chunk shape of the HDF5 files: https://support.hdfgroup.org/HDF5/doc/Advanced/Chunking/index.html. (This is potentially a bit technical, but anyone implementing this needs to think about it.)
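To make the trade-off concrete, here is a small pure-Python helper that counts how many chunks a selection touches. The helper is hypothetical and is not part of HDF5 or h5py. HDF5 reads and decompresses whole chunks, so this count is a rough proxy for IO cost. The example assumes a 256 × 256 scan with 256 × 256 detector frames:

```python
def chunks_touched(shape, chunks, selection):
    """Count the chunks intersected by a selection on a chunked dataset.

    `selection` gives, per axis, either an integer index or a
    (start, stop) half-open range.
    """
    count = 1
    for size, chunk, sel in zip(shape, chunks, selection):
        start, stop = (sel, sel + 1) if isinstance(sel, int) else sel
        if not 0 <= start < stop <= size:
            raise ValueError(f"selection {sel} out of bounds for axis of size {size}")
        first, last = start // chunk, (stop - 1) // chunk
        count *= last - first + 1
    return count


SHAPE = (256, 256, 256, 256)                     # (scan_y, scan_x, det_y, det_x)
ONE_PATTERN = (10, 20, (0, 256), (0, 256))       # a single diffraction pattern
ONE_DET_PIXEL = ((0, 256), (0, 256), 128, 128)   # one detector pixel over the scan

for chunks in [(1, 1, 256, 256), (32, 32, 32, 32)]:
    print(chunks,
          chunks_touched(SHAPE, chunks, ONE_PATTERN),
          chunks_touched(SHAPE, chunks, ONE_DET_PIXEL))
```

With frame-shaped chunks `(1, 1, 256, 256)`, one pattern needs 1 chunk, but a virtual-detector image needs 65536 chunks. With `(32, 32, 32, 32)` chunks, both access patterns need 64 chunks. The right choice therefore depends on which access pattern matters most.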

Lastly, there are potentially some IO bottlenecks with HDF5: loading the same data from a flat binary file can be quicker. Parallel HDF5 might solve this, since I suspect the slower loading is partly CPU-limited. More info on this: https://support.hdfgroup.org/HDF5/PHDF5/

File conversion

HyperSpy can convert DM3/DM4 files to HDF5/EMD, and more generally, any file HyperSpy can load can also be saved as HDF5/EMD. We in Glasgow have code for converting Medipix3 binary data to HDF5/EMD, which we'll release soonish.
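For formats HyperSpy cannot read, a conversion can be as simple as memory-mapping the raw file and copying it into a chunked HDF5 dataset one scan row at a time. This is a minimal sketch, not the Glasgow Medipix3 converter. It assumes a hypothetical headerless file of little-endian uint16 frames stored in scan order. Real detector files may contain per-frame headers, which this does not handle, and the output is a single `data` dataset rather than the full EMD structure:

```python
import numpy as np
import h5py


def raw_to_hdf5(raw_path, h5_path, scan_shape, frame_shape, dtype="<u2"):
    """Convert a headerless raw file of frames into a chunked, compressed HDF5 file.

    Assumes the file holds scan_shape[0] * scan_shape[1] frames of
    frame_shape pixels, stored contiguously in scan order.
    """
    frames = np.memmap(raw_path, dtype=dtype, mode="r",
                       shape=tuple(scan_shape) + tuple(frame_shape))
    try:
        with h5py.File(h5_path, "w") as f:
            dset = f.create_dataset(
                "data",
                shape=frames.shape,
                dtype=frames.dtype,
                chunks=(1, 1) + tuple(frame_shape),  # one frame per chunk
                compression="gzip",
            )
            for iy in range(scan_shape[0]):
                dset[iy] = frames[iy]  # copy one scan row at a time
    finally:
        del frames  # release the memory map
```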

Data processing

The Glasgow group has software for a variety of data processing on this type of data:

Analysis of STEM-differential phase contrast data
Radial integration, and analysis of HOLZ data
Virtual ADF, BF, ...
Other tools enabling easy visualization of the large datasets
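As an illustration of the kind of processing listed above, here is a NumPy-only sketch of three common operations on a 4D array (scan y, scan x, detector y, detector x). It is not the Glasgow/HyperSpy implementation. The three operations are:

- A virtual annular detector image. An inner radius of 0 gives a virtual BF image; an annulus outside the bright-field disk gives ADF.
- The centre of mass of each pattern, as used in DPC-style analysis.
- A radial profile of a single pattern.

The detector centre and radii are inputs you would need to calibrate:

```python
import numpy as np


def annular_mask(frame_shape, center, r_inner, r_outer):
    """Boolean mask selecting detector pixels with r_inner <= r < r_outer."""
    yy, xx = np.indices(frame_shape)
    r = np.hypot(yy - center[0], xx - center[1])
    return (r >= r_inner) & (r < r_outer)


def virtual_image(data, mask):
    """Sum the masked detector pixels at every scan position."""
    return data[..., mask].sum(axis=-1, dtype=np.float64)


def center_of_mass(data):
    """Centre of mass (y, x) of every diffraction pattern, in detector pixels.

    Patterns with zero total counts give nan.
    """
    yy, xx = np.indices(data.shape[-2:])
    total = data.sum(axis=(-2, -1), dtype=np.float64)
    com_y = (data * yy).sum(axis=(-2, -1)) / total
    com_x = (data * xx).sum(axis=(-2, -1)) / total
    return com_y, com_x


def radial_profile(frame, center):
    """Mean intensity in 1-pixel-wide rings around `center`."""
    yy, xx = np.indices(frame.shape)
    r = np.hypot(yy - center[0], xx - center[1]).astype(int).ravel()
    sums = np.bincount(r, weights=frame.ravel())
    counts = np.bincount(r)
    # Rings that contain no pixels are left at zero instead of dividing by zero.
    return np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
```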

Most of this is based on HyperSpy, for both the data processing and the visualization. The lazy loading/processing mentioned above is especially vital here, since at some point the data will be too large to fit in the memory of any standard computer.

Interfacing with detectors

I made a Python library for interfacing with a Medipix3 detector (through the Merlin readout system): https://fast_pixelated_detectors.gitlab.io/merlin_interface/. This is done over TCP/IP.
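A minimal sketch of request/response control over TCP/IP, using only the standard library. The framing is a 10-digit ASCII byte count, a comma, and then an ASCII command. Both the framing and the command strings are hypothetical placeholders, not the exact Merlin protocol; a real client would follow the vendor's protocol documentation:

```python
import socket


def encode_command(body: str) -> bytes:
    """Frame an ASCII message as '<10-digit length>,<body>' (hypothetical framing)."""
    payload = body.encode("ascii")
    return f"{len(payload):010d},".encode("ascii") + payload


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; TCP may deliver a message in several pieces."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed before message was complete")
        buf.extend(chunk)
    return bytes(buf)


def read_message(sock: socket.socket) -> str:
    """Read one framed message: 10 length digits, a comma, then the body."""
    length = int(recv_exact(sock, 11)[:10].decode("ascii"))
    return recv_exact(sock, length).decode("ascii")


def send_command(sock: socket.socket, body: str) -> str:
    """Send one command and wait for the framed reply."""
    sock.sendall(encode_command(body))
    return read_message(sock)


def open_connection(host: str, port: int, timeout: float = 5.0) -> socket.socket:
    """Connect to the control port of the (hypothetical) readout server."""
    return socket.create_connection((host, port), timeout=timeout)
```

Against a real system, usage would look like `send_command(open_connection(host, port), "GET,SOMEPARAMETER")`. The host, port, and command name here are placeholders.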

I also made a library for getting live data from a Medipix3 detector (via the Merlin readout system): https://fast_pixelated_detectors.gitlab.io/fpd_live_imaging/. It currently only works for the Medipix3, but it could work for any type of detector, as long as the image data can be retrieved somehow. For the Merlin readout system, this is all done over TCP/IP.
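Receiving live frames mostly comes down to reading exactly the right number of bytes from the socket and reinterpreting them as an image. This is a sketch under assumed framing: a hypothetical 10-digit ASCII length header followed by raw little-endian uint16 pixels. Real readout systems, including Merlin, define their own headers:

```python
import numpy as np


def recv_exact(sock, n):
    """Read exactly n bytes from a socket, looping over partial reads."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-frame")
        buf.extend(chunk)
    return bytes(buf)


def read_frame(sock, frame_shape, dtype="<u2"):
    """Read one frame: a 10-digit byte count, then the raw pixel data."""
    n_bytes = int(recv_exact(sock, 10).decode("ascii"))
    expected = int(np.prod(frame_shape)) * np.dtype(dtype).itemsize
    if n_bytes != expected:
        raise ValueError(f"expected {expected} bytes, header says {n_bytes}")
    payload = recv_exact(sock, n_bytes)
    # frombuffer avoids a copy; the resulting array is read-only.
    return np.frombuffer(payload, dtype=dtype).reshape(frame_shape)
```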

I think being able to interface with more equipment (both controlling it and getting data) over TCP/IP would make everything much easier. Currently, everything has to go through vendor-specific software, which limits the possibilities for innovation.

Performance

I think it is a good idea to implement as much as possible in interpreted languages like Python, since this greatly lowers the barrier to entry for researchers who want to participate. While pure Python can be relatively slow, optimized libraries such as NumPy are essentially highly optimized C code.
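A small illustration of that point, assuming NumPy. Both functions below compute the total counts in every frame of a 4D dataset. The first uses explicit Python loops; the second is a single vectorized call that runs in NumPy's compiled code. The results are identical, but the vectorized version is typically much faster on realistic data sizes, which is easy to check with `timeit` on your own data:

```python
import numpy as np


def frame_sums_python(data):
    """Total counts per frame using explicit Python loops (slow)."""
    n_y, n_x, det_y, det_x = data.shape
    out = np.zeros((n_y, n_x), dtype=np.int64)
    for iy in range(n_y):
        for ix in range(n_x):
            total = 0
            for jy in range(det_y):
                for jx in range(det_x):
                    total += int(data[iy, ix, jy, jx])
            out[iy, ix] = total
    return out


def frame_sums_numpy(data):
    """Same result, vectorized: the loops run inside NumPy's C code."""
    return data.sum(axis=(2, 3), dtype=np.int64)
```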

For acquiring data, my fpd_live_imaging works fine at 1000 fps, and I'll test it at 12000 fps soon. It also includes a PyQt UI for visualization.
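A back-of-the-envelope estimate of what those frame rates mean for throughput. The sketch assumes a single 256 × 256 Medipix3 chip and treats the bit depth as a free parameter: 16 bits per pixel as stored, and 1 bit per pixel for comparison. It ignores headers and protocol overhead:

```python
def data_rate_bytes_per_s(frame_shape, bits_per_pixel, fps):
    """Raw detector data rate, ignoring headers and protocol overhead."""
    n_pixels = frame_shape[0] * frame_shape[1]
    return n_pixels * bits_per_pixel * fps // 8


if __name__ == "__main__":
    for bits, fps in [(16, 1000), (16, 12000), (1, 12000)]:
        rate = data_rate_bytes_per_s((256, 256), bits, fps)
        print(f"{bits:2d}-bit frames at {fps:5d} fps: {rate / 1e6:7.1f} MB/s")
```

Under these assumptions, 16-bit frames give about 131 MB/s at 1000 fps and about 1.57 GB/s at 12000 fps. With 1-bit frames, 12000 fps needs only about 98 MB/s. The bit depth therefore matters as much as the frame rate.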

So I don't think Python will necessarily be a problem, as long as the right libraries are used. And if something is too slow in Python (and no relevant library exists), it can be written in Cython.

For GPU calculations, I'd aim for cross-platform frameworks like OpenCL instead of vendor-specific ones like CUDA, especially since cross-platform solutions are more future-proof.

Graphical user interface

A possible user interface for post-processing could be http://hyperspy.org/hyperspyUI/

magnunor commented 7 years ago

With regards to file format:

It seems that the neutron, X-ray, and muon communities also want to use HDF5: http://download.nexusformat.org/doc/html/introduction.html#nexus-introduction

uellue commented 6 years ago

@magnunor NeXus is right on the money, I think! I was reading into it a bit, and the whole thing can be adapted almost 1:1 to electrons. Plus, they have all the utilities and whatnot!

ediff commented 6 years ago

Agreed, I used NeXus during my synchrotron days, and it is an excellent format, more or less tailor-made for our needs.

(Sent by email, in reply to Dieter Weber's comment above from 24 Jan 2018, 14:31.)

magnunor commented 6 years ago

It seems pretty good, especially if it makes collaboration with related fields easier. In essence, I think NeXus is simply HDF5 with a specific internal structure? So it is essentially like EMD, just with a different internal structure? Any other differences?

To be honest, I don't really care that much about the internal metadata structure, as long as we can easily write readers/writers (for example using h5py).
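As an example of how little code a generic reader needs, this h5py sketch walks any HDF5 file (EMD, NeXus, or otherwise) and lists every dataset with its shape and dtype. It only uses h5py's `visititems`; the function name is illustrative:

```python
import h5py


def list_datasets(path):
    """Map each dataset path in an HDF5 file to its (shape, dtype name)."""
    found = {}

    def visit(name, obj):
        # visititems calls this for every group and dataset below the root.
        if isinstance(obj, h5py.Dataset):
            found[name] = (obj.shape, str(obj.dtype))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return found
```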

uellue commented 6 years ago

Simple NeXus files are quite straightforward. Minimal example: http://download.nexusformat.org/doc/html/introduction.html#simple-example
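For comparison with EMD, here is a sketch of writing a similarly minimal NeXus-style file with h5py. It creates an `NXentry` group containing an `NXdata` group whose `signal` and `axes` attributes name the plotted dataset and its axis. The group names, the radial-profile content, and the `1/nm` unit are illustrative assumptions. Real files would also fill in the instrument and sample metadata:

```python
import numpy as np
import h5py


def write_minimal_nexus(path, k, intensity):
    """Write a 1D intensity-vs-k profile as NXentry/NXdata."""
    with h5py.File(path, "w") as f:
        f.attrs["default"] = "entry"            # which NXentry to plot by default
        entry = f.create_group("entry")
        entry.attrs["NX_class"] = "NXentry"
        entry.attrs["default"] = "data"         # which NXdata to plot by default
        data = entry.create_group("data")
        data.attrs["NX_class"] = "NXdata"
        data.attrs["signal"] = "intensity"      # name of the dependent dataset
        data.attrs["axes"] = "k"                # name of the axis dataset
        data.create_dataset("intensity", data=np.asarray(intensity))
        k_dset = data.create_dataset("k", data=np.asarray(k))
        k_dset.attrs["units"] = "1/nm"
```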

The main difference from existing EM file formats is that NeXus already implements a number of things that were on our wishlist:

Clear, modular namespace
Organization and process to keep extending and developing the format http://www.nexusformat.org/NIAC.html
Wide support in various software packages (only neutrons and photons for now, of course)
Schema language to define application-specific data arrangements http://download.nexusformat.org/doc/html/nxdl.html
Wide range of existing definitions: http://download.nexusformat.org/doc/html/classes/applications/index.html
Automated validation tools http://download.nexusformat.org/doc/html/utilities.html#validation
Machine-readable definitions of detectors that support complex geometries if necessary. For example, you could describe exactly where STEM detector segments are, and any software that supports such detector definitions can work directly with your data. http://download.nexusformat.org/doc/html/classes/base_classes/NXdetector.html
Full-blown coordinate system to precisely describe position and orientation of sample, beam, detectors, optical elements and so forth http://download.nexusformat.org/sphinx/design.html#nexus-coordinate-systems
Description of various optical elements. Still missing are, of course, definitions for electron-optical elements. By adding these, one can get a precise, machine-readable definition of the entire electron optics from the gun to the detectors. http://download.nexusformat.org/doc/html/classes/base_classes/index.html
Description of the sample, including in-situ conditions. For EM and APT we can consider adding a complete atomic model. http://download.nexusformat.org/doc/html/classes/base_classes/NXsample.html
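To give a feel for what automated validation checks, here is a toy h5py sketch. It is not the official NeXus validation tool. It checks a single rule: every group marked `NX_class = "NXdata"` must have a `signal` attribute that names an existing member. The function name and messages are hypothetical:

```python
import h5py


def check_nxdata(path):
    """Return a list of problems with NXdata groups in a NeXus-style file.

    Only checks that each NXdata group names an existing signal dataset,
    a tiny subset of what real NeXus validation covers.
    """
    problems = []

    def visit(name, obj):
        if not isinstance(obj, h5py.Group) or obj.attrs.get("NX_class") != "NXdata":
            return
        signal = obj.attrs.get("signal")
        if signal is None:
            problems.append(f"{name}: missing 'signal' attribute")
        elif signal not in obj:
            problems.append(f"{name}: signal '{signal}' not found")

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return problems
```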

A NeXus file can contain enough information to feed your electron optics definition and sample definition into a simulation package and directly compare your experimental result with the simulation, without any additional information.

By using NeXus, we can save ourselves a HUGE amount of conceptual work and painstaking definitions, and we can focus directly on the applications instead.

uellue commented 6 years ago

In terms of HyperSpy, any existing HyperSpy file can be mapped directly onto NeXus. They have quite similar definitions for axes, multidimensional data, and so forth.
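A sketch of what that mapping could look like for the axes, assuming each axis is described by the calibration fields a HyperSpy axis carries (name, size, scale, offset, units). Here they are passed as plain dicts, listed in array-dimension order, not as HyperSpy objects. The function returns the NXdata attributes (`signal`, `axes`, and one `<axis>_indices` per axis) plus the axis coordinate arrays that would be written as datasets:

```python
import numpy as np


def nxdata_from_axes(axes, signal_name="data"):
    """Build NXdata attributes and axis datasets from calibrated axes.

    `axes` is a list of dicts with keys name, size, scale, offset, units,
    one per array dimension in order (a hypothetical, HyperSpy-like layout).
    """
    attrs = {
        "NX_class": "NXdata",
        "signal": signal_name,
        "axes": [ax["name"] for ax in axes],
    }
    datasets = {}
    for i, ax in enumerate(axes):
        # Calibrated coordinates: offset + scale * index.
        values = ax["offset"] + ax["scale"] * np.arange(ax["size"])
        datasets[ax["name"]] = (values, {"units": ax["units"]})
        attrs[f"{ax['name']}_indices"] = i  # which data dimension this axis labels
    return attrs, datasets
```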
