magnunor opened this issue 7 years ago
With regards to file format:
It seems that the neutron, X-ray, and muon communities also want to use HDF5: http://download.nexusformat.org/doc/html/introduction.html#nexus-introduction
@magnunor NeXus is right on the money, I think! I was reading into it a bit, and the whole thing can be adapted almost 1:1 to electrons. Plus, they have all the utilities and whatnot!
Agreed, I used Nexus during my Synchrotron days and it is an excellent format more or less tailor made for our needs.
It seems pretty good, especially if it makes collaboration with related fields easier. In essence, I think NeXus is simply an HDF5 file with a specific internal structure? So essentially like EMD, just with a different internal structure? Are there any other differences?
To be honest I don't really care so much about the internal metadata structure, as long as we can easily write readers/writers (for example using h5py).
Simple NeXus files are quite straightforward. Minimal example: http://download.nexusformat.org/doc/html/introduction.html#simple-example
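To get a feel for how simple this is, here is a sketch of the NeXus "simple example" layout written with plain h5py (the group names and attributes follow the linked NeXus introduction; treat the exact layout as illustrative rather than spec-complete):

```python
import numpy as np
import h5py

# Write a minimal NeXus-style HDF5 file: one NXentry group containing one
# NXdata group, whose "signal" attribute names the default dataset to plot.
with h5py.File("minimal_nexus.nxs", "w") as f:
    entry = f.create_group("entry")
    entry.attrs["NX_class"] = "NXentry"
    data = entry.create_group("data")
    data.attrs["NX_class"] = "NXdata"
    data.attrs["signal"] = "counts"
    counts = data.create_dataset("counts", data=np.arange(10, dtype="int32"))
    counts.attrs["units"] = "counts"

# It reads back like any other HDF5 file.
with h5py.File("minimal_nexus.nxs", "r") as f:
    print(f["entry"].attrs["NX_class"])
    print(f["entry/data/counts"][:5])  # [0 1 2 3 4]
```

So a NeXus reader/writer is really just an h5py reader/writer plus a naming convention.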
The main difference from existing EM file formats is that NeXus already implements a number of things that were on our wishlist:
A NeXus file can contain enough information to feed your electron optics definition and sample definition into a simulation package and directly compare your experimental result with the simulation, without any additional information.
By using NeXus, we can save ourselves a HUGE amount of conceptual work and painstaking definitions, and we can focus directly on the applications instead.
In terms of HyperSpy, any existing HyperSpy file can be mapped directly onto NeXus. They have quite similar definitions for axes, multidimensional data and so forth.
Some thoughts in semi-random order, while reading https://github.com/uellue/opixtem/wiki/Requirements
File formats
HDF5 is the obvious solution here, due to being a mature, widely used and well-supported file format. I suggest using the NCEM EMD style "internal" HDF5 structure, since it is the most widely used at the moment.
HyperSpy has support for reading and writing EMD, and it is very easy to extend or tweak. The good thing about HDF5 is that it can be read lazily, for example using dask (https://dask.pydata.org/): the data can be processed without loading the full file into memory. This is especially important for pixelated STEM data, which can be very large. HDF5 also has native compression support, which helps keep file sizes manageable.
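A minimal sketch of the lazy-loading pattern, using a small on-disk array standing in for a pixelated STEM dataset (the dataset name and 4D shape are made up for illustration):

```python
import numpy as np
import h5py
import dask.array as da

# Create a small HDF5 file standing in for a pixelated STEM dataset with
# axes (scan_y, scan_x, det_y, det_x); real files can be hundreds of GB.
with h5py.File("stem_data.h5", "w") as f:
    f.create_dataset("data", data=np.random.random((4, 4, 16, 16)),
                     chunks=(1, 1, 16, 16))

# Open lazily: dask only reads the chunks it needs, when it needs them.
f = h5py.File("stem_data.h5", "r")
dset = f["data"]
lazy = da.from_array(dset, chunks=dset.chunks)

# Build a computation graph (sum over the detector dimensions, i.e. a
# virtual bright-field image) without loading the full array, then
# evaluate it chunk by chunk.
bright_field = lazy.sum(axis=(-2, -1))
result = bright_field.compute()
print(result.shape)  # (4, 4)
f.close()
```

The same code works unchanged whether the file is 4 MB or 400 GB; only the chunks touched by the computation are ever read.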
Another important factor here is the chunk shape of the HDF5 files: https://support.hdfgroup.org/HDF5/doc/Advanced/Chunking/index.html. (Potentially a bit technical, but anyone implementing this needs to think about it.)
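The point is that the chunk shape should match the expected access pattern, since HDF5 always reads whole chunks. A quick sketch (shapes chosen purely for illustration):

```python
import numpy as np
import h5py

frames = np.zeros((100, 256, 256), dtype="uint16")  # 100 detector frames

with h5py.File("chunked.h5", "w") as f:
    # One chunk per frame: reading a single frame touches exactly one
    # chunk on disk. A chunk shape like (100, 16, 16) would instead
    # favour reading the same small detector region across all frames.
    dset = f.create_dataset("frames", data=frames,
                            chunks=(1, 256, 256), compression="gzip")
    print(dset.chunks)  # (1, 256, 256)
```

Compression in HDF5 also operates per chunk, so the chunk shape affects both I/O speed and compression ratio.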
Lastly, there are potentially some I/O bottlenecks when using HDF5: loading the same data from a flat binary file can be quicker. This might be solved by using parallel HDF5, since I suspect the slower loading is partly CPU-limited. More info: https://support.hdfgroup.org/HDF5/PHDF5/
File conversion
HyperSpy can convert DM3/DM4 to HDF5/EMD. This extends to any file HyperSpy can load: if we can load a file in HyperSpy, we can also save it as HDF5/EMD. We in Glasgow have code for converting Medipix3 binary data to HDF5/EMD, which we'll release soonish.
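To illustrate how little such a conversion involves, here is a sketch that packs a flat binary frame stream into an EMD-style HDF5 file. This is not the Glasgow code, the raw file is a made-up stand-in for a Medipix3 dump, and the group layout is simplified relative to the real EMD spec (which also defines per-axis `dim*` datasets):

```python
import numpy as np
import h5py

# Illustrative stand-in for a raw detector dump: 10 frames of
# 256x256 unsigned 16-bit pixels, written as one flat binary file.
frames = np.arange(10 * 256 * 256, dtype="uint16").reshape(10, 256, 256)
frames.tofile("raw_frames.bin")

# Convert: memory-map the flat file (so nothing is read eagerly) and
# write it into an EMD-style group with chunking and compression.
raw = np.memmap("raw_frames.bin", dtype="uint16",
                mode="r").reshape(10, 256, 256)
with h5py.File("converted.emd", "w") as f:
    group = f.create_group("data/frames")
    group.attrs["emd_group_type"] = 1
    group.create_dataset("data", data=raw, chunks=(1, 256, 256),
                         compression="gzip")
```

Because the input is memory-mapped and the output chunked, the same pattern scales to files far larger than RAM.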
Data processing
The Glasgow group has software for a variety of data-processing tasks on this type of data.
Most of this is based on HyperSpy, both for the data processing and the visualization. Especially the lazy loading/processing part mentioned above is vital for this, since at some point the data will be too large for any standard computer.
Interfacing with detectors
I made a Python library for interfacing with a Medipix3 detector (through the Merlin readout system): https://fast_pixelated_detectors.gitlab.io/merlin_interface/. This is done over TCP/IP.
I also made a library for getting live data from a Medipix3 detector (via the Merlin readout system): https://fast_pixelated_detectors.gitlab.io/fpd_live_imaging/. It currently only works for the Medipix3, but it could work for any type of detector, as long as it is possible to get the image data somehow. For the Merlin readout system, this is all done using TCP/IP.
I think being able to interface (control and get data) through TCP/IP with more equipment would make everything much easier. Currently everything has to go through specific vendor software, which limits the possibility to innovate.
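Protocol details aside, the basic pattern is just plain sockets. A self-contained sketch with a dummy server standing in for a readout system (the command strings here are invented for illustration and are not the actual Merlin protocol):

```python
import socket
import threading

# Dummy "detector" server standing in for a readout system: it accepts
# one connection, reads a command, and replies.
def dummy_detector(server_sock):
    conn, _ = server_sock.accept()
    command = conn.recv(1024)
    if command == b"GET,FRAMECOUNT":   # made-up command, for illustration
        conn.sendall(b"FRAMECOUNT,1000")
    conn.close()

server = socket.socket()
server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=dummy_detector, args=(server,)).start()

# Client side: at its core, a control library just opens a TCP
# connection, sends commands, and parses the replies.
client = socket.create_connection(("127.0.0.1", port))
client.sendall(b"GET,FRAMECOUNT")
reply = client.recv(1024)
client.close()
server.close()
print(reply)  # b'FRAMECOUNT,1000'
```

Any instrument exposing such a socket interface can be scripted from any language, with no vendor GUI in the loop.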
Performance
I think it is a good idea to implement as much as possible in interpreted languages like Python, since this greatly lowers the barrier to entry for researchers to participate. While pure Python can be relatively slow, optimized libraries such as NumPy are essentially thin wrappers around very optimized C code.
For acquiring data, my fpd_live_imaging library works fine at 1000 fps, and I'll test it at 12000 fps soon. It also includes a PyQt UI for visualization. So I don't think Python will necessarily be a problem, as long as the correct libraries are used. And if something is too slow in Python (and no relevant library exists), it can be written using Cython.
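The gap between pure Python and "Python driving C code" is easy to demonstrate with a toy benchmark (sum of squares of a million numbers; the exact speedup depends on the machine):

```python
import time
import numpy as np

n = 1_000_000
values = np.arange(n, dtype="float64")

# Pure Python: the interpreter executes one loop iteration per element.
t0 = time.perf_counter()
loop_result = 0.0
for v in values:
    loop_result += v * v
t_loop = time.perf_counter() - t0

# NumPy: one call that runs a tight compiled C loop over the array.
t0 = time.perf_counter()
numpy_result = float(np.sum(values * values))
t_numpy = time.perf_counter() - t0

print(f"loop:  {t_loop:.4f} s")
print(f"numpy: {t_numpy:.4f} s")  # typically orders of magnitude faster
```

Code that stays in vectorized NumPy operations like this is effectively running C, which is why well-written Python can keep up with high frame rates.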
For GPU calculations, I'd aim for cross-platform options like OpenCL instead of vendor-specific ones like CUDA, especially since cross-platform solutions are more future-proof.
Graphical user interface
A possible user interface for post-processing could be http://hyperspy.org/hyperspyUI/