Decide on scope and plan for development

@uellue @sk1p @pc494 this is the first step in the discussion we had in Trondheim about making a common repository for simulation code. If people are still interested it would be good to share ideas on what would be good to have here.

I'm definitely interested! Thanks a lot for doing this. :-) In the LiberTEM project we generally like to focus on doing the number crunching well and leave the "science part" to people and projects that understand more about it. In that sense, a perfect match.

In terms of interfacing, the LiberTEM blobfinder likes a list of positions in frame coordinates where to look for spots/disks. The zero order peak should be the first one, or it should be a separate input value. Output would be refined positions, relative peak intensity measure (logarithmic a.u.) and peak quality measure (a.u.). Integrated intensity in absolute values (with/without background subtraction) could be implemented easily as well.
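To make the input/output description above a bit more concrete, here is a minimal sketch of the data that would be exchanged. The names `PeakRequest`, `RefinedPeak` and `make_request` are hypothetical and only for illustration; they are not part of LiberTEM, pyxem or diffsims.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vec = Tuple[float, float]


@dataclass
class PeakRequest:
    """Hypothetical input: where to look for spots/disks, in frame coordinates."""
    zero_order: Vec
    others: List[Vec]


@dataclass
class RefinedPeak:
    """Hypothetical output record for one peak."""
    position: Vec          # refined position in frame coordinates
    log_intensity: float   # relative peak intensity, logarithmic a.u.
    quality: float         # peak quality, a.u.


def make_request(positions: Sequence[Vec],
                 zero_order: Optional[Vec] = None) -> PeakRequest:
    """Accept either convention: zero order first in the list, or given separately."""
    if zero_order is not None:
        return PeakRequest(zero_order, list(positions))
    if not positions:
        raise ValueError("need at least the zero order peak")
    return PeakRequest(positions[0], list(positions[1:]))
```

Both calling conventions end up in the same structure, so a simulation package could use whichever is more natural for it.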

2D grid vectors (origin and two vectors a and b) in detector coordinates, and optionally indices in terms of these vectors, would be useful as well. That would make it rather easy to find peaks, and the return value would be the list of peaks found at these positions with refined position, relative intensity measure (a.u.) and peak quality (a.u.), together with refined values for origin, a and b, i.e. a strain map, based on a weighted least-squares optimization of the positions and peak qualities. The result looks like this: https://nbviewer.jupyter.org/github/LiberTEM/LiberTEM/blob/260b20a6856f81b4ae7af93a16b5af635a5b5fee/examples/strainmap-SiGe.ipynb
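As a rough illustration of the grid idea, the sketch below generates expected positions from origin, a and b, and then recovers those three vectors from measured positions with a weighted least-squares fit. This is a plain textbook fit written for this thread, not the code from the notebook; the function names are made up, and the weights are assumed to be something like the peak quality.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[float, float]


def lattice_positions(origin: Vec, a: Vec, b: Vec,
                      indices: Sequence[Tuple[int, int]]) -> List[Vec]:
    """Expected peak positions origin + h*a + k*b in detector coordinates."""
    return [(origin[0] + h * a[0] + k * b[0],
             origin[1] + h * a[1] + k * b[1]) for h, k in indices]


def _solve3(m: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    aug = [row[:] + [r] for row, r in zip(m, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(3):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][3] / aug[i][i] for i in range(3)]


def fit_lattice(indices: Sequence[Tuple[int, int]],
                positions: Sequence[Vec],
                weights: Sequence[float]) -> Tuple[Vec, Vec, Vec]:
    """Weighted least-squares fit of origin, a and b to measured positions."""
    # Normal equations (A^T W A) p = A^T W y, where each row of A is (1, h, k).
    rows = [(1.0, float(h), float(k)) for h, k in indices]
    ata = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights))
            for j in range(3)] for i in range(3)]
    solved = []
    for axis in (0, 1):  # x and y are fitted independently
        aty = [sum(w * r[i] * p[axis]
                   for r, p, w in zip(rows, positions, weights))
               for i in range(3)]
        solved.append(_solve3(ata, aty))
    (ox, ax, bx), (oy, ay, by) = solved
    return (ox, oy), (ax, ay), (bx, by)
```

The fit needs at least three peaks whose indices are not collinear; otherwise the normal equations are singular. Comparing the fitted a and b between frames is what would give the strain.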

If you have a very efficient function that matches a template library against positions and intensities, we could add that to the user-defined function that performs the position refinement. Optionally, the function can have two parts: A less efficient one that performs a full match to find a general range of patterns that can apply, roughly matches the rotation, and finds which peaks in the pattern and in the template correspond. A fast one that uses these "calibrations" can then refine each frame. Here is an example of how we can use clustering to first segment a scan into regions, then do a "full match" for each region, and then refine everything within that region with a fast match based on the full match: https://nbviewer.jupyter.org/github/LiberTEM/LiberTEM/blob/260b20a6856f81b4ae7af93a16b5af635a5b5fee/examples/strainmap-poly.ipynb

This is only doing the 2D strain map, but it should be clear how we could just "plug in" the template matching instead of the simple strain map.
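To illustrate the two-part idea, here is a toy sketch: a slow `full_match` that brute-forces the in-plane rotation of a template and records which measured peak belongs to which template spot, and a `fast_refine` that reuses those correspondences to get the rotation in closed form. Both functions are hypothetical and heavily simplified (rotation only, no translation or scaling, no intensities); they are not the matching algorithms of pyxem or LiberTEM.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float]


def rotate(points: Sequence[Vec], angle: float) -> List[Vec]:
    """Rotate points counter-clockwise about the origin."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def nearest(point: Vec, candidates: Sequence[Vec]) -> Tuple[int, float]:
    """Index of and distance to the candidate closest to point."""
    dists = [math.hypot(point[0] - cx, point[1] - cy) for cx, cy in candidates]
    i = min(range(len(dists)), key=dists.__getitem__)
    return i, dists[i]


def full_match(template: Sequence[Vec], peaks: Sequence[Vec],
               n_angles: int = 360) -> Tuple[float, List[int]]:
    """Slow step: scan rotations, keep the best one and its correspondences."""
    best = None
    for step in range(n_angles):
        angle = 2 * math.pi * step / n_angles
        pairs = [nearest(p, peaks) for p in rotate(template, angle)]
        cost = sum(d for _, d in pairs)
        if best is None or cost < best[0]:
            best = (cost, angle, [i for i, _ in pairs])
    _, angle, correspondence = best
    return angle, correspondence


def fast_refine(template: Sequence[Vec], peaks: Sequence[Vec],
                correspondence: Sequence[int]) -> float:
    """Fast step: least-squares rotation for known template/peak pairs."""
    num = den = 0.0
    for (tx, ty), i in zip(template, correspondence):
        px, py = peaks[i]
        num += tx * py - ty * px   # cross product term
        den += tx * px + ty * py   # dot product term
    return math.atan2(num, den)
```

The slow step would run once per region and the fast step once per frame, which is the split described above.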

On a separate subject, I am currently exploring a new direction regarding a standardized file format. After much frustration and pondering, my current thinking is that a simplified model / "digital twin" of the instrument is required to interpret metadata. That way, we first don't have to implement a model/corrector of everything (distortions, shifts, ...) ourselves, and second we'd make sure everyone has the same understanding of what the metadata values actually mean. More details here: https://github.com/LiberTEM/nexus-4dstem/wiki/Requirements-analysis

What are your thoughts on this? How could this fit with pyxem and diffsims? As an example, I could imagine that diffsims just calculates diffraction angles and perhaps relative intensities, and the "digital twin" can map that to detector coordinates, including size and shape of bright field disk etc.
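A small sketch of how that split could look: one part computes a scattering angle from a d-spacing (the diffsims side), and a separate, deliberately naive projection object maps that angle to detector pixels (the "digital twin" side). `SimpleProjection` and its parameters are assumptions made up for this example; a real instrument model would also handle distortions, shifts and so on.

```python
import math
from dataclasses import dataclass
from typing import Tuple


def electron_wavelength_angstrom(voltage_kv: float) -> float:
    """Relativistic de Broglie wavelength of an electron, in angstrom."""
    h = 6.62607015e-34      # Planck constant, J s
    m0 = 9.1093837015e-31   # electron rest mass, kg
    e = 1.602176634e-19     # elementary charge, C
    c = 299792458.0         # speed of light, m/s
    energy = e * voltage_kv * 1e3
    momentum = math.sqrt(2 * m0 * energy * (1 + energy / (2 * m0 * c ** 2)))
    return h / momentum * 1e10


def scattering_angle(d_spacing: float, wavelength: float) -> float:
    """Full scattering angle 2*theta (rad) from Bragg's law, lengths in angstrom."""
    return 2 * math.asin(wavelength / (2 * d_spacing))


@dataclass
class SimpleProjection:
    """Hypothetical stand-in for the 'digital twin': an ideal, distortion-free camera."""
    camera_length_mm: float
    pixel_size_mm: float
    center_px: Tuple[float, float]

    def to_detector(self, two_theta: float, azimuth: float) -> Tuple[float, float]:
        """Map a scattering angle and in-plane direction to pixel coordinates."""
        radius_px = self.camera_length_mm * math.tan(two_theta) / self.pixel_size_mm
        return (self.center_px[0] + radius_px * math.cos(azimuth),
                self.center_px[1] + radius_px * math.sin(azimuth))
```

For example, with a d-spacing of 3.1356 Å at 200 kV the scattering angle comes out at roughly 8 mrad, and `SimpleProjection(100.0, 0.05, (64.0, 64.0)).to_detector(angle, 0.0)` places that reflection about 16 pixels from the centre. The simulation code never needs to know about the camera.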

Lots to discuss!

> I'm definitely interested! Thanks a lot for doing this. :-) In the LiberTEM project we generally like to focus on doing the number crunching well and leave the "science part" to people and projects that understand more about it. In that sense, a perfect match.
>
> In terms of interfacing, the LiberTEM blobfinder likes a list of positions in frame coordinates where to look for spots/disks. The zero order peak should be the first one, or it should be a separate input value. Output would be refined positions, relative peak intensity measure (logarithmic a.u.) and peak quality measure (a.u.). Integrated intensity in absolute values (with/without background subtraction) could be implemented easily as well.

This will be pyxem-like functionality as there is no simulation involved.

> 2D grid vectors (origin and two vectors a and b) in detector coordinates, and optionally indices in terms of these vectors, would be useful as well. That would make it rather easy to find peaks, and the return value would be the list of peaks found at these positions with refined position, relative intensity measure (a.u.) and peak quality (a.u.), together with refined values for origin, a and b, i.e. a strain map, based on a weighted least-squares optimization of the positions and peak qualities. The result looks like this: https://nbviewer.jupyter.org/github/LiberTEM/LiberTEM/blob/260b20a6856f81b4ae7af93a16b5af635a5b5fee/examples/strainmap-SiGe.ipynb

Again pyxem.

> If you have a very efficient function that matches a template library against positions and intensities, we could add that to the user-defined function that performs the position refinement. Optionally, the function can have two parts: A less efficient one that performs a full match to find a general range of patterns that can apply, roughly matches the rotation, and finds which peaks in the pattern and in the template correspond. A fast one that uses these "calibrations" can then refine each frame. Here is an example of how we can use clustering to first segment a scan into regions, then do a "full match" for each region, and then refine everything within that region with a fast match based on the full match: https://nbviewer.jupyter.org/github/LiberTEM/LiberTEM/blob/260b20a6856f81b4ae7af93a16b5af635a5b5fee/examples/strainmap-poly.ipynb
>
> This is only doing the 2D strain map, but it should be clear how we could just "plug in" the template matching instead of the simple strain map.

Again pyxem does this already.

> On a separate subject, I am currently exploring a new direction regarding a standardized file format. After much frustration and pondering, my current thinking is that a simplified model / "digital twin" of the instrument is required to interpret metadata. That way, we first don't have to implement a model/corrector of everything (distortions, shifts, ...) ourselves, and second we'd make sure everyone has the same understanding of what the metadata values actually mean. More details here: https://github.com/LiberTEM/nexus-4dstem/wiki/Requirements-analysis

This is an interesting idea and something I could imagine myself using in future - although probably by installing a copy of LiberTEM rather than putting it into code I am a part of maintaining.

> What are your thoughts on this? How could this fit with pyxem and diffsims? As an example, I could imagine that diffsims just calculates diffraction angles and perhaps relative intensities, and the "digital twin" can map that to detector coordinates, including size and shape of bright field disk etc.

Yes, I think this final example is where we had ended up on this. We are planning to use diffsims to generate the simulation bit that gets plugged into various other packages: pyxem, LiberTEM etc., depending on who gets involved. I think this Issue was raised to figure out which simulations people would want/need. Should they be dynamic? Should we be wrapping other people's code to do that? Do people want/need precession included? Those kinds of questions.

Yes, the way I meant it was that diffsims can calculate these values for both LiberTEM and pyxem. I was just describing an interface for diffsims that would be convenient to use from our perspective.

Unifying the algorithms of pyxem and LiberTEM to keep the best of both and remove duplication would be a separate project. I could imagine moving relevant parts of our application code to pyxem and importing pyxem in LiberTEM for this, as soon as the pyxem dependencies allow this. In principle it would be feasible to use our UDFs on dask arrays; one "just" has to write some non-trivial wrappers to do this well. That would be the biggest work item to unify the application code IMO, since our application code is in many ways in a symbiotic relationship with the UDF interface. NumPy arrays already kind of work with a MemoryDataset that is part of our testing infrastructure.

> I think this Issue was raised to figure out which simulations people would want/need. Should they be dynamic? Should we be wrapping other people's code to do that? Do people want/need precession included? Those kinds of questions.

Each person probably likes to include their personal favorite and may change their opinion on a regular basis as their work progresses. That would mean diffsims could provide a generic interface and some simple and fast implementation(s) that are easily available. Whoever likes something else can then write Issues or Pull Requests. As a very basic requirement, interfacing with relevant structure databases (reading CIF files?), positions and indices, the information whether a reflection is forbidden, and some help with symmetries (applying them correctly to find equivalent positions, for example) would be a good start, from my limited knowledge of crystallography. Then intensity from kinematic theory based on structure factors, which is relatively straightforward, right? I'd leave out things that are very resource-hungry or tricky to begin with.
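For what it's worth, the kinematic part can be sketched in a few lines. The example below computes the structure factor F(hkl) as the sum over atoms of f * exp(2πi(hx + ky + lz)) and flags a reflection as forbidden when |F|² vanishes. The constant scattering factor of 1 per atom and the names `structure_factor`, `is_forbidden` and `FCC_BASIS` are simplifying assumptions for illustration; a real implementation would use angle-dependent atomic scattering factors and read the structure from a file.

```python
import cmath
from typing import Sequence, Tuple

Atom = Tuple[float, Tuple[float, float, float]]  # (scattering factor, fractional xyz)

# Face-centred cubic basis with an assumed constant scattering factor of 1.
FCC_BASIS: Sequence[Atom] = [
    (1.0, (0.0, 0.0, 0.0)),
    (1.0, (0.5, 0.5, 0.0)),
    (1.0, (0.5, 0.0, 0.5)),
    (1.0, (0.0, 0.5, 0.5)),
]


def structure_factor(hkl: Tuple[int, int, int], atoms: Sequence[Atom]) -> complex:
    """Kinematic structure factor: sum of f * exp(2*pi*i*(h*x + k*y + l*z))."""
    h, k, l = hkl
    return sum(f * cmath.exp(2j * cmath.pi * (h * x + k * y + l * z))
               for f, (x, y, z) in atoms)


def kinematic_intensity(hkl: Tuple[int, int, int], atoms: Sequence[Atom]) -> float:
    """Relative kinematic intensity, proportional to |F|^2."""
    return abs(structure_factor(hkl, atoms)) ** 2


def is_forbidden(hkl: Tuple[int, int, int], atoms: Sequence[Atom],
                 tol: float = 1e-9) -> bool:
    """A reflection is kinematically forbidden if its intensity vanishes."""
    return kinematic_intensity(hkl, atoms) < tol
```

For the FCC basis this reproduces the usual selection rule: reflections with mixed odd and even indices such as (1, 0, 0) are forbidden, while (1, 1, 1) and (2, 0, 0) are allowed.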

> This is an interesting idea and something I could imagine myself using in future

:smile: :+1:

Precession sounds like something that is more instrument- and projection-specific than diffraction-specific. That could be a good addition for the "digital twin" project.
