Handle uneven bin widths in spectra

rebeccafair commented 2 years ago

Some operations, such as broadening by convolution, are not strictly correct when applied to data with uneven bin widths (irregularly sampled data). We should ensure these are handled sensibly.

rebeccafair commented 2 years ago

An example of this (obviously incorrect!) behaviour using tests_and_analysis/test/data/qpoint_frequencies/quartz/toy_quartz_qpoint_frequencies.json. The data only contains 2 q-points for simplicity, and the dispersion looks like this: Using ebins=np.concatenate((np.arange(0, 15, 0.1), np.arange(15, 40, 0.2))) (so, energy bins of width 0.1 meV in the 0-15 meV interval, and width 0.2 in the 15-40 meV interval), using calculate_dos you get: Ok, looks reasonable. So what if we apply broadening of 1 meV by convolution? Oh no! Horrible artefacts and it obviously doesn't look right. Now create a DOS with wide bins (1 meV) as an approximation for what the broadened spectrum should look like: Looks much more reasonable. And in fact, we can correctly apply broadening using the mode_widths argument to calculate_dos, and setting all the widths to 1meV. Then you get: So in fact the code to correctly apply broadening is already there, so maybe we can fix this with just a bit of refactoring.

Script to reproduce the above:

import matplotlib.pyplot as plt
import numpy as np

from euphonic import ureg, QpointFrequencies
from euphonic.plot import plot_1d

qpt_freqs = QpointFrequencies.from_json_file(
        'tests_and_analysis/test/data/qpoint_frequencies/quartz/toy_quartz_qpoint_frequencies.json')
disp_fig = plot_1d(qpt_freqs.get_dispersion())

ebins = np.concatenate((np.arange(0, 15, 0.1),
                        np.arange(15, 40, 0.2)))*ureg('meV')

# Plain DOS with uneven bins
dos = qpt_freqs.calculate_dos(ebins)
dos_fig = plot_1d(dos)

# Broaden with convolution - oh no!
dosb = dos.broaden(1*ureg('meV'))
dosb_fig = plot_1d(dosb)

# Use wide energy bins as an approximation of what the broadened
# spectrum should look like
ebins_broad = np.arange(0, 40, 1)*ureg('meV')
dos_broad_bins = qpt_freqs.calculate_dos(ebins_broad)
dos_broad_bins_fig = plot_1d(dos_broad_bins)

# Use mode_widths to explicitly broaden each mode by 1 meV instead
mode_widths = np.ones(qpt_freqs.frequencies.shape)*ureg('meV')
dos_explicit = qpt_freqs.calculate_dos(ebins, mode_widths)
dos_explicit_fig = plot_1d(dos_explicit)

plt.show()

ajjackson commented 2 years ago

I think for the specific case of Spectrum2D data along high-symmetry band structure paths, in which each path between turning-points has constant spacing, there is an algorithm we can employ for "correct" broadening:

Break the spectrum into segments of constant sampling
For each segment:
- Pad the data slightly beyond the plotting range. At a Brillouin Zone boundary this can be done by copying "reflected" data from inside the region.
- Broaden along q, with a fixed kernel generated on this q-spacing
- Trim off the padding
Recombine the segments to form the broadened Spectrum2D

The case that keeps me up at night is where a high-symmetry point is not at a zone boundary but lies in the middle of another line. In that case we could not do the reflection trick and would need to actually create or interpolate appropriately-spaced data for the padded region.

rebeccafair commented 2 years ago

I worry that this approach could be a lot of work for a specific case, do people generally actually care about completely correct broadening along q in a bandstructure? For publication quality plots, maybe, but in that case they may want to create their own set of Spectrum2Ds over each direction, broaden them, then stick them together to make a single Spectrum2D, as I doubt the automatically generated bandstructure would be the one that they want anyway. Horace has 'spaghetti' plots, it would be interesting to know how the broadening is handled there, whether all the q-bins are the same or not. In any case I don't think this is a high priority issue at the moment.

ajjackson commented 2 years ago

I don't think it would be that much work, but agree that it is not really a priority. I'm happy to go ahead without it for now; if people are using these things then maybe it's worth refining.

The differences would mostly only be significant for large q-broadening and I don't have much sense right now of what values are typical.

pace-neutrons / Euphonic

Handle uneven bin widths in spectra #201