spedas / pyspedas

Python-based Space Physics Environment Data Analysis Software
https://pyspedas.readthedocs.io/
MIT License

Improve time & memory performance of specplot? #857

Open jameswilburlewis opened 2 months ago

jameswilburlewis commented 2 months ago

To make specplot resampling work correctly for spectra whose bin sizes vary along the Y axis, we currently map the input bins (vdata) onto a very high-resolution grid with one row per pixel (vdata1), like this:

        fig_size = fig.get_size_inches() * fig.dpi
        ny = fig_size[1]
        vdata1 = (
            np.arange(0, ny, dtype=np.float64) * (ycrange[1] - ycrange[0]) / (ny - 1)
            + ycrange[0]
        )
        out_values1 = specplot_resample(out_values, vdata, vdata1)

The result looks great, but it is rather slow, and it seems to consume a lot of memory.
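For reference, here is a small standalone sketch of what the 1 pixel grid amounts to. It is not the pyspedas code itself: the values of `ycrange` and `ny` are made up for illustration, and it only shows that the expression above gives `ny` evenly spaced points from `ycrange[0]` to `ycrange[1]`, i.e. the same thing as `np.linspace`.

```python
import numpy as np

# Illustrative values only (assumptions, not taken from pyspedas):
ycrange = (10.0, 30000.0)   # plotted Y range
ny = 480.0                  # figure height in pixels (height in inches * dpi)

# Same expression as in the snippet above.
vdata1 = (
    np.arange(0, ny, dtype=np.float64) * (ycrange[1] - ycrange[0]) / (ny - 1)
    + ycrange[0]
)

# Equivalent construction: ny evenly spaced values, endpoints included.
vdata1_linspace = np.linspace(ycrange[0], ycrange[1], int(ny))
```

So the resampled array has one row per vertical pixel for every time sample, regardless of how many bins the input actually has.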

There might be a more efficient way to do this, since pcolormesh can accommodate non-uniform Y bin boundaries. What if we took vdata1 to be the union of all the bin boundaries for vdata, rather than a super hi-res 1 pixel grid?

So if one set of samples has bin boundaries:

0 20 40 65 88 100

and the rest of them have bin boundaries:

5 30 50 70 90 105

then we take vdata1 as:

0 5 20 30 40 50 65 70 88 90 100 105

and map to that, guaranteeing that each vdata bin exactly spans some set of vdata1 bins, with no further subdivision necessary.
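A minimal sketch of the idea, using the two example boundary sets above. The helper `expand_to_union` is hypothetical (it is not an existing pyspedas function), and it assumes the arrays hold bin boundaries rather than bin centers. Each union bin lies entirely inside at most one original bin, so a lookup by bin center is enough and no interpolation is needed.

```python
import numpy as np

# Bin boundaries from the example above.
edges_a = np.array([0, 20, 40, 65, 88, 100], dtype=np.float64)
edges_b = np.array([5, 30, 50, 70, 90, 105], dtype=np.float64)

# Sorted union of all boundaries; np.union1d also drops duplicates.
vdata1_edges = np.union1d(edges_a, edges_b)


def expand_to_union(values, edges, union_edges, fill=np.nan):
    """Hypothetical helper: copy per-bin values onto the finer union grid.

    values has len(edges) - 1 entries. Union bins that fall outside
    [edges[0], edges[-1]) are set to `fill`.
    """
    values = np.asarray(values, dtype=np.float64)
    centers = 0.5 * (union_edges[:-1] + union_edges[1:])
    # Index of the original bin that contains each union-bin center.
    idx = np.searchsorted(edges, centers, side="right") - 1
    inside = (centers >= edges[0]) & (centers < edges[-1])
    out = np.full(centers.shape, fill, dtype=np.float64)
    out[inside] = values[idx[inside]]
    return out


row_a = expand_to_union([1, 2, 3, 4, 5], edges_a, vdata1_edges)
row_b = expand_to_union([10, 20, 30, 40, 50], edges_b, vdata1_edges)
```

Here `vdata1_edges` has 12 boundaries (11 bins), and both rows end up with 11 values, so they can be stacked into one array and drawn against a single non-uniform set of Y boundaries.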

Given that displays now routinely support 200 DPI or higher resolution, this could cut the processing time and memory requirements for plotting spectrograms by a factor of 10 or more.
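As a rough back-of-the-envelope check, with every number below an assumption chosen for illustration (a 6 inch tall panel at 200 DPI, 10,000 time samples, and two distinct sets of 32 bins), the size of the resampled array can be compared like this:

```python
import numpy as np


def grid_bytes(n_times, n_rows, dtype=np.float64):
    """Size in bytes of an (n_times, n_rows) array of the given dtype."""
    return n_times * n_rows * np.dtype(dtype).itemsize


# All values are illustrative assumptions, not measurements.
height_in, dpi = 6.0, 200
n_times = 10_000

pixel_rows = int(height_in * dpi)   # one row per vertical pixel
# Two distinct sets of 32 bins have 33 boundaries each; their union has
# at most 66 boundaries, i.e. at most 65 bins.
union_rows = 65

pixel_bytes = grid_bytes(n_times, pixel_rows)
union_bytes = grid_bytes(n_times, union_rows)
ratio = pixel_bytes / union_bytes
```

Under these assumptions the pixel grid needs 1200 rows against at most 65 for the union grid. The actual savings would depend on how many distinct bin sets a variable has and on the figure size, so this is only an estimate.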