fit_lines_in_region() takes too much time

thomasorb commented 6 years ago

SN3 cube, 5 lines (Halpha + NII + SII), 2x2 binning on a 268x359 region i.e. 24000 pixels. It took ~2h with 4 kernels = 1.2s / pixel with a sincgauss model which is ~10 times slower than the old code (4e6 pixels in ~12 hours with 16 kernels i.e. <0.2s / pixel with a sinc model)

aa1

thomasorb commented 6 years ago

Fit time is returned by the fitting function and is obtained with the keyword fit_time. I have compared fitting times with a basic script

axis, spectrum, fit = cube.fit_lines_in_spectrum(919, 893, 3,
                                                 ('[NII]6548', 'Halpha', '[NII]6583',
                                                 '[SII]6716', '[SII]6731'),
                                                 fmodel='sincgauss',
                                                 pos_def='1',
                                                 pos_cov=-500,
                                                 fwhm_def='fixed',
                                                 sigma_def='1',
                                                 sigma_cov=10,
                                                 nofilter=True,
                                                 snr_guess='auto')
print fit['fit_time']

With 5 emission lines, shared velocity, shared broadening, no filter, fmodel=sincgauss and snr_guess='auto' it takes 1.0s to fit.

You must know that whit snr_guess set to 'auto', the fit is done two times, one time with a default SNR and the second time with a SNR computed from the standard deviation of the residual of the first fit.

When we change some of the parameters we get the following fitting time:

with snr_guess = None : 0.58s
with fmodel='sinc', snr_guess=None: 0.15s
with fmodel='gaussian', snr_guess=None: 0.14s

In conclusion, there is no difference with the old algorithm but the sincgauss model takes much more time to compute. I will check it it can be optimized but I'm not sure since it comes from the complexity of the Dawson function involved in the calculation: see https://doi.org/10.1093/mnras/stw2315.

thomasorb commented 6 years ago

I've timed the calculation of the simplest form of the sincgauss function and the complete one defined in orb.utils.spectrum.sincgauss1d with respect to the sinc function as it is defined in orb.utils.spectrum.sinc1d :

def sincgauss1d_test(x, h, a, dx, fwhm, sigma):

    width = np.fabs(fwhm) / orb.constants.FWHM_SINC_COEFF
    width /= np.pi

    a = sigma / math.sqrt(2) / width
    b = (x - dx) / math.sqrt(2) / sigma
    dawson1 = special.dawsn(1j * a + b) * np.exp(2.* 1j * a *b)
    dawson2 = special.dawsn(1j * a - b) * np.exp(-2. * 1j * a *b)
    dawson3 = 2. * special.dawsn(1j * a)

    return ((dawson1 + dawson2)/dawson3).real

import timeit
timer = timeit.Timer(stmt='orb.utils.spectrum.sincgauss1d(x, 0, 1, 500, 2, 2)',
                     setup='import numpy as np ;import orb.utils.spectrum ; x = np.arange(1000)')
print timer.timeit(500)
timer = timeit.Timer(stmt='sincgauss1d_test(x, 0, 1, 500, 2, 2)',
                     setup='import numpy as np ;import orb.utils.spectrum ; x = np.arange(1000)')
print timer.timeit(500)
timer = timeit.Timer(stmt='orb.utils.spectrum.sinc1d(x, 0, 1, 500, 2)',
                     setup='import numpy as np ;import orb.utils.spectrum ; x = np.arange(1000)')
print timer.timeit(500)

The simple script above returned

0.22s for 500 sincgauss1d
0.18s for 500 sincgauss1d_test
0.018s for 500 sinc1d i.e. sincgauss calculation is at least 10 times slower than sinc and even the complete sincgauss1d which uses the gvar library (http://pythonhosted.org/gvar/) and is only 20% less efficient than the simplest form.

thomasorb / orcs

fit_lines_in_region() takes too much time #9