Did a lot of function timing. Timing from bash includes launching the Python interpreter, which takes 0.8 s. From now on I will exclude that by running the following from the IPython prompt:

    %timeit %run pyse.py --detection=10 --analysis=10 --grid=128 --bmaj=0.001414711 --bmin=0.001111111 --bpa=0 FITS_files/SOURCESINSERTED_2230_2230.FITS

Loading the FITS image and header takes about 40 ms. Computing background characteristics takes about 130 ms. Both are pretty much optimized. Applying numba.jit decorators to anything within extract.py either does not work, because the function is too complicated for Numba, or gives negligible speedup. The core of the fitting is a scipy.optimize.leastsq call, which is already well optimized but still a lot slower than computing source profiles from moments.

Real speedup is achieved by turning off Gauss fitting, which is included in this commit. The typical image now takes 3.0 s. That is without Gauss fitting, but the speedup is considerable.

Additional speedup could be achieved by replacing Pool.map, which uses multiprocessing, with a thread pool. This would, however, require some way of releasing the GIL within extract.py. Since Numba seems unable to do this for these functions, one might have to resort to Cython.

Dask bags are not used, so I removed the import.
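A minimal sketch of what swapping Pool.map for a thread pool could look like, assuming the standard-library multiprocessing.pool.ThreadPool is used. The names fit_island and run_threaded are hypothetical stand-ins, not code from extract.py. The stand-in is pure Python and holds the GIL, so this only illustrates the shape of the call site, not any speedup.

```python
import math
from multiprocessing.pool import ThreadPool


def fit_island(pixels):
    # Hypothetical stand-in for the per-island work done in extract.py.
    # Real gains from threads would require this work to release the GIL.
    return math.sqrt(sum(p * p for p in pixels))


def run_threaded(islands, workers=4):
    # ThreadPool offers the same map() interface as multiprocessing.Pool,
    # so the calling code would need little change.
    with ThreadPool(workers) as pool:
        return pool.map(fit_island, islands)


# Made-up input: eight small "islands" of pixel values.
islands = [[float(i + j) for j in range(16)] for i in range(8)]
threaded = run_threaded(islands)
serial = [fit_island(p) for p in islands]
```

Threads avoid pickling the arguments and results between processes, which is where the saving over multiprocessing would come from once the GIL is released.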