Faster physical coordinates

Major speedup when processing large (4k × 4k) images with more than 100,000 sources, obtained by applying Numba's guvectorize decorator. Such images can now be processed, including loading the FITS file, in less than a second on a single core of, e.g., an AMD Ryzen 9 7900X.

A major overhaul of the data layout was needed to achieve this speedup. In summary, all source processing is now done in a vectorized way, which proved much more efficient than any parallelization effort. Numba's guvectorize decorator not only vectorizes all source processing in pixel coordinates but also compiles the corresponding Python code to machine code. In addition, all conversions from pixel coordinates to celestial coordinates are now done in a vectorized way using Astropy routines.

Note that you will see little of the claimed speedup in this branch. All measured source properties are stored in NumPy arrays, but as a final step they are passed into extract.ParamSet instances, because that is the format the unit tests accept, and this conversion is very time-consuming. To actually achieve the speedup, this final conversion step has to be removed, and the unit tests will simultaneously have to accommodate a new format for storing source properties, e.g. xarray.
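xarray is mentioned only as one possible replacement for ParamSet. A minimal sketch, assuming hypothetical variable names (`ra`, `dec`, `peak_flux`) and random placeholder values rather than real measurements, of keeping the source properties column-wise in an `xarray.Dataset`, so that no per-source Python object is ever created:

```python
import numpy as np
import xarray as xr

n = 100_000
rng = np.random.default_rng(0)

# Placeholder "measured" properties: one NumPy array per quantity.
ra = rng.uniform(179.0, 181.0, n)
dec = rng.uniform(44.0, 46.0, n)
peak_flux = rng.lognormal(0.0, 1.0, n)

# Wrap the arrays without copying them into per-source objects.
sources = xr.Dataset(
    {
        "ra": ("source", ra),
        "dec": ("source", dec),
        "peak_flux": ("source", peak_flux),
    },
    coords={"source": np.arange(n)},
)

# Filtering stays vectorized: select all sources above a flux cut at once.
bright = sources.isel(source=np.flatnonzero(sources["peak_flux"].values > 5.0))
```

Unit tests written against such a container would compare whole arrays instead of iterating over ParamSet instances.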

Also note that this guvectorized processing of sources only applies if the user opts for neither forced beam fits nor deblending. In that case the speedup is achieved by using moments only; there is no Gaussian fitting in the vectorized path yet. Hopefully this will be added later.
If the user does choose deblending or forced beam fits, analytical Jacobians and bounds enhance and stabilize the Gaussian fits.
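For reference, the textbook moment estimates for one source are given below, with the sums running over its pixels $i$ with values $I_i$ at pixel positions $(x_i, y_i)$. The exact expressions PySE uses, including any corrections for pixels below the detection threshold, are not given in this PR, so treat these as the generic form only. The semi-axes are the square roots of the eigenvalues of the second-moment matrix, and $\theta$ is measured counterclockwise from the $x$ axis.

```math
\begin{aligned}
F &= \sum_i I_i, \qquad
\bar{x} = \frac{1}{F}\sum_i I_i x_i, \qquad
\bar{y} = \frac{1}{F}\sum_i I_i y_i, \\
\sigma_{xx} &= \frac{1}{F}\sum_i I_i (x_i-\bar{x})^2, \qquad
\sigma_{yy} = \frac{1}{F}\sum_i I_i (y_i-\bar{y})^2, \qquad
\sigma_{xy} = \frac{1}{F}\sum_i I_i (x_i-\bar{x})(y_i-\bar{y}), \\
\sigma_{\mathrm{maj,min}}^2 &= \frac{\sigma_{xx}+\sigma_{yy}}{2} \pm \sqrt{\left(\frac{\sigma_{xx}-\sigma_{yy}}{2}\right)^2 + \sigma_{xy}^2}, \qquad
\theta = \tfrac{1}{2}\operatorname{atan2}\!\left(2\sigma_{xy},\ \sigma_{xx}-\sigma_{yy}\right).
\end{aligned}
```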
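The fitting code itself is not part of this description. As a hedged illustration of what analytical Jacobians and bounds mean in practice, this sketch fits a circular 2D Gaussian with `scipy.optimize.least_squares`, passing a hand-derived Jacobian via `jac` and parameter limits via `bounds`. PySE fits elliptical Gaussians; the circular model here only keeps the derivatives short. SciPy as the optimizer, the function names, the bound values and the synthetic data are all assumptions.

```python
import numpy as np
from scipy.optimize import least_squares


def gaussian_model(p, x, y):
    amp, x0, y0, sigma = p
    return amp * np.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2.0 * sigma**2))


def residuals(p, x, y, data):
    return gaussian_model(p, x, y) - data


def jacobian(p, x, y, data):
    # Hand-derived partial derivatives of the residuals with respect to
    # (amp, x0, y0, sigma); one column per parameter.
    amp, x0, y0, sigma = p
    dx = x - x0
    dy = y - y0
    g = np.exp(-(dx**2 + dy**2) / (2.0 * sigma**2))
    return np.column_stack(
        [
            g,
            amp * g * dx / sigma**2,
            amp * g * dy / sigma**2,
            amp * g * (dx**2 + dy**2) / sigma**3,
        ]
    )


# Synthetic, noiseless 21 x 21 cutout around a hypothetical source.
yy, xx = np.mgrid[0:21, 0:21].astype(float)
x, y = xx.ravel(), yy.ravel()
true_params = np.array([10.0, 9.3, 11.2, 2.5])
data = gaussian_model(true_params, x, y)

# Bounds keep the amplitude positive, the centre on the cutout and the
# width within a plausible range.
lower = [0.0, 0.0, 0.0, 0.5]
upper = [np.inf, 20.0, 20.0, 10.0]
p0 = [8.0, 10.0, 10.0, 3.0]

fit = least_squares(
    residuals, p0, jac=jacobian, bounds=(lower, upper), args=(x, y, data)
)
```

Without `jac`, `least_squares` estimates the Jacobian by finite differences, which costs extra model evaluations and can be less accurate; the bounds prevent the optimizer from wandering into unphysical regions such as a negative width.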

transientskp / pyse

Faster physical coordinates #47