Open · AlexKurek opened this issue 5 months ago
Short answer: no. My Fortran knowledge is rusty and certainly not up to par with modern standards. However, before trying to optimize the code manually, have you tried letting the compiler do the optimizations for you? In other words, how did you compile the code that you profiled?
I just installed PyBDSF using `pip install git+https://github.com/lofar-astron/PyBDSF.git`. First I used cProfile to find the slowest functions, then line_profiler to profile within those functions, which looks like this:
Line # Hits Time Per Hit % Time Line Contents
==============================================================
632 @line_profiler.profile
633 def copy_gaussians(img, isl1, isl2):
634 """Copies Gaussians from isl2 to isl1
635
636 img.gaussians is also updated
637 """
638 31 148.5 4.8 0.3 if img.ngaus == 0:
639 gaus_id = -1
640 else:
641 31 105.1 3.4 0.2 gaus_id = img.gaussians[-1].gaus_num
642 65 28.1 0.4 0.1 for g in isl2.gaul:
643 34 11.6 0.3 0.0 gaus_id += 1
644 34 45805.1 1347.2 98.3 gcp = Gaussian(img, g.parameters[:], isl1.island_id, gaus_id)
645 34 9.6 0.3 0.0 gcp.gaus_num = gaus_id
646 34 12.3 0.4 0.0 gcp.jlevel = g.jlevel
647 34 14.8 0.4 0.0 isl1.gaul.append(gcp)
648 34 378.7 11.1 0.8 img.ngaus += 1
649 34 100.5 3.0 0.2 img.gaussians.append(gcp)
Do you think the compiler is capable of doing the vectorizations that well?
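For reference, the two-stage workflow I used (cProfile to find slow functions, then line_profiler on those functions) can be sketched like this; the profiled function below is a toy stand-in, not PyBDSF code:

```python
import cProfile
import io
import pstats

def hot_function(n):
    """Toy stand-in for a slow function; sums squares naively."""
    total = 0
    for i in range(n):
        total += i * i
    return total

# Stage 1: cProfile shows which functions dominate the run time.
profiler = cProfile.Profile()
profiler.enable()
result = hot_function(100_000)
profiler.disable()

stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)  # top 5 entries by cumulative time
report = stream.getvalue()
print(report)
# Stage 2 (not shown): decorate the identified functions with
# @line_profiler.profile and rerun to get the per-line tables above.
```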
I'm confused. AFAICS `Gaussian` is a class written in pure Python. I don't see a link with the Fortran snippet you referred to earlier. It may very well be that you found a hot-spot in `copy_gaussians()` that can be improved, but that's hard for me to tell by quickly glancing over the code (which I didn't write, BTW 😄). It's also not completely clear to me what exactly you're profiling.
> I'm confused. AFAICS `Gaussian` is a class written in pure Python. I don't see a link with the Fortran snippet you referred to earlier.

No, it is not; that is a different bottleneck.

> It's also not completely clear to me what exactly you're profiling.

I profiled an entire run on a small FITS image.

> It may very well be that you found a hot-spot in `copy_gaussians()` that can be improved.
FYI: this is how this class looks when profiled:
Line # Hits Time Per Hit % Time Line Contents
==============================================================
936 @line_profiler.profile
937 def __init__(self, img, gaussian, isl_idx, g_idx, flg=0):
938 """Initialize Gaussian object from fitting data
939
940 Parameters:
941 img: PyBDSM image object
942 gaussian: 6-tuple of fitted numbers
943 isl_idx: island serial number
944 g_idx: gaussian serial number
945 flg: flagging (if any)
946 """
947 458 1109.3 2.4 0.2 from . import functions as func
948 458 1124.8 2.5 0.2 from .const import fwsig
949 458 222.0 0.5 0.0 import numpy as N
950
951 # Add attribute definitions needed for output
952 458 3947.7 8.6 0.7 self.source_id_def = Int(doc="Source index", colname='Source_id')
953 458 2647.0 5.8 0.5 self.code_def = String(doc='Source code S, C, or M', colname='S_Code')
954 458 2198.5 4.8 0.4 self.gaus_num_def = Int(doc="Serial number of the gaussian for the image", colname='Gaus_id')
955 458 2284.2 5.0 0.4 self.island_id_def = Int(doc="Serial number of the island", colname='Isl_id')
956 458 2197.6 4.8 0.4 self.flag_def = Int(doc="Flag associated with gaussian", colname='Flag')
957 458 2487.5 5.4 0.4 self.total_flux_def = Float(doc="Total flux density, Jy", colname='Total_flux', units='Jy')
958 916 2242.1 2.4 0.4 self.total_fluxE_def = Float(doc="Total flux density error, Jy", colname='E_Total_flux',
959 458 68.7 0.2 0.0 units='Jy')
960 916 2236.5 2.4 0.4 self.peak_flux_def = Float(doc="Peak flux density/beam, Jy/beam", colname='Peak_flux',
961 458 67.9 0.1 0.0 units='Jy/beam')
962 916 2189.3 2.4 0.4 self.peak_fluxE_def = Float(doc="Peak flux density/beam error, Jy/beam",
963 458 76.9 0.2 0.0 colname='E_Peak_flux', units='Jy/beam')
964 916 6217.4 6.8 1.1 self.centre_sky_def = List(Float(), doc="Sky coordinates of gaussian centre",
965 458 150.0 0.3 0.0 colname=['RA', 'DEC'], units=['deg', 'deg'])
966 916 5131.7 5.6 0.9 self.centre_skyE_def = List(Float(), doc="Error on sky coordinates of gaussian centre",
967 458 127.0 0.3 0.0 colname=['E_RA', 'E_DEC'], units=['deg', 'deg'])
968 916 5134.0 5.6 0.9 self.centre_pix_def = List(Float(), doc="Pixel coordinates of gaussian centre",
969 458 141.0 0.3 0.0 colname=['Xposn', 'Yposn'], units=['pix', 'pix'])
970 916 5030.1 5.5 0.9 self.centre_pixE_def = List(Float(), doc="Error on pixel coordinates of gaussian centre",
971 458 133.0 0.3 0.0 colname=['E_Xposn', 'E_Yposn'], units=['pix', 'pix'])
972 916 4949.9 5.4 0.9 self.size_sky_def = List(Float(), doc="Shape of the gaussian FWHM, PA, deg",
973 458 165.2 0.4 0.0 colname=['Maj', 'Min', 'PA'], units=['deg', 'deg',
974 'deg'])
975 916 4918.2 5.4 0.9 self.size_skyE_def = List(Float(), doc="Error on shape of the gaussian FWHM, PA, deg",
976 458 139.7 0.3 0.0 colname=['E_Maj', 'E_Min', 'E_PA'], units=['deg', 'deg',
977 'deg'])
978 916 4966.7 5.4 0.9 self.deconv_size_sky_def = List(Float(), doc="Deconvolved shape of the gaussian FWHM, PA, deg",
979 458 127.1 0.3 0.0 colname=['DC_Maj', 'DC_Min', 'DC_PA'], units=['deg', 'deg',
980 'deg'])
981 916 4907.4 5.4 0.9 self.deconv_size_skyE_def = List(Float(), doc="Error on deconvolved shape of the gaussian FWHM, PA, deg",
982 458 123.9 0.3 0.0 colname=['E_DC_Maj', 'E_DC_Min', 'E_DC_PA'], units=['deg', 'deg',
983 'deg'])
984 916 4853.6 5.3 0.9 self.size_sky_uncorr_def = List(Float(), doc="Shape in image plane of the gaussian FWHM, PA, deg",
985 458 121.3 0.3 0.0 colname=['Maj_img_plane', 'Min_img_plane', 'PA_img_plane'], units=['deg', 'deg',
986 'deg'])
987 916 4985.4 5.4 0.9 self.size_skyE_uncorr_def = List(Float(), doc="Error on shape in image plane of the gaussian FWHM, PA, deg",
988 458 125.0 0.3 0.0 colname=['E_Maj_img_plane', 'E_Min_img_plane', 'E_PA_img_plane'], units=['deg', 'deg',
989 'deg'])
990 916 5015.5 5.5 0.9 self.deconv_size_sky_uncorr_def = List(Float(), doc="Deconvolved shape in image plane of the gaussian FWHM, PA, deg",
991 458 118.7 0.3 0.0 colname=['DC_Maj_img_plane', 'DC_Min_img_plane', 'DC_PA_img_plane'], units=['deg', 'deg',
992 'deg'])
993 916 4936.5 5.4 0.9 self.deconv_size_skyE_uncorr_def = List(Float(), doc="Error on deconvolved shape in image plane of the gaussian FWHM, PA, deg",
994 458 128.2 0.3 0.0 colname=['E_DC_Maj_img_plane', 'E_DC_Min_img_plane', 'E_DC_PA_img_plane'], units=['deg', 'deg',
995 'deg'])
996 458 2137.0 4.7 0.4 self.rms_def = Float(doc="Island rms Jy/beam", colname='Isl_rms', units='Jy/beam')
997 458 2105.7 4.6 0.4 self.mean_def = Float(doc="Island mean Jy/beam", colname='Isl_mean', units='Jy/beam')
998 458 2714.3 5.9 0.5 self.total_flux_isl_def = Float(doc="Island total flux from sum of pixels", colname='Isl_Total_flux', units='Jy')
999 458 2132.5 4.7 0.4 self.total_flux_islE_def = Float(doc="Error on island total flux from sum of pixels", colname='E_Isl_Total_flux', units='Jy')
1000 458 2091.6 4.6 0.4 self.gresid_rms_def = Float(doc="Island rms in Gaussian residual image", colname='Resid_Isl_rms', units='Jy/beam')
1001 458 2048.9 4.5 0.4 self.gresid_mean_def= Float(doc="Island mean in Gaussian residual image", colname='Resid_Isl_mean', units='Jy/beam')
1002 458 2093.8 4.6 0.4 self.sresid_rms_def = Float(doc="Island rms in Shapelet residual image", colname='Resid_Isl_rms', units='Jy/beam')
1003 458 2073.2 4.5 0.4 self.sresid_mean_def= Float(doc="Island mean in Shapelet residual image", colname='Resid_Isl_mean', units='Jy/beam')
1004 458 2193.4 4.8 0.4 self.jlevel_def = Int(doc="Wavelet number to which Gaussian belongs", colname='Wave_id')
1005 458 2146.3 4.7 0.4 self.spec_indx_def = Float(doc = "Spectral index", colname='Spec_Indx', units=None)
1006 458 2066.4 4.5 0.4 self.e_spec_indx_def = Float(doc = "Error in spectral index", colname='E_Spec_Indx', units=None)
1007 458 5359.7 11.7 0.9 self.specin_flux_def = List(Float(), doc = "Total flux density per channel, Jy", colname=['Total_flux'], units=['Jy'])
1008 458 5074.1 11.1 0.9 self.specin_fluxE_def = List(Float(), doc = "Error in total flux density per channel, Jy", colname=['E_Total_flux'], units=['Jy'])
1009 458 5005.9 10.9 0.9 self.specin_freq_def = List(Float(), doc = "Frequency per channel, Hz", colname=['Freq'], units=['Hz'])
1010
1011 458 102.4 0.2 0.0 use_wcs = True
1012 458 112.7 0.2 0.0 self.gaussian_idx = g_idx
1013 458 125.4 0.3 0.0 self.gaus_num = 0 # stored later
1014 458 96.2 0.2 0.0 self.island_id = isl_idx
1015 458 1621.6 3.5 0.3 self.jlevel = img.j
1016 458 95.8 0.2 0.0 self.flag = flg
1017 458 119.2 0.3 0.0 self.parameters = gaussian
1018
1019 458 72.4 0.2 0.0 p = gaussian
1020 458 142.1 0.3 0.0 self.peak_flux = p[0]
1021 458 172.3 0.4 0.0 self.centre_pix = p[1:3]
1022 458 124.7 0.3 0.0 size = p[3:6]
1023 916 5363.2 5.9 0.9 if func.approx_equal(size[0], img.pixel_beam()[0]*1.1) and \
1024 func.approx_equal(size[1], img.pixel_beam()[1]) and \
1025 func.approx_equal(size[2], img.pixel_beam()[2]+90.0) or \
1026 458 2068.8 4.5 0.4 img.opts.fix_to_beam:
1027 # Check whether fitted Gaussian is just the distorted pixel beam given as an
1028 # initial guess (always set to [bm_maj*1.1, bm_min, bm_pa+90]) or if size was
1029 # fixed to the beam. If so, reset the size to the undistorted beam. Note:
1030 # these are sigma sizes, not FWHM sizes.
1031 size = img.pixel_beam()
1032 size = (size[0], size[1], size[2]+90.0) # adjust angle so that corrected_size() works correctly
1033 458 2300.4 5.0 0.4 size = func.corrected_size(size) # gives fwhm and P.A.
1034 458 100.7 0.2 0.0 self.size_pix = size # FWHM in pixels and P.A. CCW from +y axis
1035
1036 # Use img.orig_beam for flux calculation and deconvolution on wavelet
1037 # images, as img.beam has been altered to match the wavelet scale.
1038 # Note: these are all FWHM sizes.
1039 458 1270.1 2.8 0.2 if img.waveletimage:
1040 40 392.1 9.8 0.1 bm_pix = N.array(img.beam2pix(img.orig_beam))
1041 else:
1042 418 2961.3 7.1 0.5 bm_pix = N.array(img.beam2pix(img.beam))
1043
1044 # Calculate fluxes, sky sizes, etc. All sizes are FWHM.
1045 458 673.3 1.5 0.1 tot = p[0]*size[0]*size[1]/(bm_pix[0]*bm_pix[1])
1046 458 106.4 0.2 0.0 if flg == 0:
1047 # These are good Gaussians
1048 446 28791.4 64.6 5.1 errors = func.get_errors(img, p+[tot], img.islands[isl_idx].rms, fixed_to_beam=img.opts.fix_to_beam)
1049 446 20891.4 46.8 3.7 self.centre_sky = img.pix2sky(p[1:3])
1050 446 101608.3 227.8 17.9 self.centre_skyE = img.pix2coord(errors[1:3], self.centre_pix, use_wcs=use_wcs)
1051 446 89407.0 200.5 15.7 self.size_sky = img.pix2gaus(size, self.centre_pix, use_wcs=use_wcs) # FWHM in degrees and P.A. east from north
1052 446 4037.4 9.1 0.7 self.size_sky_uncorr = img.pix2gaus(size, self.centre_pix, use_wcs=False) # FWHM in degrees and P.A. east from +y axis
1053 446 47263.1 106.0 8.3 self.size_skyE = img.pix2gaus(errors[3:6], self.centre_pix, use_wcs=use_wcs, is_error=True)
1054 446 3448.7 7.7 0.6 self.size_skyE_uncorr = img.pix2gaus(errors[3:6], self.centre_pix, use_wcs=False, is_error=True)
1055 446 6299.3 14.1 1.1 gaus_dc, err = func.deconv2(bm_pix, size)
1056 446 61356.6 137.6 10.8 self.deconv_size_sky = img.pix2gaus(gaus_dc, self.centre_pix, use_wcs=use_wcs)
1057 446 3714.9 8.3 0.7 self.deconv_size_sky_uncorr = img.pix2gaus(gaus_dc, self.centre_pix, use_wcs=False)
1058 446 47178.3 105.8 8.3 self.deconv_size_skyE = img.pix2gaus(errors[3:6], self.centre_pix, use_wcs=use_wcs, is_error=True)
1059 446 3418.6 7.7 0.6 self.deconv_size_skyE_uncorr = img.pix2gaus(errors[3:6], self.centre_pix, use_wcs=False, is_error=True)
1060 else:
1061 # These are flagged Gaussians, so don't calculate sky values or errors
1062 12 6.8 0.6 0.0 errors = [0]*7
1063 12 4.1 0.3 0.0 self.centre_sky = [0., 0.]
1064 12 3.6 0.3 0.0 self.centre_skyE = [0., 0.]
1065 12 4.2 0.3 0.0 self.size_sky = [0., 0., 0.]
1066 12 3.2 0.3 0.0 self.size_sky_uncorr = [0., 0., 0.]
1067 12 4.1 0.3 0.0 self.size_skyE = [0., 0.]
1068 12 3.0 0.2 0.0 self.size_skyE_uncorr = [0., 0., 0.]
1069 12 3.8 0.3 0.0 self.deconv_size_sky = [0., 0., 0.]
1070 12 3.6 0.3 0.0 self.deconv_size_sky_uncorr = [0., 0., 0.]
1071 12 3.1 0.3 0.0 self.deconv_size_skyE = [0., 0., 0.]
1072 12 2.8 0.2 0.0 self.deconv_size_skyE_uncorr = [0., 0., 0.]
1073 458 130.9 0.3 0.0 self.total_flux = tot
1074 458 185.7 0.4 0.0 self.total_fluxE = errors[6]
1075 458 139.6 0.3 0.0 self.peak_fluxE = errors[0]
1076 458 121.3 0.3 0.0 self.total_fluxE = errors[6]
1077 458 158.7 0.3 0.0 self.centre_pixE = errors[1:3]
1078 458 124.9 0.3 0.0 self.size_pixE = errors[3:6]
1079 458 1386.1 3.0 0.2 self.rms = img.islands[isl_idx].rms
1080 458 1235.0 2.7 0.2 self.mean = img.islands[isl_idx].mean
1081 458 1297.7 2.8 0.2 self.total_flux_isl = img.islands[isl_idx].total_flux
1082 458 1260.2 2.8 0.2 self.total_flux_islE = img.islands[isl_idx].total_fluxE
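One pattern visible in the table above: the `*_def` attribute definitions are rebuilt identically for every one of the 458 instances, and together they account for several percent of the constructor time. A hedged sketch of one possible fix, moving the definitions to class level so they are built once (the `Int` descriptor below is a minimal stand-in, not PyBDSF's actual definition class; this is only safe if the definition objects are never mutated per instance):

```python
class Int:
    """Minimal stand-in for PyBDSF's column-definition descriptor."""
    def __init__(self, doc="", colname=None):
        self.doc = doc
        self.colname = colname

class GaussianPerInstance:
    def __init__(self):
        # Rebuilt on every construction -- the pattern seen in the profile.
        self.source_id_def = Int(doc="Source index", colname="Source_id")

class GaussianClassLevel:
    # Built once at class-definition time and shared by all instances.
    source_id_def = Int(doc="Source index", colname="Source_id")

a = GaussianClassLevel()
b = GaussianClassLevel()
shared = a.source_id_def is b.source_id_def  # one object, built once
```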
I don't understand the whole flow of PyBDSF's code, but maybe the `@cache` decorator would help for `pix2coord` and `pix2gaus`.
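A complication with `@cache` here: these methods take list arguments, which aren't hashable, so the coordinates would need to be converted to tuples first, and the cache would only be valid if the result depends solely on the arguments (real WCS conversions also depend on the image header, so a per-image cache would be needed). A hedged sketch of the pattern, with a hypothetical placeholder in place of the real conversion:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def _pix2sky_cached(x, y):
    """Hypothetical cached conversion; stands in for img.pix2sky.
    Placeholder arithmetic below, not a real WCS transform."""
    return (x * 0.001, y * 0.001)

def pix2sky(pix):
    # Lists are unhashable, so convert to a tuple before the cached call.
    return _pix2sky_cached(*tuple(pix))

first = pix2sky([10.0, 20.0])
second = pix2sky([10.0, 20.0])  # served from the cache
hits = _pix2sky_cached.cache_info().hits
```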
Maybe, but this is yet another potential hot-spot 😄. It may well be that there are numerous performance bottlenecks in PyBDSF. Unfortunately, I lack the time to dive into this. What's also not helping is that there's no reliable suite of unit tests, so I'm a bit hesitant to make changes to the code without knowing whether I'd break something. Maybe @darafferty can give some insights here?
Interesting results -- as far as I know, this kind of profiling has never really been done, so I'm not surprised that there are lots of bottlenecks! The `pytess_simple` code is, I think, very rarely used in practice (it is part of the module that estimates the PSF variations across the image, and I'm aware of only a few people who have used it). But the `copy_gaussians()` function is used in a lot of LOFAR processing (e.g., by the DDF pipeline). And yes, the lack of unit tests makes it difficult to be sure something doesn't break, but I guess it would be a monumental task to add them now.
So I would say that it would be worthwhile to experiment with improving some of the worst bottlenecks that Alex has identified. Perhaps for those at least we can write unit tests without too much work.
As far as I can remember, there were also issues with images that took a very long time to process. Profiling may be a good way to identify where PyBDSF stalls in such cases, but I would use a statistical profiler for that.
Tagging @mhardcastle since you may be interested to see this.
BTW: for each `--> Wavelet scale #` I see in the logs `--> Calculating background rms and mean images`. Does this have to be done anew for each scale? If I understand correctly, the wavelets are larger, but the image is not decimated (?).
Yes, unfortunately I think it is necessary to recalculate the rms and mean maps for each scale, as they do change (as it turns out, this fact was relevant to the last PR that was just merged).
This is how it looks in general for a 150 MB image:
Profile of `rmsimage.py`:
rmsimage_prof.zip
In short: line 265, `mean, rms = self.calculate_maps(img, data, mean, rms, mask, map_opts, [...])`, takes 86.1% of the CPU time. Within `calculate_maps`, `self.map_2d` takes 99.7%. And there:
line 539: 32.4% `axes, mean_map1, rms_map1 = self.rms_mean_map(arr, mask_small, kappa, box, ncores)`
line 548: 50.0% `axes2, mean_map2, rms_map2 = self.rms_mean_map(arr, mask, kappa, box2, ncores)`
And within `rms_mean_map`:
712 10 54550644.9 5e+06 40.9% cm_cr_list = mp.parallel_map(func.eval_func_tuple
717 10 75166237.6 8e+06 56.4% cm_cr_list = mp.parallel_map(func.eval_func_tuple
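Since nearly all the time here sits in the two `parallel_map` calls, per-task dispatch overhead may be part of the cost. A generic, hedged sketch (standard-library `multiprocessing`, not PyBDSF's `mp.parallel_map`; the per-box statistic below is a stand-in) of how batching work items with `chunksize` amortizes inter-process round-trips:

```python
from multiprocessing import get_context

def tile_stats(values):
    """Stand-in for a per-box mean/rms computation."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var ** 0.5

tiles = [[float(i + j) for j in range(16)] for i in range(200)]

# "fork" start method (POSIX-only) keeps this sketch self-contained.
ctx = get_context("fork")
with ctx.Pool(processes=2) as pool:
    # chunksize batches many tiles per task, cutting IPC round-trips.
    results = pool.map(tile_stats, tiles, chunksize=25)
print(len(results))
```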
Thanks for this work. We definitely need to follow up on this, but we do need to find the time for it.
Just a question / suggestion: these are the results of profiling. Have you tried to vectorize the loops here: https://github.com/lofar-astron/PyBDSF/blob/master/src/fortran/pytess_simple.f#L16C1-L29C15 ? ChatGPT seems to manage it, but I'm bad at Fortran, so I cannot verify whether its output is OK.
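I can't verify the ChatGPT output either, but for reference, the kind of loop that tessellation codes typically contain (for each pixel, find the nearest generator) vectorizes in NumPy as below. This is a generic sketch of the pattern, not a verified translation of `pytess_simple.f`:

```python
import numpy as np

def nearest_generator_loop(xpix, ypix, xgen, ygen):
    """Scalar double loop: for each pixel, index of the closest generator."""
    out = np.empty(len(xpix), dtype=np.intp)
    for i in range(len(xpix)):
        best, best_d2 = 0, np.inf
        for k in range(len(xgen)):
            d2 = (xpix[i] - xgen[k]) ** 2 + (ypix[i] - ygen[k]) ** 2
            if d2 < best_d2:
                best, best_d2 = k, d2
        out[i] = best
    return out

def nearest_generator_vec(xpix, ypix, xgen, ygen):
    """Vectorized: broadcast an (npix, ngen) distance matrix, argmin per row."""
    d2 = (xpix[:, None] - xgen[None, :]) ** 2 + (ypix[:, None] - ygen[None, :]) ** 2
    return np.argmin(d2, axis=1)

rng = np.random.default_rng(0)
xp, yp = rng.random(500), rng.random(500)
xg, yg = rng.random(8), rng.random(8)
same = np.array_equal(nearest_generator_loop(xp, yp, xg, yg),
                      nearest_generator_vec(xp, yp, xg, yg))
```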