icecube / flarestack

Unbinned likelihood analysis code for astroparticle physics datasets
https://flarestack.readthedocs.io/en/latest/?badge=latest
MIT License

Mismatched number of trials when running on the DESY cluster #132

Closed smechbal closed 2 years ago

smechbal commented 2 years ago

When submitting a job to the DESY cluster with a large number of sources, the number of trials set in the minimisation dictionary does not match the number reported by INFO:flarestack.core.results, which causes the discovery potential (DP)/sensitivity analysis to fail.

Can anyone please help?

[Screenshot 2022-03-15 at 11 01 33]

JannisNe commented 2 years ago

Hey, this is likely due to jobs crashing, so that some trials are never produced. I would start by checking the cluster logs. If jobs get aborted, you should also get an email sent to your DESY mail account. Could you also post the full error message? A smaller number of trials is not, per se, a reason for the code to fail.
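
A minimal sketch of that log check, assuming the cluster writes one stderr file per array task and that those files match the glob `*.e*` (the usual Grid Engine naming, but an assumption here). `summarise_failures` is a hypothetical helper, not part of flarestack; it counts how many task logs end in each Python exception.

```python
import re
from collections import Counter
from pathlib import Path

# Matches the final line of a Python traceback,
# e.g. "ModuleNotFoundError: No module named 'x'".
EXCEPTION_LINE = re.compile(r"^(\w+(?:\.\w+)*(?:Error|Exception)): (.*)$")


def summarise_failures(log_dir, pattern="*.e*"):
    """Count the last exception line found in each stderr log under log_dir.

    The glob pattern is an assumption about how the cluster names its
    stderr files; adjust it to match the actual log directory.
    """
    counts = Counter()
    for log_file in sorted(Path(log_dir).glob(pattern)):
        if not log_file.is_file():
            continue
        text = log_file.read_text(errors="replace")
        if "Traceback (most recent call last)" not in text:
            continue  # this task did not crash with a Python exception
        last_exception = "unparsed traceback"
        for line in text.splitlines():
            if EXCEPTION_LINE.match(line.strip()):
                last_exception = line.strip()
        counts[last_exception] += 1
    return counts
```

Calling `summarise_failures` on the log directory returns a `Counter`. If every task crashes with the same message, the problem is more likely the environment than a per-job time or memory limit.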

smechbal commented 2 years ago

I am not getting any email to my DESY account, and the stderr cluster log reads (invariably):

INFO:__main__:N CPU available 64. Using 1
INFO:flarestack.data.icecube.ic_season:Loading datasets from /lustre/fs22/group/icecube/data_mirror/ (DESY)
INFO:flarestack.core.minimisation:Using 'standard_matrix' LLH class
INFO:flarestack.core.injector:Initialising Injector for IC59
WARNING:flarestack.icecube_utils.dataset_loader:No field called 'good_i3' found in GoodRunList. Cannot check if all runs in GoodRunList are actually good.
Traceback (most recent call last):
  File "/afs/ifh.de/user/s/smechbal/flarestack/flarestack/core/multiprocess_wrapper.py", line 164, in <module>
    run_multiprocess(n_cpu=cfg.n_cpu, mh_dict=mh_dict)
  File "/afs/ifh.de/user/s/smechbal/flarestack/flarestack/core/multiprocess_wrapper.py", line 143, in run_multiprocess
    with MultiProcessor(n_cpu=n_cpu, mh_dict=mh_dict) as r:
  File "/afs/ifh.de/user/s/smechbal/flarestack/flarestack/core/multiprocess_wrapper.py", line 58, in __init__
    inj = self.mh.get_injector(season)
  File "/afs/ifh.de/user/s/smechbal/flarestack/flarestack/core/minimisation.py", line 311, in get_injector
    self.seasons[season_name], self.sources
  File "/afs/ifh.de/user/s/smechbal/flarestack/flarestack/core/minimisation.py", line 1142, in add_injector
    return season.make_injector(sources, **self.inj_dict)
  File "/afs/ifh.de/user/s/smechbal/flarestack/flarestack/data/__init__.py", line 327, in make_injector
    return MCInjector.create(self, sources, **inj_kwargs)
  File "/afs/ifh.de/user/s/smechbal/flarestack/flarestack/core/injector.py", line 210, in create
    return BaseInjector.subclasses[inj_name](season, sources, **inj_dict)
  File "/afs/ifh.de/user/s/smechbal/flarestack/flarestack/core/injector.py", line 435, in __init__
    MCInjector.__init__(self, season, sources, **kwargs)
  File "/afs/ifh.de/user/s/smechbal/flarestack/flarestack/core/injector.py", line 239, in __init__
    BaseInjector.__init__(self, season, sources, **kwargs)
  File "/afs/ifh.de/user/s/smechbal/flarestack/flarestack/core/injector.py", line 93, in __init__
    self.spatial_pdf = SpatialPDF(kwargs["injection_spatial_pdf"], season)
  File "/afs/ifh.de/user/s/smechbal/flarestack/flarestack/core/spatial_pdf.py", line 18, in __init__
    self.background = BackgroundSpatialPDF.create(spatial_pdf_dict, season)
  File "/afs/ifh.de/user/s/smechbal/flarestack/flarestack/core/spatial_pdf.py", line 237, in create
    return cls.subclasses[s_pdf_name](s_pdf_dict, season)
  File "/afs/ifh.de/user/s/smechbal/flarestack/flarestack/core/spatial_pdf.py", line 295, in __init__
    self.bkg_f = self.create_background_function(season)
  File "/afs/ifh.de/user/s/smechbal/flarestack/flarestack/core/spatial_pdf.py", line 305, in create_background_function
    return load_bkg_spatial_spline(season)
  File "/afs/ifh.de/user/s/smechbal/flarestack/flarestack/utils/make_SoB_splines.py", line 597, in load_bkg_spatial_spline
    res = Pickle.load(f)
ModuleNotFoundError: No module named 'scipy.interpolate._fitpack2'

On the output, the full error message points to a failure of the fitting routine when fitting the dp curve over overfluctuations. Here is the full message:

`INFO:root:Estimated Discovery Potential is: 1.88e-08 GeV sr^-1 s^-1 cm^-2
{'name': 'analyses/agn_cores/stacking_analysis_8yrNTsample_pre_unblinding/radioloud_irselected_north/NrSrcs=1000/2.0/', 'mh_name': 'large_catalogue', 'dataset': <flarestack.data.icecube.ic_season.IceCubeDataset object at 0x7f3989dae760>, 'catalogue': '/lustre/fs23/group/icecube/smechbal/flarestack__data/input/catalogues/agn_cores/radioloud_irselected_north_1000brightest_srcs.npy', 'llh_dict': {'llh_name': 'standard_matrix', 'llh_sig_time_pdf': {'time_pdf_name': 'steady'}, 'llh_energy_pdf': {'energy_pdf_name': 'power_law'}, 'llh_spatial_pdf': {}, 'llh_bkg_time_pdf': {'time_pdf_name': 'steady'}}, 'inj_dict': {'injection_energy_pdf': {'energy_pdf_name': 'power_law', 'gamma': 2.0}, 'injection_sig_time_pdf': {'time_pdf_name': 'steady'}, 'injection_spatial_pdf': {}}, 'n_trials': 15, 'n_steps': 15, 'scale': 28.176657836865758}
INFO:flarestack.cluster.submitter:Tue Mar 15 11:10:46 2022: qsub -t 1-15:1 /lustre/fs23/group/icecube/smechbal/flarestack__data/cluster/SubmitDESY.sh /lustre/fs23/group/icecube/smechbal/flarestack__data/input/analysis/analyses/agn_cores/stacking_analysis_8yrNTsample_pre_unblinding/radioloud_irselected_north/NrSrcs=1000/2.0/dict.pkl 1
INFO:flarestack.cluster.submitter:Creating file at /lustre/fs23/group/icecube/smechbal/flarestack__data/cluster/SubmitDESY.sh
INFO:flarestack.cluster.submitter:Your job-array 86075079.1-15:1 ("SubmitDESY.sh") has been submitted

Done running MH analysis Nr of sources is 1000 gamma: 2.0
INFO:flarestack.core.results:Saving bias plot to /lustre/fs23/group/icecube/smechbal/flarestack__data/output/plots/analyses/agn_cores/stacking_analysis_8yrNTsample_pre_unblinding/radioloud_irselected_north/NrSrcs=1000/2.0/bias_n_s.pdf
INFO:flarestack.core.results:Saving bias plot to /lustre/fs23/group/icecube/smechbal/flarestack__data/output/plots/analyses/agn_cores/stacking_analysis_8yrNTsample_pre_unblinding/radioloud_irselected_north/NrSrcs=1000/2.0/bias_gamma.pdf
INFO:flarestack.core.results:Fraction of overfluctuations is 0.21 above 0.00 (N_trials=90) (Scale=0)
INFO:flarestack.core.results:Fraction of overfluctuations is 0.89 above 0.00 (N_trials=9) (Scale=2.013)
INFO:flarestack.core.results:Fraction of overfluctuations is 0.78 above 0.00 (N_trials=9) (Scale=4.025)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 0.00 (N_trials=9) (Scale=6.038)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 0.00 (N_trials=9) (Scale=8.05)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 0.00 (N_trials=9) (Scale=10.06)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 0.00 (N_trials=9) (Scale=12.08)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 0.00 (N_trials=9) (Scale=14.09)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 0.00 (N_trials=9) (Scale=16.1)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 0.00 (N_trials=9) (Scale=18.11)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 0.00 (N_trials=9) (Scale=20.13)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 0.00 (N_trials=9) (Scale=22.14)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 0.00 (N_trials=9) (Scale=24.15)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 0.00 (N_trials=9) (Scale=26.16)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 0.00 (N_trials=9) (Scale=28.18)
INFO:flarestack.core.results:Sensitivity is 2.97e-09
INFO:flarestack.core.results:Fraction of overfluctuations is 0.00 above 22.92 (N_trials=90) (Scale=0)
INFO:flarestack.core.results:Fraction of overfluctuations is 0.00 above 25 (N_trials=90) (Scale=0)
INFO:flarestack.core.results:Fraction of overfluctuations is 0.00 above 22.92 (N_trials=9) (Scale=2.013)
INFO:flarestack.core.results:Fraction of overfluctuations is 0.00 above 25 (N_trials=9) (Scale=2.013)
INFO:flarestack.core.results:Fraction of overfluctuations is 0.11 above 22.92 (N_trials=9) (Scale=4.025)
INFO:flarestack.core.results:Fraction of overfluctuations is 0.11 above 25 (N_trials=9) (Scale=4.025)
INFO:flarestack.core.results:Fraction of overfluctuations is 0.56 above 22.92 (N_trials=9) (Scale=6.038)
INFO:flarestack.core.results:Fraction of overfluctuations is 0.56 above 25 (N_trials=9) (Scale=6.038)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 22.92 (N_trials=9) (Scale=8.05)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 25 (N_trials=9) (Scale=8.05)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 22.92 (N_trials=9) (Scale=10.06)
INFO:flarestack.core.results:Fraction of overfluctuations is 0.89 above 25 (N_trials=9) (Scale=10.06)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 22.92 (N_trials=9) (Scale=12.08)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 25 (N_trials=9) (Scale=12.08)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 22.92 (N_trials=9) (Scale=14.09)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 25 (N_trials=9) (Scale=14.09)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 22.92 (N_trials=9) (Scale=16.1)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 25 (N_trials=9) (Scale=16.1)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 22.92 (N_trials=9) (Scale=18.11)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 25 (N_trials=9) (Scale=18.11)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 22.92 (N_trials=9) (Scale=20.13)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 25 (N_trials=9) (Scale=20.13)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 22.92 (N_trials=9) (Scale=22.14)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 25 (N_trials=9) (Scale=22.14)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 22.92 (N_trials=9) (Scale=24.15)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 25 (N_trials=9) (Scale=24.15)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 22.92 (N_trials=9) (Scale=26.16)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 25 (N_trials=9) (Scale=26.16)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 22.92 (N_trials=9) (Scale=28.18)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 25 (N_trials=9) (Scale=28.18)
WARNING:flarestack.core.results:RuntimeError for discovery potential!: Optimal parameters not found: Number of calls to function has reached maxfev = 800.
WARNING:flarestack.core.results:RuntimeError for discovery potential!: Optimal parameters not found: Number of calls to function has reached maxfev = 800.
INFO:flarestack.core.results:Discovery Potential is nan
INFO:flarestack.core.results:Discovery Potential (TS=25) is nan
Sens 2.972208549396039e-09 Sens_err [ 1.54376819e-09 -3.97857203e-08] 1.5437681926235718e-09 -3.9785720303686054e-08 Disc nan Disc_TS_threshold 22.916492451202796 Sens (n) 31.690630304160685 DP (n) nan 0 [3.421881549554388e-08] 1 [nan]
Traceback (most recent call last):
  File "ir_selected_agn_analysis.py", line 290, in <module>
    plt.tight_layout()
  File "/lustre/fs23/group/icecube/smechbal/anaconda3/lib/python3.8/site-packages/matplotlib/pyplot.py", line 2302, in tight_layout
    return gcf().tight_layout(pad=pad, h_pad=h_pad, w_pad=w_pad, rect=rect)
  File "/lustre/fs23/group/icecube/smechbal/anaconda3/lib/python3.8/site-packages/matplotlib/figure.py", line 3197, in tight_layout
    kwargs = get_tight_layout_figure(
  File "/lustre/fs23/group/icecube/smechbal/anaconda3/lib/python3.8/site-packages/matplotlib/tight_layout.py", line 320, in get_tight_layout_figure
    kwargs = _auto_adjust_subplotpars(fig, renderer,
  File "/lustre/fs23/group/icecube/smechbal/anaconda3/lib/python3.8/site-packages/matplotlib/tight_layout.py", line 82, in _auto_adjust_subplotpars
    bb += [ax.get_tightbbox(renderer, for_layout_only=True)]
  File "/lustre/fs23/group/icecube/smechbal/anaconda3/lib/python3.8/site-packages/matplotlib/axes/_base.py", line 4628, in get_tightbbox
    bb_yaxis = self.yaxis.get_tightbbox(
  File "/lustre/fs23/group/icecube/smechbal/anaconda3/lib/python3.8/site-packages/matplotlib/axis.py", line 1103, in get_tightbbox
    ticks_to_draw = self._update_ticks()
  File "/lustre/fs23/group/icecube/smechbal/anaconda3/lib/python3.8/site-packages/matplotlib/axis.py", line 1045, in _update_ticks
    major_locs = self.get_majorticklocs()
  File "/lustre/fs23/group/icecube/smechbal/anaconda3/lib/python3.8/site-packages/matplotlib/axis.py", line 1277, in get_majorticklocs
    return self.major.locator()
  File "/lustre/fs23/group/icecube/smechbal/anaconda3/lib/python3.8/site-packages/matplotlib/ticker.py", line 2292, in __call__
    return self.tick_values(vmin, vmax)
  File "/lustre/fs23/group/icecube/smechbal/anaconda3/lib/python3.8/site-packages/matplotlib/ticker.py", line 2317, in tick_values
    raise ValueError(
ValueError: Data has no positive values, and therefore can not be log-scaled.`

robertdstein commented 2 years ago

So I am seeing two problems. Let's address the stdout first:

The minimisation handler and the results handler are independent, so the mismatch itself is not a problem. The problem is that not enough trials have completed to calculate the discovery potential. As @JannisNe said, the root cause will not be flarestack itself, but rather that the jobs on the DESY cluster are not completing. Presumably they are timing out or otherwise hitting some kill limit. @fbradascio ran into these problems A LOT.
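
To see how many trials actually came back, the `flarestack.core.results` lines quoted above can be tabulated per scale. This is a minimal sketch that relies only on the line format shown in this thread; `trials_per_scale` and `completed_fraction` are hypothetical helpers, and the factor of 10 for `Scale=0` is an assumption read off the log (90 trials at scale 0 versus 9 elsewhere), not a documented flarestack setting.

```python
import re

# Format of the flarestack.core.results lines quoted in this thread, e.g.
# "... is 0.89 above 0.00 (N_trials=9) (Scale=2.013)".
RESULT_LINE = re.compile(
    r"above (?P<threshold>[\d.]+) \(N_trials=(?P<n>\d+)\) \(Scale=(?P<scale>[\d.]+)\)"
)


def trials_per_scale(log_text):
    """Map each injection scale to the number of trials reported in the log."""
    counts = {}
    for match in RESULT_LINE.finditer(log_text):
        counts[float(match["scale"])] = int(match["n"])
    return counts


def completed_fraction(counts, requested, background_factor=10):
    """Fraction of the requested trials that are present at each scale.

    background_factor is an assumption inferred from the quoted log
    (10x more trials at Scale=0), not a confirmed flarestack default.
    """
    fractions = {}
    for scale, n in counts.items():
        expected = requested * background_factor if scale == 0 else requested
        fractions[scale] = n / expected
    return fractions
```

With `'n_trials': 15` from the dictionary above and `N_trials=9` in the log, `completed_fraction` would report that well under the full set of requested trials was available at each scale.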

On the stderr, that's very interesting. It seems to be related to pickling. One possible cause is a mismatch between the version of scipy used to create the pickle and the version used to read it. I see we recently bumped scipy from 1.7.3 to 1.8.0. Perhaps try deleting the cache, reinstalling flarestack in a new environment, and then trying again?
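
Before deleting anything, it may help to check which cached pickles fail to load in the current environment. This is a minimal sketch using only the standard library; `find_stale_pickles` is a hypothetical helper, and both the cache directory and the `*.pkl` pattern are assumptions to adapt to wherever the spline cache actually lives. Note that unpickling runs code from the file, so this should only be pointed at a cache you created yourself.

```python
import pickle
from pathlib import Path


def find_stale_pickles(cache_dir, pattern="*.pkl"):
    """Return pickle files under cache_dir that cannot be loaded here.

    A file is treated as stale if loading it raises an import-related
    error, which is what happens when the pickle refers to a module path
    that the installed library version does not provide.
    """
    stale = []
    for path in sorted(Path(cache_dir).rglob(pattern)):
        try:
            with open(path, "rb") as f:
                pickle.load(f)
        except (ImportError, AttributeError, pickle.UnpicklingError):
            # ModuleNotFoundError is a subclass of ImportError.
            stale.append(path)
    return stale


def delete_files(paths):
    """Remove the given files so they are regenerated on the next run."""
    for path in paths:
        path.unlink()
```

Running `find_stale_pickles` in the same environment that the cluster jobs use shows whether the `ModuleNotFoundError` comes from the cached files or from somewhere else. Passing its result to `delete_files` then clears only the affected files.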

robertdstein commented 2 years ago

Perhaps it would in any case be worth making a new release of flarestack, to account for these various version changes? What do you think, @JannisNe?

JannisNe commented 2 years ago

I agree, a new release soon is a good idea. A lot has changed, and the unblinding of the Accretion Flare Stacking is coming up, for which a fixed version should be rolled out. It would be good if that were a minor release rather than a major one, so that the version number is fixed for the analysis.

mlincett commented 2 years ago

It seems the problems mentioned in this issue have been explained, so I am tentatively closing it.