smechbal closed this issue 2 years ago.
Hey, this is likely due to jobs crashing and trials not being produced. I would start by checking the cluster logs. Also, if jobs get aborted, you should get an email sent to your DESY mail account. Could you also post the full error message? A smaller number of trials is not, per se, a reason for the code to fail.
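As a starting point for checking the cluster logs, here is a minimal sketch that scans the stderr files of one job array for Python tracebacks and reports the last line of each, which names the exception. It assumes Grid Engine's default naming for array-task logs, `<job_name>.e<job_id>.<task_id>`, and a log directory you supply. flarestack's submit script may write its logs elsewhere or under other names, so treat the path and the glob pattern as assumptions to adjust.

```python
from pathlib import Path
from typing import Dict


def failed_tasks(log_dir: str, job_id: str) -> Dict[str, str]:
    """Map each stderr log of a job array that contains a traceback to its last line."""
    failures = {}
    # Assumption: Grid Engine's default stderr name for array tasks,
    # <job_name>.e<job_id>.<task_id>; adjust if your submit script uses -e.
    for path in sorted(Path(log_dir).glob(f"*.e{job_id}.*")):
        text = path.read_text(errors="replace")
        if "Traceback (most recent call last)" in text:
            # The final line of a traceback names the exception,
            # e.g. "ModuleNotFoundError: No module named ..."
            failures[path.name] = text.strip().splitlines()[-1]
    return failures
```

For example, `failed_tasks("/path/to/your/logs", "<job_id>")` returns an empty dict if no task crashed. If it returns entries, the exception names say whether the jobs died in Python or were killed by the scheduler, in which case there is no traceback at all.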
I am not getting any email to my DESY account, and the stderr cluster log invariably reads:
INFO:__main__:N CPU available 64. Using 1
INFO:flarestack.data.icecube.ic_season:Loading datasets from /lustre/fs22/group/icecube/data_mirror/ (DESY)
INFO:flarestack.core.minimisation:Using 'standard_matrix' LLH class
INFO:flarestack.core.injector:Initialising Injector for IC59
WARNING:flarestack.icecube_utils.dataset_loader:No field called 'good_i3' found in GoodRunList. Cannot check if all runs in GoodRunList are actually good.
Traceback (most recent call last):
  File "/afs/ifh.de/user/s/smechbal/flarestack/flarestack/core/multiprocess_wrapper.py", line 164, in <module>
    run_multiprocess(n_cpu=cfg.n_cpu, mh_dict=mh_dict)
  File "/afs/ifh.de/user/s/smechbal/flarestack/flarestack/core/multiprocess_wrapper.py", line 143, in run_multiprocess
    with MultiProcessor(n_cpu=n_cpu, mh_dict=mh_dict) as r:
  File "/afs/ifh.de/user/s/smechbal/flarestack/flarestack/core/multiprocess_wrapper.py", line 58, in __init__
    inj = self.mh.get_injector(season)
  File "/afs/ifh.de/user/s/smechbal/flarestack/flarestack/core/minimisation.py", line 311, in get_injector
    self.seasons[season_name], self.sources
  File "/afs/ifh.de/user/s/smechbal/flarestack/flarestack/core/minimisation.py", line 1142, in add_injector
    return season.make_injector(sources, **self.inj_dict)
  File "/afs/ifh.de/user/s/smechbal/flarestack/flarestack/data/__init__.py", line 327, in make_injector
    return MCInjector.create(self, sources, **inj_kwargs)
  File "/afs/ifh.de/user/s/smechbal/flarestack/flarestack/core/injector.py", line 210, in create
    return BaseInjector.subclasses[inj_name](season, sources, **inj_dict)
  File "/afs/ifh.de/user/s/smechbal/flarestack/flarestack/core/injector.py", line 435, in __init__
    MCInjector.__init__(self, season, sources, **kwargs)
  File "/afs/ifh.de/user/s/smechbal/flarestack/flarestack/core/injector.py", line 239, in __init__
    BaseInjector.__init__(self, season, sources, **kwargs)
  File "/afs/ifh.de/user/s/smechbal/flarestack/flarestack/core/injector.py", line 93, in __init__
    self.spatial_pdf = SpatialPDF(kwargs["injection_spatial_pdf"], season)
  File "/afs/ifh.de/user/s/smechbal/flarestack/flarestack/core/spatial_pdf.py", line 18, in __init__
    self.background = BackgroundSpatialPDF.create(spatial_pdf_dict, season)
  File "/afs/ifh.de/user/s/smechbal/flarestack/flarestack/core/spatial_pdf.py", line 237, in create
    return cls.subclasses[s_pdf_name](s_pdf_dict, season)
  File "/afs/ifh.de/user/s/smechbal/flarestack/flarestack/core/spatial_pdf.py", line 295, in __init__
    self.bkg_f = self.create_background_function(season)
  File "/afs/ifh.de/user/s/smechbal/flarestack/flarestack/core/spatial_pdf.py", line 305, in create_background_function
    return load_bkg_spatial_spline(season)
  File "/afs/ifh.de/user/s/smechbal/flarestack/flarestack/utils/make_SoB_splines.py", line 597, in load_bkg_spatial_spline
    res = Pickle.load(f)
ModuleNotFoundError: No module named 'scipy.interpolate._fitpack2'
On the output side, the full error message points to a failure of the fitting routine when fitting the discovery potential (DP) curve to the fraction of overfluctuations. Here is the full message:
INFO:root:Estimated Discovery Potential is: 1.88e-08 GeV sr^-1 s^-1 cm^-2
{'name': 'analyses/agn_cores/stacking_analysis_8yrNTsample_pre_unblinding/radioloud_irselected_north/NrSrcs=1000/2.0/', 'mh_name': 'large_catalogue', 'dataset': <flarestack.data.icecube.ic_season.IceCubeDataset object at 0x7f3989dae760>, 'catalogue': '/lustre/fs23/group/icecube/smechbal/flarestack__data/input/catalogues/agn_cores/radioloud_irselected_north_1000brightest_srcs.npy', 'llh_dict': {'llh_name': 'standard_matrix', 'llh_sig_time_pdf': {'time_pdf_name': 'steady'}, 'llh_energy_pdf': {'energy_pdf_name': 'power_law'}, 'llh_spatial_pdf': {}, 'llh_bkg_time_pdf': {'time_pdf_name': 'steady'}}, 'inj_dict': {'injection_energy_pdf': {'energy_pdf_name': 'power_law', 'gamma': 2.0}, 'injection_sig_time_pdf': {'time_pdf_name': 'steady'}, 'injection_spatial_pdf': {}}, 'n_trials': 15, 'n_steps': 15, 'scale': 28.176657836865758}
INFO:flarestack.cluster.submitter:Tue Mar 15 11:10:46 2022: qsub -t 1-15:1 /lustre/fs23/group/icecube/smechbal/flarestack__data/cluster/SubmitDESY.sh /lustre/fs23/group/icecube/smechbal/flarestack__data/input/analysis/analyses/agn_cores/stacking_analysis_8yrNTsample_pre_unblinding/radioloud_irselected_north/NrSrcs=1000/2.0/dict.pkl 1
INFO:flarestack.cluster.submitter:Creating file at /lustre/fs23/group/icecube/smechbal/flarestack__data/cluster/SubmitDESY.sh
INFO:flarestack.cluster.submitter:Your job-array 86075079.1-15:1 ("SubmitDESY.sh") has been submitted
Done running MH analysis
Nr of sources is 1000
gamma: 2.0
INFO:flarestack.core.results:Saving bias plot to /lustre/fs23/group/icecube/smechbal/flarestack__data/output/plots/analyses/agn_cores/stacking_analysis_8yrNTsample_pre_unblinding/radioloud_irselected_north/NrSrcs=1000/2.0/bias_n_s.pdf
INFO:flarestack.core.results:Saving bias plot to /lustre/fs23/group/icecube/smechbal/flarestack__data/output/plots/analyses/agn_cores/stacking_analysis_8yrNTsample_pre_unblinding/radioloud_irselected_north/NrSrcs=1000/2.0/bias_gamma.pdf
INFO:flarestack.core.results:Fraction of overfluctuations is 0.21 above 0.00 (N_trials=90) (Scale=0)
INFO:flarestack.core.results:Fraction of overfluctuations is 0.89 above 0.00 (N_trials=9) (Scale=2.013)
INFO:flarestack.core.results:Fraction of overfluctuations is 0.78 above 0.00 (N_trials=9) (Scale=4.025)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 0.00 (N_trials=9) (Scale=6.038)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 0.00 (N_trials=9) (Scale=8.05)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 0.00 (N_trials=9) (Scale=10.06)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 0.00 (N_trials=9) (Scale=12.08)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 0.00 (N_trials=9) (Scale=14.09)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 0.00 (N_trials=9) (Scale=16.1)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 0.00 (N_trials=9) (Scale=18.11)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 0.00 (N_trials=9) (Scale=20.13)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 0.00 (N_trials=9) (Scale=22.14)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 0.00 (N_trials=9) (Scale=24.15)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 0.00 (N_trials=9) (Scale=26.16)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 0.00 (N_trials=9) (Scale=28.18)
INFO:flarestack.core.results:Sensitivity is 2.97e-09
INFO:flarestack.core.results:Fraction of overfluctuations is 0.00 above 22.92 (N_trials=90) (Scale=0)
INFO:flarestack.core.results:Fraction of overfluctuations is 0.00 above 25 (N_trials=90) (Scale=0)
INFO:flarestack.core.results:Fraction of overfluctuations is 0.00 above 22.92 (N_trials=9) (Scale=2.013)
INFO:flarestack.core.results:Fraction of overfluctuations is 0.00 above 25 (N_trials=9) (Scale=2.013)
INFO:flarestack.core.results:Fraction of overfluctuations is 0.11 above 22.92 (N_trials=9) (Scale=4.025)
INFO:flarestack.core.results:Fraction of overfluctuations is 0.11 above 25 (N_trials=9) (Scale=4.025)
INFO:flarestack.core.results:Fraction of overfluctuations is 0.56 above 22.92 (N_trials=9) (Scale=6.038)
INFO:flarestack.core.results:Fraction of overfluctuations is 0.56 above 25 (N_trials=9) (Scale=6.038)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 22.92 (N_trials=9) (Scale=8.05)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 25 (N_trials=9) (Scale=8.05)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 22.92 (N_trials=9) (Scale=10.06)
INFO:flarestack.core.results:Fraction of overfluctuations is 0.89 above 25 (N_trials=9) (Scale=10.06)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 22.92 (N_trials=9) (Scale=12.08)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 25 (N_trials=9) (Scale=12.08)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 22.92 (N_trials=9) (Scale=14.09)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 25 (N_trials=9) (Scale=14.09)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 22.92 (N_trials=9) (Scale=16.1)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 25 (N_trials=9) (Scale=16.1)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 22.92 (N_trials=9) (Scale=18.11)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 25 (N_trials=9) (Scale=18.11)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 22.92 (N_trials=9) (Scale=20.13)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 25 (N_trials=9) (Scale=20.13)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 22.92 (N_trials=9) (Scale=22.14)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 25 (N_trials=9) (Scale=22.14)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 22.92 (N_trials=9) (Scale=24.15)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 25 (N_trials=9) (Scale=24.15)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 22.92 (N_trials=9) (Scale=26.16)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 25 (N_trials=9) (Scale=26.16)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 22.92 (N_trials=9) (Scale=28.18)
INFO:flarestack.core.results:Fraction of overfluctuations is 1.00 above 25 (N_trials=9) (Scale=28.18)
WARNING:flarestack.core.results:RuntimeError for discovery potential!: Optimal parameters not found: Number of calls to function has reached maxfev = 800.
WARNING:flarestack.core.results:RuntimeError for discovery potential!: Optimal parameters not found: Number of calls to function has reached maxfev = 800.
INFO:flarestack.core.results:Discovery Potential is nan
INFO:flarestack.core.results:Discovery Potential (TS=25) is nan
Sens 2.972208549396039e-09
Sens_err [ 1.54376819e-09 -3.97857203e-08] 1.5437681926235718e-09 -3.9785720303686054e-08
Disc nan
Disc_TS_threshold 22.916492451202796
Sens (n) 31.690630304160685
DP (n) nan
0 [3.421881549554388e-08]
1 [nan]
Traceback (most recent call last):
File "ir_selected_agn_analysis.py", line 290, in
So I am seeing two problems. Let's address the stdout first:
The minimisation handler and the result handler are independent, so the mismatch itself is not a problem. The problem is that you do not have enough completed trials to calculate the discovery potential. As @JannisNe said, the root cause is likely not flarestack itself, but rather that the jobs on the DESY cluster are not being completed, presumably because they are timing out or otherwise hitting some kill limit. @fbradascio ran into these problems A LOT.
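To see why nine trials per injection scale are too few, here is a minimal sketch of the normal-approximation (Wald) binomial standard error, sqrt(p(1-p)/N), on each logged overfluctuation fraction. The values are copied from the TS = 22.92 lines of the log above. The helper name is hypothetical, and this is not how flarestack itself weights the points. Separately, the `maxfev = 800` in the warnings looks like SciPy's `curve_fit` message: its default cap of 200*(N+1) function calls would correspond to a three-parameter fit. So the fit gave up rather than crashed.

```python
import math


def binomial_error(fraction: float, n_trials: int) -> float:
    """Normal-approximation (Wald) standard error of a fraction estimated from n_trials."""
    return math.sqrt(fraction * (1.0 - fraction) / n_trials)


# (scale, fraction of trials with TS above 22.92, N_trials), copied from the log
logged = [
    (2.013, 0.00, 9),
    (4.025, 0.11, 9),
    (6.038, 0.56, 9),
    (8.05, 1.00, 9),
    (10.06, 1.00, 9),
]

for scale, frac, n in logged:
    err = binomial_error(frac, n)
    # A single trial moves the fraction by 1/n, i.e. about 0.11 for n = 9
    print(f"scale={scale:6.3f}  fraction={frac:.2f} +/- {err:.2f}  step={1 / n:.2f}")
```

Only the scales 4.025 and 6.038 sit between 0 and 1, with uncertainties of roughly 0.10 and 0.17. Everywhere else the fraction is pinned at 0 or 1, where the Wald error collapses to zero and says nothing about the slope of the curve. With so few informative points, the DP curve is poorly constrained, which is consistent with the fit running out of function calls.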
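On the stderr, that's very interesting. It seems to be related to pickling. One possible problem is a mismatch between the version of scipy used to create the pickle and the version used to read it. I see we recently bumped scipy from 1.7.3 to 1.8.0. The missing module, `scipy.interpolate._fitpack2`, only exists from scipy 1.8.0 onwards (older releases call it `scipy.interpolate.fitpack2`), so a spline pickled under 1.8.0 cannot be loaded by an environment still running 1.7.3. Perhaps try deleting the cache, reinstalling flarestack in a new environment, and then trying again?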
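To check which side of that rename an environment is on, here is a minimal sketch of two standard-library helpers. `module_available` uses `importlib.util.find_spec` to test whether a module such as `scipy.interpolate._fitpack2` can be found without importing it. `load_cached_pickle` is a hypothetical wrapper, not flarestack's `load_bkg_spatial_spline`. It re-raises a missing-module error during unpickling with a hint to regenerate the cached file.

```python
import importlib.util
import pickle


def module_available(name: str) -> bool:
    """Return True if the (possibly dotted) module `name` can be found."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # find_spec imports parent packages; this fires if e.g. scipy itself is missing
        return False


def load_cached_pickle(path: str):
    """Unpickle `path`, turning a missing-module error into an actionable message."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except ModuleNotFoundError as err:
        # Pickles store objects by module path, so a renamed module breaks loading
        raise RuntimeError(
            f"{path} was probably pickled with a different library version "
            f"(missing module {err.name!r}); delete the cached file and "
            "regenerate it with the currently installed versions."
        ) from err
```

Calling `module_available("scipy.interpolate._fitpack2")` inside the cluster job's environment should return `False` under scipy 1.7.3 and `True` under 1.8.0 or later. That should tell you whether the jobs pick up the same environment that produced the cache.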
Perhaps it would be worth a new release of flarestack in any case, to account for these various version changes? What do you think @JannisNe?
I agree, a new release soon is a good idea. A lot has changed, and the unblinding of the Accretion Flare Stacking analysis is coming close, for which a fixed version should be rolled out. It would be good if that version were not a major release but just a minor one, so that the version number stays fixed for the analysis.
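Once a release is frozen for the analysis, a small guard at the top of the analysis script can refuse to run under unexpected versions. Here is a minimal sketch using the standard library's `importlib.metadata`. The helper name and the pinned version strings are hypothetical placeholders, not the releases actually chosen for the unblinding.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List


def check_pins(pins: Dict[str, str]) -> List[str]:
    """Describe every package whose installed version differs from its pin."""
    problems = []
    for package, wanted in pins.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            problems.append(f"{package}: not installed (pinned {wanted})")
            continue
        if installed != wanted:
            problems.append(f"{package}: installed {installed}, pinned {wanted}")
    return problems


# Hypothetical pins: replace with the versions actually frozen for the analysis
ANALYSIS_PINS = {"flarestack": "0.0.0", "scipy": "1.8.0"}
```

A script could then abort with `raise SystemExit("\n".join(problems))` whenever `check_pins(ANALYSIS_PINS)` returns a non-empty list. This guards against silently mixing environments, as happened with the scipy pickle above.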
It seems the problems mentioned in this issue have been explained, tentatively closing it.
When submitting a job to the DESY cluster for a large number of sources, the number of trials set in the minimisation dictionary does not match the one shown in `INFO:flarestack.core.results`, which leads to a failure of the DP/sensitivity analysis. Can anyone please help?