icecube / flarestack

Unbinned likelihood analysis code for astroparticle physics datasets
https://flarestack.readthedocs.io/en/latest/?badge=latest
MIT License

Cluster scipy version mismatch (No module named 'scipy.interpolate._fitpack2') #181

Closed mlincett closed 2 years ago

mlincett commented 2 years ago

Describe the bug

This is a follow-up on one of the issues raised in #132, and related to #35.

Loading previously created spline pickles in make_SoB_splines.py fails if the pickles were created with a more recent version of scipy.

This typically occurs when trials are first run from an up-to-date environment (such as a custom local installation of python / flarestack) and more trials are then run on the cluster.

The jobs fail with:

  File "[...]/flarestack/flarestack/utils/make_SoB_splines.py", line 597, in load_bkg_spatial_spline
    res = Pickle.load(f)
ModuleNotFoundError: No module named 'scipy.interpolate._fitpack2'

To Reproduce

Steps to reproduce the behavior:

  1. Install flarestack locally after preparing a conda environment with the prerequisites.
  2. Set up an analysis and run a few trials locally.
  3. Submit more trials to the cluster.

Expected behavior

The cluster processing should not fail.

Additional context

Proposed solution: as long as we have no way to guarantee environment consistency between the cluster and the user installation, we could tentatively catch the exception and trigger the recreation of the splines.
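A minimal sketch of that fallback, assuming a hypothetical `rebuild` callable standing in for flarestack's spline-creation step (the names here are illustrative, not flarestack's actual API):

```python
import pickle


def load_spline_or_rebuild(path, rebuild):
    """Load a pickled spline; if the pickle references a module layout
    from a different scipy version (e.g. scipy.interpolate._fitpack2),
    regenerate the spline and overwrite the stale pickle."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except (ModuleNotFoundError, AttributeError):
        # Stale pickle from a mismatched scipy: rebuild and re-save.
        spline = rebuild()
        with open(path, "wb") as f:
            pickle.dump(spline, f)
        return spline
```

Catching `AttributeError` as well would also cover the case where the module still exists but a class inside it was renamed between scipy versions.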

mlincett commented 2 years ago

A tentative fix is available as a bonus in PR https://github.com/icecube/flarestack/pull/180 .

robertdstein commented 2 years ago

So my understanding of #132 was not that it was directly related to the cluster, but that the pinned version of scipy required by flarestack changed and this was a breaking change for pickle. So I would expect the same problem with a local copy of flarestack, if you upgraded the scipy version.

The problem I see here is actually just that the python set up on the DESY cluster does not meet all the stated python requirements for flarestack.

I think the best thing would be to enforce the same version of python on cluster and locally. Is that possible? You could replace "python " with f"{sys.executable} " in https://github.com/icecube/flarestack/blob/a95be651a928ce6a9ea32c8f1245b4c434257a95/flarestack/cluster/make_desy_cluster_script.py#L49 provided the conda environment is also accessible to the cluster.
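A sketch of that substitution, assuming the conda environment is also visible from the cluster nodes (`job_command` and `script` are illustrative placeholders, not the actual code in make_desy_cluster_script.py):

```python
import sys


def job_command(script, args=""):
    # Use the interpreter that launched the local run, rather than
    # whatever "python" resolves to on the cluster's default PATH,
    # so local and cluster jobs share one environment.
    return f"{sys.executable} {script} {args}".strip()
```

For example, `job_command("run_trials.py")` would produce a command pinned to the absolute path of the submitting interpreter, e.g. one inside a conda environment.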

What do you think?

mlincett commented 2 years ago

I agree with your interpretation, I just meant to point out that this is bound to happen every time one runs trials with an updated local copy and then more trials on the cluster (which indeed has a fixed, "old" python environment).

Triggering the recalculation of the splines in response to the exception seems to work, but I agree the best course of action is to run on the cluster with the same python environment as the "mother" script. I will see about testing what you suggest.

mlincett commented 2 years ago

Question: when launching N parallel jobs without previously creating the splines, will each job try to create the splines and write to the same file? If so, we should probably decouple this stage from the parallel processing.

JannisNe commented 2 years ago

Yes, this is the case. It would be very nice to have this decoupled indeed!
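One way to decouple this without a separate pre-processing stage would be an exclusive-create lock file, so exactly one job builds the splines while the others wait for the result. A sketch under that assumption (`make_splines` is a hypothetical stand-in for the spline generation; a crashed winner would leave waiters polling forever, so a real version would need a timeout):

```python
import os
import pickle
import time


def ensure_splines(path, make_splines, poll=1.0):
    """Have exactly one of N parallel jobs create the spline pickle;
    the others wait until it is published, then load it."""
    if not os.path.exists(path):
        lock = path + ".lock"
        try:
            # O_EXCL guarantees a single winner of the creation race.
            fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            try:
                tmp = path + ".tmp"
                with open(tmp, "wb") as f:
                    pickle.dump(make_splines(), f)
                os.replace(tmp, path)  # atomic publish
            finally:
                os.remove(lock)
        except FileExistsError:
            # Another job holds the lock; poll until the file appears.
            while not os.path.exists(path):
                time.sleep(poll)
    with open(path, "rb") as f:
        return pickle.load(f)
```

Writing to a temporary file and then `os.replace`-ing it means no job can ever observe a half-written pickle. Note this relies on the lock file living on a filesystem where exclusive create is honoured across nodes.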

mlincett commented 2 years ago

> I think the best thing would be to enforce the same version of python on cluster and locally. Is that possible?

I have tried your suggestion and it seems to work (jobs are still running, so let's see).

Note that flarestack.cluster.make_desy_cluster_script.py is actually a legacy file, only referenced by old analyses. Script generation is currently handled by flarestack.cluster.submitter.py.

I think I should test the same on the WIPAC cluster as well.