A tentative fix is available as a bonus in PR https://github.com/icecube/flarestack/pull/180 .
So my understanding of #132 was not that it was directly related to the cluster, but that the pinned version of scipy required by flarestack changed, and this was a breaking change for pickle. So I would expect the same problem with a local copy of flarestack if you upgraded the scipy version.
The problem I see here is actually just that the python setup on the DESY cluster does not meet all the stated python requirements for flarestack.
I think the best thing would be to enforce the same version of python on the cluster and locally. Is that possible? You could replace `"python "` with `f"{sys.executable} "` in https://github.com/icecube/flarestack/blob/a95be651a928ce6a9ea32c8f1245b4c434257a95/flarestack/cluster/make_desy_cluster_script.py#L49, provided the conda environment is also accessible to the cluster.
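For illustration, here is a minimal sketch of that change; the variable names are hypothetical stand-ins for the string assembled around line 49 of `make_desy_cluster_script.py`:

```python
import sys

# Hypothetical stand-ins for the values assembled in make_desy_cluster_script.py
flarestack_entry = "/path/to/flarestack/run_script.py"
job_args = "$1"

# Before: a hard-coded "python " picks up whatever interpreter the cluster
# environment provides, which may ship a different scipy
legacy_line = "python " + flarestack_entry + " " + job_args + " \n"

# After: sys.executable is the absolute path of the interpreter running the
# submitting ("mother") script, so cluster jobs reuse the same environment,
# provided the conda environment is also visible from the worker nodes
fixed_line = f"{sys.executable} " + flarestack_entry + " " + job_args + " \n"
```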
What do you think?
I agree with your interpretation; I just meant to point out that this is bound to happen every time one runs trials with an updated local copy and then more trials on the cluster (which indeed has a fixed "old" python environment).
Triggering the recalculation of the splines as a response to the exception seems to be working, but I agree the best course of action is to try to run on the cluster with the same python environment as the "mother" script. I will see about testing what you suggest.
Question: when launching `N` parallel jobs without previously creating the splines, will each job try to create the splines and write to the same file? If so, we should probably decouple this stage from the parallel processing.
Yes, this is the case. It would be very nice to have this decoupled indeed!
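Until that decoupling happens, one way to at least make the concurrent writes harmless would be to write each pickle atomically, so no job can ever read a half-written file. A minimal sketch, assuming POSIX rename semantics on the cluster filesystem (the function name is hypothetical; the real spline objects and paths come from `make_SoB_splines.py`):

```python
import os
import pickle
import tempfile

def dump_spline_atomically(spline, path):
    """Pickle to a temporary file in the target directory, then rename it
    into place. Concurrent jobs may do redundant work, but none of them
    can observe a partially written pickle."""
    target_dir = os.path.dirname(path)
    os.makedirs(target_dir, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=target_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(spline, f)
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        os.remove(tmp_path)
        raise
```

This does not remove the duplicated computation, of course; the cleaner fix is still to have the "mother" script create the splines once before submitting the parallel jobs.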
> I think the best thing would be to enforce the same version of python on cluster and locally. Is that possible?
I have tried your suggestion and it seems to work (jobs are still running, so let's see).
Note that `flarestack.cluster.make_desy_cluster_script.py` is actually a legacy file which is only referenced by old analyses. The generation is currently handled by `flarestack.cluster.submitter.py`.
I think I should test the same on the WIPAC cluster as well.
**Describe the bug**
This is a follow-up on one of the issues raised in #132 and related to #35.
Loading previously created spline pickles in `make_SoB_splines.py` fails if the spline pickles have been created with a more recent version of `scipy`. The typical situation in which this occurs is when trials are first run from an up-to-date environment (such as a custom installation of python / flarestack) and then more trials are run on the cluster.
The jobs fail with:
**To Reproduce**
Steps to reproduce the behavior:
1. Set up a `conda` environment with the prerequisites.

**Expected behavior**
The cluster processing should not fail.
**Additional context**
Proposed solution: tentatively, as long as we do not have a way to ensure environment consistency between the cluster and the user installation, we could catch the exception and trigger the recreation of the splines.
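A minimal sketch of that catch-and-recreate idea; `rebuild` is a hypothetical callable standing in for the spline-generation code in `make_SoB_splines.py`, and the exact exception types raised by a scipy mismatch may vary:

```python
import pickle

def load_or_rebuild_spline(path, rebuild):
    """Try to load a cached spline pickle; if it was written by an
    incompatible scipy version, regenerate and re-cache it."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except (ModuleNotFoundError, AttributeError, pickle.UnpicklingError):
        # Pickles written by a newer scipy can reference internal classes or
        # module paths that do not exist in an older installation
        spline = rebuild()
        with open(path, "wb") as f:
            pickle.dump(spline, f)
        return spline
```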