Problem: resubmissions of possibly months-long simulations must not depend on the repo and the build directory, because both will change when updating the code and recompiling. We already copy the executable in Schedule.py, but we don't copy all executables of the pipeline yet. Discussed with @knelli2 today.
Specifically, when generating initial data with --evolve we need to copy executables for the evolution and ringdown to the run directory. This isn't too difficult because we know at initial data generation time if we want to evolve, so we can just copy the executables and use the new paths in the Next section of the ID input file. We just have to make sure the executables are statically linked (this is a separate issue and is necessary anyway).
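A minimal sketch of the copy step (the function name, the `bin/` layout, and the executable names are assumptions for illustration, not the actual Schedule.py API):

```python
import shutil
from pathlib import Path


def copy_pipeline_executables(build_dir, run_dir, executables):
    """Copy each pipeline executable into run_dir/bin and return the
    new paths, which would then be written into the Next section of
    the ID input file instead of the build-dir paths."""
    bin_dir = Path(run_dir) / "bin"
    bin_dir.mkdir(parents=True, exist_ok=True)
    new_paths = {}
    for exec_name in executables:
        src = Path(build_dir) / "bin" / exec_name
        dest = bin_dir / exec_name
        shutil.copy2(src, dest)  # copy2 preserves the executable bit
        new_paths[exec_name] = str(dest)
    return new_paths
```

At ID generation time this would be called once with the evolution and ringdown executables, since at that point we already know whether `--evolve` was passed.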
Also, we have to copy Python dependencies for the resubmission scripts. This shouldn't be too difficult either by using pip: we do `python -m venv $RUN_DIR/env [--system-site-packages]` to create a Python env in the run directory and then `$RUN_DIR/env/bin/pip install $BUILD_DIR/bin/python/` to install our Python code.
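Something like the following could assemble those two commands (a sketch; the helper name and path conventions are assumptions, and the real code would run them via subprocess):

```python
import sys


def venv_commands(run_dir, build_dir, system_site_packages=True):
    """Return the two commands that set up a self-contained Python env
    in the run directory: one to create the venv, one to install our
    Python code from the build dir into it."""
    env_dir = f"{run_dir}/env"
    create = [sys.executable, "-m", "venv", env_dir]
    if system_site_packages:
        create.append("--system-site-packages")
    # pip from the new venv installs into that venv, so the run no
    # longer imports from the build dir
    install = [f"{env_dir}/bin/pip", "install", f"{build_dir}/bin/python"]
    return [create, install]
```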
Dynamically linked executables are still susceptible to the same issue. If the build dir is deleted or changed, the runs may break.
The executables can be large, especially if built with static libraries, and so extraneous copying will significantly increase required disk space. This has been an issue with SpEC.
We are still at the mercy of dynamic system libraries, so we need to make a decision on how static we want our executables to be. A possible pain point: if we link our third-party libs dynamically, then after updating them we can never delete or change the existing modules. This has been an issue with SpEC.
We need to actually be able to update the code for simulations that crash, so that new features which resolve the cause of the crash can be deployed. There are of course complications with serialization here, but even ignoring those we need the bookkeeping capabilities. This is a big challenge for BFI/SpEC projects.
Could make it easy for ourselves and just require that for full automation, you have to have statically linked execs. Not sure if it's worth our time to engineer a solution for dynamically linked execs. This may make things harder to debug? But if it's only for production-level runs maybe it's ok?
Yes, we do need to be careful about copying around large binaries. But isn't this almost unavoidable if we want 1) runs that don't depend on our build dir and 2) reproducibility? Not sure what we should do about this.
Could model what we do after your (@nilsdeppe) work for statically linking the CCE exec. That may be overkill for our purposes though. Guess it's a question of how confident we are that sysadmins won't switch or delete things on us.
Isn't this accomplished by just building a different static exec and using that instead? I would think how we do this in an "automatic" way is the harder question, and that I don't know. (oof yeah serialization)
Yes, statically linked would be the way to go. I'd probably try to link a lot statically, just because for any sufficiently long simulation it will break. I think generally sys admins have been okay with not breaking things, but we've definitely had a few times where we had several hundred simulations crash because of a module change. I agree, we shouldn't spend time trying to support dynamically linked execs since the solution will likely be "copy all shared libs we link". I don't see static libs making it harder to debug things, likely the opposite actually :)
Yes and no. The issue is that in SpEC we for a long time copied the executable into every segment. This was an issue and we now have 1 executable for much more of the code. We should check with Mark, but likely we want 1 bin directory for all Ecc iterations, levs, segments, etc. and everything just calls the executables in that. SpEC does various sym links to try to make each segment look independent, but this I think has resulted in a lot of very complicated scripts. I hope we can just have one bin directory for each set of parameters.
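To make the contrast with the per-segment copies concrete, here's a sketch of the layout this would give (all names made up):

```
BBH_q1_spin0/              # one directory per set of physical parameters
├── bin/                   # the single location for execs and scripts
│   ├── EvolveExec
│   └── RingdownExec
├── Ecc0/
│   ├── Lev1/Segment_000/  # input files reference ../../../bin/...
│   └── Lev1/Segment_001/
└── Ecc1/
    └── Lev2/Segment_000/
```

Every Ecc iteration, Lev, and segment calls into the same `bin/`, so an update there propagates everywhere without the per-segment symlink machinery.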
Yep! I think we can probably just turn a lot of the CMake flags on that enable static linking for different libs. Again, sysadmins are generally pretty good, but it's also nice to be safe :)
Yes, but it turns out that in SpEC the execs are stored in several locations, and so sometimes not all are updated correctly. This is why in point 2 I'm emphasizing having only one location for execs/scripts. That actually makes things a lot easier because everything must read from there and so any changes there immediately propagate. I'm not sure if we have to worry about different resolutions being able to use different execs, but naively any change for one resolution should be fine for others...