lohedges closed this issue 2 years ago
I saw a few of these too. I agree that it looks like an out-of-memory error during the conda solve. It fixed itself after I waited a few hours...
This has now failed about 10 times over the course of 24 hours or so. I'm not sure what's going on. I can't imagine that the re-runs are using the same runner. Maybe I'll have to see if a fresh commit solves the problem.
It looks like we have a consistent failure for the Linux Python 3.7 build. See the most recent actions here. I'm still trying to figure out whether there is a simple solution to this issue.
This issue reports a similar failure. (Similar DEBUG messages seen in the output.) In this case, the failure was the result of a silent segmentation fault that was triggered when memory ran out.
Looking at the GitHub runner docs, the Linux and macOS images have 7 GB and 14 GB of RAM, respectively. I would assume that this would be plenty. If this were the issue, then it's weird that only the Linux Python 3.7 variant is failing. (I guess some base package within the Python 3.7 conda environment might have a memory issue, which is fixed in later variants.)
I think the DEBUG messages are potentially misleading, since they occur on the successful CI runs too. The one that errors does so with exit code 137 (SIGKILL, which on a CI runner usually means the process was killed by the out-of-memory killer), so it does look like a memory issue.
I've retried the build using both the Miniforge and Mambaforge variants of Miniconda (these can be enabled with the setup-miniconda action) and they both fail with the same memory error. I think the failure is triggered by the dependency resolution during the conda-build stage, i.e. even if you specify that mamba should be prioritised, I don't think it is used by conda-build, only for regular conda install type commands.
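For reference, enabling the Mambaforge variant looks roughly like the sketch below (step placement, versions and inputs are illustrative, not our exact workflow):

```yaml
# Illustrative only: enable the Mambaforge installer via setup-miniconda.
# The Python version and other inputs are placeholders.
- uses: conda-incubator/setup-miniconda@v2
  with:
    miniforge-variant: Mambaforge
    miniforge-version: latest
    python-version: 3.7
    use-mamba: true
```

As far as I can tell, use-mamba only affects conda install style commands in later steps; conda-build still runs with its own solver.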
Not sure what to do about this. I'll poke around the docs to see if it's possible to tweak the runner's memory settings, or to add swap space. One option would be no longer supporting Python 3.7, although I'm not sure if any users are tied to this for other reasons, e.g. if other packages in their environment are only available for this variant. @msuruzhon: What Python variant do you use internally?
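If the runner's memory can't be tweaked directly, one thing I might try is adding swap in an early workflow step, something along these lines (an untested sketch; the 4 GB size is arbitrary):

```yaml
# Untested sketch: add swap space on the Ubuntu runner before the build step.
- name: Add swap space
  if: runner.os == 'Linux'
  run: |
    sudo fallocate -l 4G /swapfile
    sudo chmod 600 /swapfile
    sudo mkswap /swapfile
    sudo swapon /swapfile
    free -h
```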
Hi @lohedges, we still use Python 3.7 internally - this version is still quite popular in other scientific libraries it seems.
Could mamba not be installed and used as part of the CI? I find regular conda unusable for installing larger packages, so I am not surprised there is a memory issue.
Yes, mamba can be installed and used by the action. The issue is that it isn't used by the conda-build command behind the scenes, which is what is used to build Sire and create the conda package. I wonder if there's a way to "trick" it into using mamba, e.g. by symlinking the conda binary, or something. I'll have a play around locally to see if I can get something to work.
Ah yes, sorry, I didn't read properly. I have used boa during conda-build before to do exactly that. It's technically "experimental", but when I tested it, it was seamless, so you might want to try it. It basically uses mamba as the resolver. You can find it here: https://github.com/mamba-org/boa.
Great, using boa and conda mambabuild has got past the memory error. I'll let the CI run to completion then close this assuming all is okay, i.e. that it's possible to build BioSimSpace on top of the resulting packages.
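For the record, the change amounts to something like the following (a sketch only; the recipe path and channel are placeholders, not the exact workflow step):

```yaml
# Sketch only: install boa and build with the mamba-based solver.
# The recipe path and channel are placeholders.
- name: Build Sire conda package
  shell: bash -l {0}
  run: |
    mamba install -y -c conda-forge boa
    conda mambabuild -c conda-forge recipes/sire
```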
Phew!
Will need to debug since SireUnitTests are now failing against devel, e.g. with the following:
Traceback (most recent call last):
File "/Users/runner/miniconda3/envs/sire_build/conda-bld/sire_1651680443805/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pla/lib/python3.7/site-packages/nose/case.py", line 197, in runTest
self.test(*self.arg)
File "/Users/runner/miniconda3/envs/sire_build/conda-bld/sire_1651680443805/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pla/share/Sire/test/SireUnitTests/unittests/SireMol/test_names_numbers.py", line 10, in test_names_numbers
assert mol.nResidues() == len(mol.residues().names())
UnboundLocalError: local variable 'mol' referenced before assignment
@chryswoods: I think this is due to additions for the feat_web branch. (I also needed to add pytest to the test requirements section of our conda recipe.) Are any other changes to the tests imminent? Although your test has a check to see if an older version of Sire is being used, it still appears to be failing. (Perhaps this is an issue with running the tests using sire_test, rather than pytest as you are presumably doing locally.) I also thought that tests for a feature should be placed in a matching feature branch on the SireUnitTests repo, or is this no longer the case? (I know that you mentioned moving them into the main Sire repo, but what should we do until that time?)
Cheers.
Sorry about that. I thought I had masked things out correctly, but obviously not.
I have reverted the SireUnitTests repo back to its state from before I made changes to feat_web. I have copied the files I needed to another directory, and will bring those into the main repo when I issue the pull request for feat_web.
Yes, normally we should add tests to the corresponding feature branch. I was trying to do the more advanced step, which is adding the tests in a way that they don't run if used against an older version of Sire (in case someone wanted to run the test suite against a version they installed themselves). It was too complex, hence why I think we should move to putting the tests with the repo.
I think we should keep SireUnitTests though, both for posterity and also as a test that the code always supports the old API (run with sr.use_old_api()).
Thanks for sorting this, I'll re-run the build now. To be honest, I'm not exactly sure why they failed, since it didn't happen with all of the new tests you added, despite the logic to check for the new API being the same in all cases.
The CI passed. Will now test BioSimSpace, also building using conda mambabuild for consistency.
Closing as everything is working as expected :+1:
I'm seeing fairly consistent build failures that exit with the following error:
(The failures might be for a different OS or Python variant, but the message will be similar.)
I've searched online and it's not clear what's triggering the error. (Possibly a memory or networking issue on the VM.) At present I'm seeing the same error repeatedly when trying to re-run the only failed job for the most recent build, i.e. Linux and Python 3.7. (Normally the issue is intermittent, so a simple re-run fixes things.)
Just thought I'd report here so we have a log of it. I'll see if I can figure out what's going on and will update the workflow file if needed. (It could just be a case of waiting and trying again later.)
Cheers.