Closed ManavalanG closed 1 year ago
Here is the relevant env definition file:
channels:
- bioconda
- conda-forge
- defaults
dependencies:
- multiqc ==1.9
I attempted to create conda env (not singularity+conda) in cheaha to see if the issue can be replicated there. Conda just stalled at this step - Invoking SAT with clause count
even after 15mins of executing the command. Relevant parts of the stdout/stderr:
Solving environment...
When --verbose --verbose
flag added to probe further,
DEBUG conda.resolve:solve(1340): Requested specs:
- __archspec
- __glibc
- __linux
- __unix
- multiqc==1.9
DEBUG conda.resolve:solve(1341): Optional specs:
DEBUG conda.resolve:solve(1342): All other specs:
- _libgcc_mutex
- _openmp_mutex
........
........
........
DEBUG conda.resolve:solve(1343): missing specs:
- _libgcc_mutex
- _openmp_mutex
........
........
........
DEBUG conda.resolve:solve(1346): Solve: minimize removed packages
DEBUG conda.resolve:solve(1353): Solve: maximize versions of requested packages
DEBUG conda.common._logic:minimize(664): Empty objective, trivial solution
DEBUG conda.common._logic:minimize(664): Empty objective, trivial solution
DEBUG conda.resolve:solve(1357): Initial package channel/version metric: 0/0
DEBUG conda.resolve:solve(1360): Solve: minimize track_feature count
DEBUG conda.resolve:generate_feature_count(933): generate_feature_count returning with clause count: 128071
DEBUG conda.common._logic:minimize(658): Clauses added, recomputing solution
DEBUG conda.common._logic:_run_sat(607): Invoking SAT with clause count: 128071
DEBUG conda.common._logic:_run_sat(607): Invoking SAT with clause count: 128083
DEBUG conda.common._logic:minimize(745): Final sum objective: 0
DEBUG conda.resolve:solve(1363): Track feature count: 0
DEBUG conda.common._logic:minimize(658): Clauses added, recomputing solution
DEBUG conda.common._logic:_run_sat(607): Invoking SAT with clause count: 129436
DEBUG conda.common._logic:_run_sat(607): Invoking SAT with clause count: 129887
DEBUG conda.common._logic:minimize(745): Final sum objective: 0
DEBUG conda.resolve:solve(1374): Package misfeature count: 0
DEBUG conda.resolve:solve(1377): Solve: maximize build numbers of requested packages
DEBUG conda.common._logic:_run_sat(607): Invoking SAT with clause count: 129888
DEBUG conda.common._logic:minimize(745): Final sum objective: 0
DEBUG conda.resolve:solve(1379): Initial package build metric: 0
DEBUG conda.resolve:solve(1382): Solve: prefer arch over noarch for requested packages
DEBUG conda.common._logic:minimize(664): Empty objective, trivial solution
DEBUG conda.resolve:solve(1384): Noarch metric: 0
DEBUG conda.resolve:solve(1388): Solve: minimize number of optional installations
DEBUG conda.common._logic:minimize(664): Empty objective, trivial solution
DEBUG conda.resolve:solve(1391): Optional package install metric: 0
DEBUG conda.resolve:solve(1394): Solve: minimize number of necessary upgrades
DEBUG conda.common._logic:minimize(664): Empty objective, trivial solution
DEBUG conda.resolve:solve(1397): Dependency update count: 0
DEBUG conda.resolve:solve(1400): Solve: maximize versions and builds of indirect dependencies. Prefer arch over noarch where equivalent.
DEBUG conda.common._logic:_run_sat(607): Invoking SAT with clause count: 132814
DEBUG conda.common._logic:minimize(745): Final peak objective: 0
DEBUG conda.common._logic:_run_sat(607): Invoking SAT with clause count: 142063
DEBUG conda.common._logic:_run_sat(607): Invoking SAT with clause count: 132989
DEBUG conda.common._logic:_run_sat(607): Invoking SAT with clause count: 133651
DEBUG conda.common._logic:_run_sat(607): Invoking SAT with clause count: 133169
DEBUG conda.common._logic:_run_sat(607): Invoking SAT with clause count: 133389
DEBUG conda.common._logic:_run_sat(607): Invoking SAT with clause count: 133270
DEBUG conda.common._logic:_run_sat(607): Invoking SAT with clause count: 133344
DEBUG conda.common._logic:_run_sat(607): Invoking SAT with clause count: 133296
DEBUG conda.common._logic:minimize(745): Final peak objective: 39
DEBUG conda.common._logic:_run_sat(607): Invoking SAT with clause count: 13942923
DEBUG conda.common._logic:_run_sat(607): Invoking SAT with clause count: 7270918
DEBUG conda.common._logic:_run_sat(607): Invoking SAT with clause count: 3923857
As per this stackoverflow answer, Invoking SAT with clause count:
is known to be associated with stalling :(
conda can run into endless loops when trying to figure out a way to satisfy all the dependencies of the packages that are installed, along with the changes you want to apply. I've read somewhere that if it hasn't found a solution within two minutes, there's not much point in waiting any longer.
In general, you overcome such situations by being more specific about what you want. For example, specify exact versions instead of just package names. If you don't need conda-forge, avoid that channel, because it brings in a lot of metadata that the SAT resolver will try to consider.
Here's a huge blog article from Anaconda with ways to improve performance: https://www.anaconda.com/understanding-and-improving-condas-performance/
Interestingly, when conda stalls with message Invoking SAT with clause count
, it can't be killed using ctrl+C. Don't know why.
I tested with modified conda env config by adding python as a requirement and this resolved the issue. Full config:
channels:
- bioconda
- conda-forge
- defaults
dependencies:
- python =3.6
- multiqc ==1.9
I'm not fully certain why but locking the python version makes the conda env creation process successful.
Potential workaround to this problem is to use a local modified snakemake wrapper, an approach we have successfully employed to resolve wrapper related issue in the small-var-caller pipeline.
mentioned in commit 7db6966ee0e3e969e3bc2e5196e4824b003473a6
mentioned in commit e0208acdc535572bdcb19e5c61f8e0ebd0d99c39
Initiated fresh singularity+conda env creation for all envs using the instructions from readme. The job took 3hr18min to complete. Job info:
JobID JobName NTasks MaxRSS Elapsed State NodeList ReqMem MaxVMSize AveVMSize Partition AllocTRES
------------ ---------- -------- ---------- ---------- ---------- --------------- ---------- ---------- ---------- ---------- ----------------------------------------
13110540 quac-2022+ 03:18:14 COMPLETED c0200 8192Mn short billing=1,cpu=1,mem=8192M,node=1
13110540.ba+ batch 1 2618.23M 03:18:14 COMPLETED c0200 8192Mn 3581.57M 3581.57M cpu=1,mem=8192M,node=1
13110540.ex+ extern 1 1.57M 03:18:14 COMPLETED c0200 8192Mn 188.57M 188.57M billing=1,cpu=1,mem=8192M,node=1
Attempted to compare with the past logs to see how long this task took in the past, but I couldn't find the relevant log file. In case time-taken keeps increasing in future, we may have to define conda env configs with specific dependencies. Currently it lists only the major tools and not any of the underlying dependencies.
Some background info and some learnings. We were aware of this issue before but we may need to act on this issue soon.
Note that we ran into variation of this issue here - https://gitlab.rc.uab.edu/center-for-computational-genomics-and-data-science/sciops/pipelines/small_variant_caller_pipeline/-/issues/41
Testing now. Instead of using local wrapper, just define conda:
directive with wrapper:
and see if snakemake would use locally defined conda env. Advantage here is to avoid use of local wrapper.
mentioned in commit 9712962ace731f70b37ebebce939be58ff1ffe6e
Test was successful.
Note: The job to create singularity+conda environment for just multiqc env took less than 4mins to complete.
Rule multiqc_by_sample_initial_pass
ran into following error. I suspect this has to do with something wrong with conda env created. This reported error might be a variation of what we are seeing here.
Traceback (most recent call last):
File "quac/.snakemake/conda/9c3107da6d16b7941e8a8041bae2d495/bin/multiqc", line 6, in <module>
from multiqc.__main__ import multiqc
File "quac/.snakemake/conda/9c3107da6d16b7941e8a8041bae2d495/lib/python3.6/site-packages/multiqc/__init__.py", line 16, in <module>
from .multiqc import run
File "quac/.snakemake/conda/9c3107da6d16b7941e8a8041bae2d495/lib/python3.6/site-packages/multiqc/multiqc.py", line 38, in <module>
from .plots import table
File "quac/.snakemake/conda/9c3107da6d16b7941e8a8041bae2d495/lib/python3.6/site-packages/multiqc/plots/table.py", line 9, in <module>
from multiqc.utils import config, report, util_functions, mqc_colour
File "quac/.snakemake/conda/9c3107da6d16b7941e8a8041bae2d495/lib/python3.6/site-packages/multiqc/utils/mqc_colour.py", line 7, in <module>
import spectra
File "quac/.snakemake/conda/9c3107da6d16b7941e8a8041bae2d495/lib/python3.6/site-packages/spectra/__init__.py", line 1, in <module>
from .core import COLOR_SPACES, Color, Scale
File "quac/.snakemake/conda/9c3107da6d16b7941e8a8041bae2d495/lib/python3.6/site-packages/spectra/core.py", line 1, in <module>
from colormath import color_objects, color_conversions
File "quac/.snakemake/conda/9c3107da6d16b7941e8a8041bae2d495/lib/python3.6/site-packages/colormath/color_conversions.py", line 13, in <module>
import networkx
File "quac/.snakemake/conda/9c3107da6d16b7941e8a8041bae2d495/lib/python3.6/site-packages/networkx/__init__.py", line 81, in <module>
from networkx import algorithms
File "quac/.snakemake/conda/9c3107da6d16b7941e8a8041bae2d495/lib/python3.6/site-packages/networkx/algorithms/__init__.py", line 81, in <module>
from networkx.algorithms import tree
File "quac/.snakemake/conda/9c3107da6d16b7941e8a8041bae2d495/lib/python3.6/site-packages/networkx/algorithms/tree/__init__.py", line 1, in <module>
from .branchings import *
File "quac/.snakemake/conda/9c3107da6d16b7941e8a8041bae2d495/lib/python3.6/site-packages/networkx/algorithms/tree/branchings.py", line 30, in <module>
from dataclasses import dataclass, field
ModuleNotFoundError: No module named 'dataclasses'
Traceback (most recent call last):
File "quac/.snakemake/scripts/tmp6z72phym.wrapper.py", line 24, in <module>
"multiqc"
File "/mnt/snakemake/snakemake/shell.py", line 231, in __new__
raise sp.CalledProcessError(retcode, cmd)
To circumvent this error, I exported the env definition of conda env created for multiqc (from v1.0 quac) using command:
cd quac/1.0/.snakemake/conda
# identify env relevant to multiqc rules
grep multiqc *yaml
# now activate the corresponding conda env
conda activate full_path_to_conda
# export the env definition
conda env export > path_to_multiqc_env.yaml
mentioned in commit e48521b5ed1add1c2c0bb1fec104a8c027cf591d
mentioned in commit df9cde51060991beadd5bb7e966ae19b18116369
QuaC stalls when creating singularity+conda environments when it tries to create the env for multiqc.