marbl / verkko


No module named networkx #258

Closed dmacguigan closed 4 weeks ago

dmacguigan commented 4 weeks ago

Hello,

I am running verkko v2.1 from within a conda environment that I built using conda create -n verkko -c conda-forge -c bioconda -c defaults verkko. I'm running verkko on a computing cluster with the SLURM job scheduler.

Unfortunately, I encountered the following error.

Error in rule processGraph:
    jobid: 6
    input: 1-buildGraph/hifi-resolved.gfa, 1-buildGraph/paths.gaf, 1-buildGraph/hifi_nodecov.csv
    output: 2-processGraph/unitig-unrolled-hifi-resolved.gfa, 2-processGraph/unitig-mapping-1.txt
    log: 2-processGraph/process.err (check log file(s) for error details)
    shell:
dmacguig@vortex-future:~/vscratch/MacGuigan/genome_assemblies/Sander_vitreus_TJK-76/HERRO/verkko$ cat 2-processGraph/process
cat: 2-processGraph/process: No such file or directory
dmacguig@vortex-future:~/vscratch/MacGuigan/genome_assemblies/Sander_vitreus_TJK-76/HERRO/verkko$ cat 2-processGraph/process.err
Gap insertion pass 1.
Traceback (most recent call last):
  File "/projects/academic/tkrabben/modules_KrabLab/easybuild/2023.01/software/Core/miniconda3/22.11.1-1/envs/verkko/lib/verkko/scripts/insert_aln_gaps.py", line 4, in <module>
    import networkx as nx
ModuleNotFoundError: No module named 'networkx'

This error has been mentioned a few times on the verkko Issues page, and networkx does seem to be installed in my verkko conda environment. Somehow, though, the processGraph step can't see the module.

(verkko) dmacguig@vortex-future:~/vscratch/MacGuigan/genome_assemblies/Sander_vitreus_TJK-76/HERRO/verkko$ conda list
# packages in environment at /projects/academic/tkrabben/modules_KrabLab/easybuild/2023.01/software/Core/miniconda3/22.11.1-1/envs/verkko:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
amply                     0.1.6              pyhd8ed1ab_0    conda-forge
appdirs                   1.4.4              pyh9f0ad1d_0    conda-forge
attrs                     23.2.0             pyh71513ae_0    conda-forge
biopython                 1.83             py39hd1e30aa_0    conda-forge
boost-cpp                 1.78.0               h2c5509c_4    conda-forge
brotli-python             1.1.0            py39h3d6467e_1    conda-forge
bwa                       0.7.18               he4a0461_0    bioconda
bzip2                     1.0.8                hd590300_5    conda-forge
c-ares                    1.28.1               hd590300_0    conda-forge
ca-certificates           2024.6.2             hbcca054_0    conda-forge
certifi                   2024.2.2           pyhd8ed1ab_0    conda-forge
charset-normalizer        3.3.2              pyhd8ed1ab_0    conda-forge
coin-or-cbc               2.10.10              h9002f0b_0    conda-forge
coin-or-cgl               0.60.7               h516709c_0    conda-forge
coin-or-clp               1.17.8               h1ee7a9c_0    conda-forge
coin-or-osi               0.108.10             haf5fa05_0    conda-forge
coin-or-utils             2.11.11              hee58242_0    conda-forge
coincbc                   2.10.10           0_metapackage    conda-forge
configargparse            1.7                pyhd8ed1ab_0    conda-forge
connection_pool           0.0.3              pyhd3deb0d_0    conda-forge
datrie                    0.8.2            py39hd1e30aa_7    conda-forge
docutils                  0.21.2             pyhd8ed1ab_0    conda-forge
dpath                     2.1.6              pyha770c72_0    conda-forge
findutils                 4.6.0             h166bdaf_1001    conda-forge
gettext                   0.22.5               h59595ed_2    conda-forge
gettext-tools             0.22.5               h59595ed_2    conda-forge
gitdb                     4.0.11             pyhd8ed1ab_0    conda-forge
gitpython                 3.1.43             pyhd8ed1ab_0    conda-forge
graphaligner              1.0.19               h21ec9f0_0    bioconda
gsl                       2.7                  he838d99_0    conda-forge
htslib                    1.20                 h81da01d_0    bioconda
humanfriendly             10.0               pyhd8ed1ab_6    conda-forge
icu                       73.2                 h59595ed_0    conda-forge
idna                      3.7                pyhd8ed1ab_0    conda-forge
importlib_resources       6.4.0              pyhd8ed1ab_0    conda-forge
jemalloc                  5.2.0                he1b5a44_0    conda-forge
jinja2                    3.1.4              pyhd8ed1ab_0    conda-forge
jsonschema                4.22.0             pyhd8ed1ab_0    conda-forge
jsonschema-specifications 2023.12.1          pyhd8ed1ab_0    conda-forge
jupyter_core              5.7.2            py39hf3d152e_0    conda-forge
k8                        0.2.5                hdcf5f25_4    bioconda
keyutils                  1.6.1                h166bdaf_0    conda-forge
krb5                      1.21.2               h659d440_0    conda-forge
ld_impl_linux-64          2.40                 hf3520f5_1    conda-forge
libasprintf               0.22.5               h661eb56_2    conda-forge
libasprintf-devel         0.22.5               h661eb56_2    conda-forge
libblas                   3.9.0           22_linux64_openblas    conda-forge
libcblas                  3.9.0           22_linux64_openblas    conda-forge
libcurl                   8.8.0                hca28451_0    conda-forge
libdeflate                1.20                 hd590300_0    conda-forge
libdivsufsort             2.0.2                h031d066_9    bioconda
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 hd590300_2    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 13.2.0               h77fa898_7    conda-forge
libgettextpo              0.22.5               h59595ed_2    conda-forge
libgettextpo-devel        0.22.5               h59595ed_2    conda-forge
libgfortran-ng            13.2.0               h69a702a_7    conda-forge
libgfortran5              13.2.0               hca663fb_7    conda-forge
libgomp                   13.2.0               h77fa898_7    conda-forge
liblapack                 3.9.0           22_linux64_openblas    conda-forge
liblapacke                3.9.0           22_linux64_openblas    conda-forge
libnghttp2                1.58.0               h47da74e_1    conda-forge
libnsl                    2.0.1                hd590300_0    conda-forge
libopenblas               0.3.27          pthreads_h413a1c8_0    conda-forge
libprotobuf               3.14.0               h780b84a_0    conda-forge
libsqlite                 3.45.3               h2797004_0    conda-forge
libssh2                   1.11.0               h0841786_0    conda-forge
libstdcxx-ng              13.2.0               hc0a3c3a_7    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libxcrypt                 4.4.36               hd590300_1    conda-forge
libzlib                   1.2.13               h4ab18f5_6    conda-forge
markupsafe                2.1.5            py39hd1e30aa_0    conda-forge
mashmap                   3.1.3                h07ea13f_0    bioconda
meryl                     1.4.1                h4ac6f70_0    bioconda
minimap2                  2.28                 he4a0461_1    bioconda
nbformat                  5.10.4             pyhd8ed1ab_0    conda-forge
ncurses                   6.5                  h59595ed_0    conda-forge
networkx                  3.2.1              pyhd8ed1ab_0    conda-forge
numpy                     1.26.4           py39h474f0d3_0    conda-forge
openblas                  0.3.27          pthreads_h7a3da1a_0    conda-forge
openssl                   3.3.0                h4ab18f5_3    conda-forge
packaging                 24.0               pyhd8ed1ab_0    conda-forge
parasail-python           1.3.4            py39h4e691d4_1    bioconda
perl                      5.32.1          7_hd590300_perl5    conda-forge
pip                       24.0               pyhd8ed1ab_0    conda-forge
pkgutil-resolve-name      1.3.10             pyhd8ed1ab_1    conda-forge
plac                      1.4.3              pyhd8ed1ab_0    conda-forge
platformdirs              4.2.2              pyhd8ed1ab_0    conda-forge
protobuf                  3.14.0           py39he80948d_1    conda-forge
psutil                    5.9.8            py39hd1e30aa_0    conda-forge
pulp                      2.7.0            py39hf3d152e_1    conda-forge
pyparsing                 3.1.2              pyhd8ed1ab_0    conda-forge
pysocks                   1.7.1              pyha2e5f31_6    conda-forge
python                    3.9.19          h0755675_0_cpython    conda-forge
python-fastjsonschema     2.19.1             pyhd8ed1ab_0    conda-forge
python_abi                3.9                      4_cp39    conda-forge
pyyaml                    6.0.1            py39hd1e30aa_1    conda-forge
readline                  8.2                  h8228510_1    conda-forge
referencing               0.35.1             pyhd8ed1ab_0    conda-forge
requests                  2.32.3             pyhd8ed1ab_0    conda-forge
reretry                   0.11.8             pyhd8ed1ab_0    conda-forge
rpds-py                   0.18.1           py39ha68c5e3_0    conda-forge
samtools                  1.20                 h50ea8bc_0    bioconda
seqtk                     1.4                  he4a0461_2    bioconda
setuptools                70.0.0             pyhd8ed1ab_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
smart_open                7.0.4              pyhd8ed1ab_0    conda-forge
smmap                     5.0.0              pyhd8ed1ab_0    conda-forge
snakemake-minimal         7.32.4             pyhdfd78af_1    bioconda
stopit                    1.1.2                      py_0    conda-forge
tabulate                  0.9.0              pyhd8ed1ab_1    conda-forge
throttler                 1.2.2              pyhd8ed1ab_0    conda-forge
tk                        8.6.13          noxft_h4845f30_101    conda-forge
toposort                  1.10               pyhd8ed1ab_0    conda-forge
traitlets                 5.14.3             pyhd8ed1ab_0    conda-forge
typing_extensions         4.12.1             pyha770c72_0    conda-forge
tzdata                    2024a                h0c530f3_0    conda-forge
urllib3                   2.2.1              pyhd8ed1ab_0    conda-forge
verkko                    2.1                  h45dadce_0    bioconda
wheel                     0.43.0             pyhd8ed1ab_1    conda-forge
winnowmap                 2.03                 h43eeafb_2    bioconda
wrapt                     1.16.0           py39hd1e30aa_0    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
yaml                      0.2.5                h7f98852_2    conda-forge
yte                       1.5.4              pyha770c72_0    conda-forge
zipp                      3.17.0             pyhd8ed1ab_0    conda-forge
zlib                      1.2.13               h4ab18f5_6    conda-forge
zstd                      1.5.6                ha6fb4c9_0    conda-forge

Any suggestions on how to proceed?

Thank you, Dan

skoren commented 4 weeks ago

Unfortunately, I haven't seen that before (the previous issues were from when we didn't have networkx listed as a dependency in the recipe). The issue is the interaction between Slurm and your conda environment, so it's outside of verkko itself. I would guess the conda environment is somehow not active on the compute node trying to run the verkko command, thus missing networkx. Perhaps your cluster admins could advise?
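One way to test that guess is to run a tiny check on the compute node itself (for example, from a short Slurm job) using the same python that verkko would pick up, and see which interpreter is actually running and whether it can import networkx. A minimal sketch; the helper name `check_module` is hypothetical and not part of verkko:

```python
import importlib.util
import sys


def check_module(name: str) -> dict:
    """Report which interpreter is running and whether `name` is importable from it."""
    spec = importlib.util.find_spec(name)  # None if the top-level module is not visible
    return {
        "interpreter": sys.executable,
        "module": name,
        "found": spec is not None,
        # Where the module would be loaded from, if this interpreter can see it at all
        "location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    info = check_module("networkx")
    print(f"python:   {info['interpreter']}")
    if info["found"]:
        print(f"networkx: found at {info['location']}")
    else:
        print("networkx: NOT importable from this interpreter")
```

If the reported interpreter is a system python rather than the one under the verkko conda environment, the environment is not active on that node.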

dmacguigan commented 4 weeks ago

Thanks for the quick reply @skoren! I believe the problem is exactly what you described. I fixed it by passing the full paths to the python (and perl) executables from my verkko conda environment. Might be worth mentioning this in the help documentation for cluster users.

verkko -d ${WD} \
 --hifi ${HERRO_READS} \
 --nano ${READS} \
 --hic1 ${HIC1} \
 --hic2 ${HIC2} \
 --slurm \
 --snakeopts '--use-conda --cluster "./slurm-sge-submit.sh {threads} {resources.mem_gb} {resources.time_h} {rulename} {resources.job_id} --partition=general-compute --account=tkrabben --qos=general-compute"' \
 --python '/projects/academic/tkrabben/modules_KrabLab/easybuild/2023.01/software/Core/miniconda3/22.11.1-1/envs/verkko/bin/python' \
 --perl '/projects/academic/tkrabben/modules_KrabLab/easybuild/2023.01/software/Core/miniconda3/22.11.1-1/envs/verkko/bin/perl' \
 --spl-run 1 8 24 # default runtime for this step is 96 hours, need to shorten it for UB cluster
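To avoid hard-coding long environment paths like the ones above, the --python and --perl values can be derived from the conda environment that is active where the job is submitted. A minimal sketch, assuming the environment was activated with conda activate (which sets CONDA_PREFIX) and that its interpreters live under bin/ as in the command above; the helper name `conda_interpreter_args` is hypothetical:

```python
import os
import shlex


def conda_interpreter_args(prefix: str) -> list:
    """Build --python/--perl arguments that point at a conda env's own binaries."""
    return [
        "--python", os.path.join(prefix, "bin", "python"),
        "--perl", os.path.join(prefix, "bin", "perl"),
    ]


if __name__ == "__main__":
    # CONDA_PREFIX is set by `conda activate` in the shell you submit from.
    prefix = os.environ.get("CONDA_PREFIX")
    if prefix is None:
        print("No conda environment is active; activate the verkko env first.")
    else:
        # Shell-quoted, ready to paste after the other verkko options
        print(shlex.join(conda_interpreter_args(prefix)))
```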

However, I have now encountered another error, which seems to be a bug in the fix_haplogaps.py script.

Error executing rule processGraph on cluster (jobid: 6, external: 16135940, jobscript: /vscratch/grp-tkrabben/MacGuigan/genome_assemblies/Sander_vitreus_TJK-76/HERRO/verkko/.snakemake/tmp.b0p19ecu/verkko.processGraph.6.sh). For error details see the cluster log and the log files of the involved rule(s).
Exiting because a job execution failed. Look above for error message
(verkko) dmacguig@vortex-future:~/vscratch/MacGuigan/genome_assemblies/Sander_vitreus_TJK-76/HERRO/verkko/2-processGraph$ tail process.err
mend <446 >28439
mend <4587 <18523
mend <5313 >10338
mend >20820 >23078
mend >14466 >23294
mend >20567 >20568
Traceback (most recent call last):
  File "/projects/academic/tkrabben/modules_KrabLab/easybuild/2023.01/software/Core/miniconda3/22.11.1-1/envs/verkko/lib/verkko/scripts/fix_haplogaps.py", line 122, in <module>
    sys.stderr.write("can't fix " + key[0] + " " + key[1] + " due to overlap containing node (wanted " + str(wanted_gap_length) + ", node lengths " + len(node_seqs[key[0][1:]]) + ", " + len(node_seqs[key[1][1:]]) + ")")
TypeError: can only concatenate str (not "int") to str

After commenting out line 122 in fix_haplogaps.py, the pipeline proceeds as expected.

skoren commented 4 weeks ago

Thanks. I was going to add this to the documentation: "If you're using conda, you may need to make the conda-installed python your default before running." Do you think that is clear?

Thanks for the catch on the fix_haplogaps.py script, that does look like a bug. I think wrapping those ints (the calls to len) in str() should also work, but either way is OK since skipping the print won't affect functionality.
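The str() fix suggested above can be sketched as a standalone function that builds the same message as line 122 of fix_haplogaps.py, with every int converted before concatenation. This is an illustration, not the patched script: the function name `overlap_warning` is hypothetical, the trailing newline is an addition, and the shapes of `key` (a pair of oriented node names such as "<446") and `node_seqs` (node name to sequence) are assumed from the traceback and the "mend" lines:

```python
import sys


def overlap_warning(key, wanted_gap_length, node_seqs):
    """Build the "can't fix" message; str() wraps every int so concatenation works."""
    return (
        "can't fix " + key[0] + " " + key[1]
        + " due to overlap containing node (wanted " + str(wanted_gap_length)
        + ", node lengths " + str(len(node_seqs[key[0][1:]]))
        + ", " + str(len(node_seqs[key[1][1:]])) + ")\n"  # newline added in this sketch
    )


if __name__ == "__main__":
    # Toy inputs: [1:] strips the orientation character ('<' or '>') from each node name.
    node_seqs = {"446": "ACGT" * 3, "28439": "ACGTACGT"}
    sys.stderr.write(overlap_warning(("<446", ">28439"), 500, node_seqs))
```

An f-string would avoid the explicit str() calls entirely; either way the message is only diagnostic output.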

dmacguigan commented 4 weeks ago

No, thank you for this exciting pipeline!

That clarification looks good, but maybe this would be a bit more specific? "If you're using conda (especially in a cluster environment), you may need to make the conda-installed python your default. You can do this with the --python option when calling verkko."

skoren commented 4 weeks ago

OK, updated. I left out the cluster part since I put it in the section on running on a grid. I suspect on a single node the conda environment would already be loaded.