leylabmpi / resmico

Identifying misassemblies via deep learning
MIT License

Missing samtools/Incompatible OpenSSL #21

Open ohickl opened 2 years ago

ohickl commented 2 years ago

Hi,

I installed ResMiCo into a conda env like this:

mamba create -n resmico pip -c conda-forge
mamba activate resmico
pip install resmico

When I try to run bam2feat, I get the following error:

$ conda activate resmico

$ resmico bam2feat --outdir "${resmico_dir}/features" \
                   "${resmico_dir}/resmico_map.tsv"
2022-09-16 15:31:34.228594: W tensorflow/stream_executor/platform/default/dso_loader.cc:64]
Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0:
cannot open shared object file: No such file or directory
2022-09-16 15:31:34.228654: I tensorflow/stream_executor/cuda/cudart_stub.cc:29]
Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Traceback (most recent call last):
  File ".../envs/resmico/bin/resmico", line 8, in <module>
    sys.exit(main())
  File ".../envs/resmico/lib/python3.10/site-packages/resmico/__main__.py", line 52, in main
    args.func(args)
  File ".../envs/resmico/lib/python3.10/site-packages/resmico/commands/bam2feat.py", line 80, in main
    bam2feat.main(args)
  File ".../envs/resmico/lib/python3.10/site-packages/resmico/bam2feat.py", line 273, in main
    raise OSError(f'Cannot find executable: {k}')
OSError: Cannot find executable: samtools
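
The OSError above is raised because samtools cannot be found on PATH. As a quick pre-flight check before starting bam2feat, a minimal Python sketch like the following reports which external tools are missing. The helper name `missing_executables` and the tool list are assumptions for illustration and are not part of ResMiCo; only samtools is confirmed as required by the traceback.

```python
import shutil


def missing_executables(names):
    """Return the subset of `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # samtools is named in the error above; bowtie2 is only mentioned in
    # this thread, so treating it as required is an assumption.
    missing = missing_executables(["samtools", "bowtie2"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed executables were found.")
```

Running this inside the activated environment shows whether the tools are visible to the same interpreter that runs resmico.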

I also tried adding samtools (and bowtie2) to the env, but it seems that the pysam version you use needs OpenSSL 3.0.5, whereas samtools (htslib) needs OpenSSL 1.1.1q.

So with samtools installed, which downgrades OpenSSL, I then get:

Traceback (most recent call last):
  File ".../envs/resmico/bin/resmico", line 5, in <module>
    from resmico.__main__ import main
  File ".../envs/resmico/lib/python3.10/site-packages/resmico/__main__.py", line 5, in <module>
    from resmico.commands import bam2feat
  File ".../envs/resmico/lib/python3.10/site-packages/resmico/commands/bam2feat.py", line 6, in <module>
    from resmico import bam2feat
  File ".../envs/resmico/lib/python3.10/site-packages/resmico/bam2feat.py", line 17, in <module>
    import pysam
  File ".../envs/resmico/lib/python3.10/site-packages/pysam/__init__.py", line 4, in <module>
    from pysam.libchtslib import *
ImportError: libcrypto.so.3: cannot open shared object file: No such file or directory

Or am I wrong and this could be some strange interaction on my side?

nick-youngblut commented 2 years ago

Please use the following install method:

mamba env create -n resmico_env -f $RESMICO_BASE_DIR/environment.yml
mamba activate resmico_env
pip install resmico

I've updated the README to match. Feel free to re-open the issue if you still experience problems.

ohickl commented 2 years ago

Thanks for the update! I have installed using these instructions, which ran fine except for this:

...
Installing collected packages: cmake, protobuf, resmico
  Attempting uninstall: protobuf
    Found existing installation: protobuf 3.20.1
    Uninstalling protobuf-3.20.1:
      Successfully uninstalled protobuf-3.20.1
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed.
This behaviour is the source of the following dependency conflicts.
google-api-core 2.10.1 requires protobuf<5.0.0dev,>=3.20.1, but you have protobuf 3.20.0 which is incompatible.
Successfully installed cmake-3.24.1.1 protobuf-3.20.0 resmico-1.1.1

Not sure if that matters.
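
For reference, the pip message says that google-api-core 2.10.1 wants protobuf >=3.20.1 and <5.0.0dev, while installing resmico replaced protobuf 3.20.1 with 3.20.0. A rough sketch of that comparison, assuming plain dotted release numbers; the helper names are hypothetical, and pip itself follows the full PEP 440 rules, which this does not implement:

```python
def parse_release(version):
    """Turn a plain dotted version such as '3.20.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def in_range(version, lower_inclusive, upper_exclusive):
    """Check lower_inclusive <= version < upper_exclusive for plain releases."""
    v = parse_release(version)
    return parse_release(lower_inclusive) <= v < parse_release(upper_exclusive)


# Versions taken from the pip output above; the upper bound '5.0.0dev' is
# simplified to '5.0.0' for this sketch.
print(in_range("3.20.0", "3.20.1", "5.0.0"))  # version left installed
print(in_range("3.20.1", "3.20.1", "5.0.0"))  # version that was uninstalled
```

The first call prints False and the second prints True, which is why pip flags the conflict. Running pip check in the environment lists any such conflicts that remain after installation.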

It then crashed at the bam2feat step:

...
2022-09-19 15:33:32,702 -   CMD: samtools sort -@ 30 -o resmico-bam2feat_TMP/test/mg/mapping_sorted_sub_sorted.bam resmico-bam2feat_TMP/test/mg/mapping_sorted_sub.bam
2022-09-19 15:34:09,831 -   CMD: samtools index -@ 30 resmico-bam2feat_TMP/test/mg/mapping_sorted_sub_sorted.bam
2022-09-19 15:34:22,831 - Outdir: .../resmico/features/test/mg
2022-09-19 15:34:22,831 -   CMD: .../resmico/lib/python3.9/site-packages/resmico/bam2feat --procs 30 -queue_size 32 --window 6 -breakpoint_margin 50 --o .../resmico/features/test/mg --bam_file resmico-bam2feat_TMP/test/mg/mapping_sorted_sub_sorted.bam --fasta_file resmico-bam2feat_TMP/test/mg/test_mg_contigs.fasta

.../resmico/lib/python3.9/site-packages/resmico/bam2feat: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.26' not found (required by .../resmico/lib/python3.9/site-packages/resmico/bam2feat)

Traceback (most recent call last):
  File ".../resmico/bin/resmico", line 8, in <module>
    sys.exit(main())
  File ".../resmico/lib/python3.9/site-packages/resmico/__main__.py", line 52, in main
    args.func(args)
  File ".../resmico/lib/python3.9/site-packages/resmico/commands/bam2feat.py", line 80, in main
    bam2feat.main(args)
  File ".../resmico/lib/python3.9/site-packages/resmico/bam2feat.py", line 277, in main
    feat_files = run_bam2feat(bam_fasta, exe, args)
  File ".../resmico/lib/python3.9/site-packages/resmico/bam2feat.py", line 250, in run_bam2feat
    return [x for x in res]
  File ".../resmico/lib/python3.9/site-packages/resmico/bam2feat.py", line 250, in <listcomp>
    return [x for x in res]
  File ".../resmico/lib/python3.9/site-packages/resmico/bam2feat.py", line 222, in _run_bam2feat
    bam2feat(bam_sub, ref_tmp, outdir_feat, exe['bam2feat'], args)
  File ".../resmico/lib/python3.9/site-packages/resmico/bam2feat.py", line 186, in bam2feat
    run_cmd(cmd)
  File ".../resmico/lib/python3.9/site-packages/resmico/bam2feat.py", line 102, in run_cmd
    raise ValueError('Return code: {}'.format(rc))
ValueError: Return code: 1

I fixed this by setting export LD_LIBRARY_PATH="${CONDA_PREFIX}/lib". It could be a problem with the system I am using, though.
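
The same workaround can be applied per command instead of for the whole shell session. The following is a minimal sketch, assuming an activated conda environment that sets CONDA_PREFIX; the helper name `env_with_conda_lib` is hypothetical. Unlike the export above, it prepends to any existing LD_LIBRARY_PATH rather than overwriting it.

```python
import os


def env_with_conda_lib(environ):
    """Return a copy of `environ` with $CONDA_PREFIX/lib first in LD_LIBRARY_PATH."""
    env = dict(environ)
    prefix = env.get("CONDA_PREFIX")
    if not prefix:
        return env  # no active conda env; leave the environment untouched
    conda_lib = os.path.join(prefix, "lib")
    current = env.get("LD_LIBRARY_PATH", "")
    # Put the conda lib directory first so it is searched before system paths.
    env["LD_LIBRARY_PATH"] = conda_lib + (os.pathsep + current if current else "")
    return env


if __name__ == "__main__":
    env = env_with_conda_lib(os.environ)
    print(env.get("LD_LIBRARY_PATH", "(LD_LIBRARY_PATH not set)"))
```

The returned mapping can be passed as env=... to subprocess.run so that only the launched command sees the modified library path.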

Now it crashes a bit later:

...
[2022-09-21 13:01:19.251] [svc] [info] Processing contig: NODE_1933212_length_275_cov_1.282828
[2022-09-21 13:01:19.251] [svc] [info] Getting per-read characteristics
[2022-09-21 13:01:19.279] [svc] [info] Processing contig: NODE_2778987_length_247_cov_0.864706
[2022-09-21 13:01:19.279] [svc] [info] Getting per-read characteristics
[2022-09-21 13:01:19.305] [svc] [info] Processing contig: NODE_1691562_length_289_cov_1.056604
[2022-09-21 13:01:19.305] [svc] [info] Getting per-read characteristics
[2022-09-21 13:01:22.874] [svc] [info] Processing contig: NODE_1_length_139176_cov_7.769344
[2022-09-21 13:01:22.874] [svc] [info] Getting per-read characteristics

Traceback (most recent call last):
  File ".../resmico/bin/resmico", line 8, in <module>
    sys.exit(main())
  File ".../resmico/lib/python3.9/site-packages/resmico/__main__.py", line 52, in main
    args.func(args)
  File ".../resmico/lib/python3.9/site-packages/resmico/commands/bam2feat.py", line 80, in main
    bam2feat.main(args)
  File ".../resmico/lib/python3.9/site-packages/resmico/bam2feat.py", line 277, in main
    feat_files = run_bam2feat(bam_fasta, exe, args)
  File ".../resmico/lib/python3.9/site-packages/resmico/bam2feat.py", line 250, in run_bam2feat
    return [x for x in res]
  File ".../resmico/lib/python3.9/site-packages/resmico/bam2feat.py", line 250, in <listcomp>
    return [x for x in res]
  File ".../resmico/lib/python3.9/site-packages/resmico/bam2feat.py", line 222, in _run_bam2feat
    bam2feat(bam_sub, ref_tmp, outdir_feat, exe['bam2feat'], args)
  File ".../resmico/lib/python3.9/site-packages/resmico/bam2feat.py", line 186, in bam2feat
    run_cmd(cmd)
  File ".../resmico/lib/python3.9/site-packages/resmico/bam2feat.py", line 102, in run_cmd
    raise ValueError('Return code: {}'.format(rc))
ValueError: Return code: -11

Contig NODE_1_length_139176_cov_7.769344 looks fine, as far as I can tell from a superficial check.

roachjm-unc commented 2 years ago

This is the same thing I am seeing in #22. If you run the CMD outside of Python, you get a segfault in bam2feat. I had thought this was related to the length of the particular contig in my test example, but here the contig is 139 kbp as opposed to 25 kbp in mine, so that seems unlikely.
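
The return code -11 in the traceback is consistent with a segfault: Python's subprocess module reports a child killed by signal N as return code -N, and signal 11 is SIGSEGV on Linux. A small sketch for decoding such codes, assuming run_cmd passes the subprocess return code through unchanged (the helper name `describe_return_code` is hypothetical):

```python
import signal


def describe_return_code(rc):
    """Describe a subprocess return code, naming the signal if it is negative."""
    if rc < 0:
        try:
            name = signal.Signals(-rc).name
        except ValueError:
            name = "unknown signal"
        return f"terminated by signal {-rc} ({name})"
    return f"exited with status {rc}"


print(describe_return_code(-11))  # the code from the later crash
print(describe_return_code(1))    # the code from the GLIBCXX failure
```

This separates the two failures in this thread: return code 1 is an ordinary error exit, while -11 means the bam2feat binary itself crashed.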

I also modify LD_LIBRARY_PATH, because bam2feat otherwise falls through to the system libstdc++, which is too old to provide GLIBCXX_3.4.26. The only difference is that I set it on the run line rather than with a full export, i.e.:

LD_LIBRARY_PATH=${MINICONDA3}/envs/resmico_env/lib:$LD_LIBRARY_PATH resmico bam2feat \
    --outdir features REAL-TEST/map.tsv

But it is essentially the same thing, so that could also be related.
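
One way to confirm which libstdc++ provides GLIBCXX_3.4.26 is to scan the library file for its version tags, similar to piping strings through grep. A minimal sketch, with hypothetical helper names; the paths are given on the command line, for example /lib64/libstdc++.so.6 from the error message and the copy inside the conda env (its exact location is an assumption):

```python
import re
import sys


def glibcxx_versions(data):
    """Return the GLIBCXX version tags found in raw bytes, sorted numerically."""
    tags = set(re.findall(rb"GLIBCXX_(\d+(?:\.\d+)+)", data))
    as_tuples = sorted(
        tuple(int(part) for part in tag.decode().split(".")) for tag in tags
    )
    return [".".join(str(part) for part in version) for version in as_tuples]


def glibcxx_versions_in_file(path):
    """Read a shared library from disk and list its GLIBCXX version tags."""
    with open(path, "rb") as handle:
        return glibcxx_versions(handle.read())


if __name__ == "__main__":
    # Each command-line argument is treated as a path to a libstdc++ file.
    for path in sys.argv[1:]:
        versions = glibcxx_versions_in_file(path)
        print(path, "has GLIBCXX_3.4.26:", "3.4.26" in versions)
```

If the system library lacks 3.4.26 while the conda copy has it, that matches the need for the LD_LIBRARY_PATH workaround.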

Can I ask what OS bam2feat was compiled on?

danieldanciu commented 2 years ago

bam2feat was compiled on Linux (Debian and Ubuntu) and macOS (Intel).

ohickl commented 2 years ago

Thanks for having a look! Any news on what could be the issue?