open2c / pairtools

Extract 3D contacts (.pairs) from sequencing alignments
MIT License

ImportError: cannot import name 'dedup_cython' from partially initialized module 'pairtools.lib' #146

Closed georgette-femerling closed 2 years ago

georgette-femerling commented 2 years ago

I am getting an import error when I try to run pairtools on a cluster:

Traceback (most recent call last):
  File "/home/gfemer/.local/bin/pairtools", line 5, in <module>
    from pairtools.cli import cli
  File "/home/gfemer/.local/lib/python3.8/site-packages/pairtools/cli/__init__.py", line 188, in <module>
    from . import (
  File "/home/gfemer/.local/lib/python3.8/site-packages/pairtools/cli/dedup.py", line 12, in <module>
    from ..lib import fileio, pairsam_format, headerops
  File "/home/gfemer/.local/lib/python3.8/site-packages/pairtools/lib/__init__.py", line 2, in <module>
    from . import dedup
  File "/home/gfemer/.local/lib/python3.8/site-packages/pairtools/lib/dedup.py", line 8, in <module>
    from . import dedup_cython, pairsam_format
ImportError: cannot import name 'dedup_cython' from partially initialized module 'pairtools.lib' (most likely due to a circular import) (/home/gfemer/.local/lib/python3.8/site-packages/pairtools/lib/__init__.py)

I'm using Python 3.8.10.
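
A note on the message itself: Python 3.8 and later adds the "partially initialized module ... circular import" wording whenever `from . import name` fails while the parent package is still being imported, even when the real cause is that the submodule does not exist. The sketch below reproduces this with a throwaway package. `demo_pkg`, `helper` and `missing_ext` are made-up names standing in for `pairtools.lib`, `dedup` and a compiled module that was never built; it illustrates the mechanism and is not a diagnosis of this particular install.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def reproduce_misleading_error() -> str:
    """Import a throwaway package with a missing submodule; return the error text."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_pkg"  # hypothetical package name
        pkg.mkdir()
        # __init__ imports helper, helper imports a submodule that does not exist
        (pkg / "__init__.py").write_text("from . import helper\n")
        (pkg / "helper.py").write_text("from . import missing_ext\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module("demo_pkg")
        except ImportError as exc:
            return str(exc)
        finally:
            sys.path.remove(tmp)
            # drop any leftovers so the demo can be re-run in the same session
            for name in [m for m in sys.modules
                         if m == "demo_pkg" or m.startswith("demo_pkg.")]:
                del sys.modules[name]
    return ""


if __name__ == "__main__":
    print(reproduce_misleading_error())
```

No circular import exists in the demo package, yet the message still suggests one. The question to ask is therefore whether the named submodule is actually present on disk.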

agalitsyna commented 2 years ago

Hi @georgette-femerling , how did you install pairtools? Have you tried

pip install numpy pysam cython
pip install pairtools

one after another?
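
A minimal sketch of the same two-step install driven from Python, so that both steps go through the pip belonging to the interpreter that will later run pairtools. The helper names `two_step_install_commands` and `run_install`, and the `dry_run` switch, are illustrative assumptions and not part of pairtools or pip.

```python
import subprocess
import sys

# Packages that should already be installed when pairtools is built
BUILD_DEPS = ["numpy", "pysam", "cython"]


def two_step_install_commands(package="pairtools"):
    """Return the two pip invocations in the order they must run."""
    pip = [sys.executable, "-m", "pip", "install"]
    return [pip + BUILD_DEPS, pip + [package]]


def run_install(dry_run=True):
    """Print the commands; execute them only when dry_run is False."""
    for cmd in two_step_install_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True aborts before step two if step one fails
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_install(dry_run=True)
```

Using `sys.executable -m pip` instead of a bare `pip` avoids accidentally picking up a pip that belongs to a different Python installation.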

aosakwe commented 2 years ago

Hi @agalitsyna , I am having the same issue and error message using python 3.7.7 - I had installed pairtools the way you described.

agalitsyna commented 2 years ago

Hi @aosakwe , @georgette-femerling, I was unable to reproduce your problem in a fresh environment with python 3.7.7 (also 3.8.10). It runs fine if appropriately installed.

Two modifications/checks that I can propose:

  1. Run installation with no caching by pip:

    pip install pysam numpy cython --no-cache-dir
    pip install pairtools --no-cache-dir
  2. Make sure that the python and pip binaries live in the same directory. If they do not, first install pip into the same environment as python.

    which pip
    which python
    which pairtools

    These three should sit in the same directory, e.g. /home/username/anaconda3/envs/test/bin/
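
The `which` check above can also be scripted. This is a rough sketch using only the standard library; the function names are hypothetical. Symlinks are deliberately not resolved, because in a virtualenv `python` is often a symlink to the base interpreter while `pip` is a regular file.

```python
import os
import shutil


def executable_dirs(names=("python", "pip", "pairtools")):
    """Map each command to the directory it resolves to on PATH (None if absent)."""
    dirs = {}
    for name in names:
        path = shutil.which(name)
        dirs[name] = os.path.dirname(os.path.abspath(path)) if path else None
    return dirs


def same_directory(dirs):
    """True only if every command was found and all share one directory."""
    found = set(dirs.values())
    return None not in found and len(found) == 1


if __name__ == "__main__":
    result = executable_dirs()
    for name, directory in result.items():
        print(f"{name}: {directory}")
    print("same directory:", same_directory(result))
```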

Explanation: what you observe is a pip installation problem. It happens when pip cannot find the cython and pysam libraries and link them properly with the compiled pairtools code. Installing them in the same environment before pairtools is crucial for pairtools to work.

However, if pip and python have different locations, they might also search different paths for libraries like cython and pysam. Another possibility is that you tried to install pairtools before, so pip cached an improperly linked build, and you are now re-installing it without re-linking against the new cython and pysam.
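
To see what the running interpreter can actually find, something like the following can help. It is only a sketch: it uses `importlib.metadata` from the standard library (Python 3.8+), and the list of distribution names is an assumption based on the packages mentioned in this thread.

```python
import sys
from importlib import metadata  # standard library since Python 3.8


def installed_versions(distributions=("numpy", "pysam", "cython", "pairtools")):
    """Return {name: version or None} as seen by the running interpreter."""
    versions = {}
    for dist in distributions:
        try:
            versions[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            versions[dist] = None
    return versions


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

If a package shows as not installed here but `pip list` reports it, pip and python are most likely pointing at different environments.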

aosakwe commented 2 years ago

Hi, I tried reinstalling them in a new environment (using the same commands you provided), and all three are in the same directory. I still encounter the same issue. I did try to install pairtools before; how could I resolve this?

agalitsyna commented 2 years ago

You can explicitly remove pairtools from pip's cache, e.g. pip cache remove pairtools*

Another option might be installing from source:

git clone https://github.com/open2c/pairtools.git
cd pairtools
pip install -e .

aosakwe commented 2 years ago

There seems to be no residual cache. I get the following installation error when installing from source:

Installing collected packages: pairtools
  Running setup.py develop for pairtools
  error: subprocess-exited-with-error

× python setup.py develop did not run successfully.
│ exit code: 1
╰─> [28 lines of output]
    running develop
    running egg_info
    writing pairtools.egg-info/PKG-INFO
    writing dependency_links to pairtools.egg-info/dependency_links.txt
    writing entry points to pairtools.egg-info/entry_points.txt
    writing requirements to pairtools.egg-info/requires.txt
    writing top-level names to pairtools.egg-info/top_level.txt
    reading manifest file 'pairtools.egg-info/SOURCES.txt'
    reading manifest template 'MANIFEST.in'
    no previously-included directories found matching 'doc/_build'
    no previously-included directories found matching 'doc/_templates'
    warning: no files found matching '*.pxd' anywhere in distribution
    warning: no previously-included files matching '__pycache__/*' found anywhere in distribution
    warning: no previously-included files matching '*.so' found anywhere in distribution
    warning: no previously-included files matching '*.pyd' found anywhere in distribution
    warning: no previously-included files matching '*.pyc' found anywhere in distribution
    warning: no previously-included files matching '.git*' found anywhere in distribution
    warning: no previously-included files matching '.deps/*' found anywhere in distribution
    warning: no previously-included files matching '.DS_Store' found anywhere in distribution
    writing manifest file 'pairtools.egg-info/SOURCES.txt'
    running build_ext
    building 'pairtools.lib.parse_pysam' extension
    gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -O2 -ftree-vectorize -march=core-avx2 -fno-math-errno -fPIC -O2 -ftree-vectorize -march=core-avx2 -fno-math-errno -fPIC -fPIC -I/home/aosakwe/python_env/lib/python3.8/site-packages/pysam -I/home/aosakwe/python_env/lib/python3.8/site-packages/pysam/include/samtools -I/home/aosakwe/python_env/include -I/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/python/3.8.10/include/python3.8 -I/home/aosakwe/python_env/lib/python3.8/site-packages/numpy/core/include -I/home/aosakwe/python_env/lib/python3.8/site-packages/numpy/core/include -c /home/aosakwe/git_installs/pairtools/pairtools/lib/parse_pysam.c -o build/temp.linux-x86_64-3.8/home/aosakwe/git_installs/pairtools/pairtools/lib/parse_pysam.o
    /home/aosakwe/git_installs/pairtools/pairtools/lib/parse_pysam.c:779:10: fatal error: htslib/kstring.h: No such file or directory
      779 | #include "htslib/kstring.h"
          |          ^~~~~~~~~~~~~~~~~~
    compilation terminated.
    error: command 'gcc' failed with exit status 1
    [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.

agalitsyna commented 2 years ago

Hi @aosakwe , this is a pysam-related problem, not a pairtools one. As you can see in your output, the compiler cannot find the htslib header file, which is shipped as part of pysam. I don't know why this happens on your computer; looking through the pysam issues on GitHub is probably the best next step. Here is an answer that might guide you: https://github.com/pysam-developers/pysam/issues/262

My other suggestions are:

  1. Make sure you are working on a Linux system. pysam might not be compiled/linked properly on Windows and macOS.
  2. Check your pysam version. In our tests pysam is 0.19.1
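
Both points can be checked from Python. The sketch below prints the pysam version and looks for `htslib/kstring.h` in the directories returned by `pysam.get_include()`. The helper `find_header` is a hypothetical name, and the exact on-disk layout of the headers is an assumption that may differ between pysam versions and install methods.

```python
import os


def find_header(include_dirs, header="htslib/kstring.h"):
    """Return the first full path where `header` exists under include_dirs, else None."""
    for directory in include_dirs:
        candidate = os.path.join(directory, *header.split("/"))
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    try:
        import pysam
    except ImportError:
        print("pysam is not importable from this interpreter")
    else:
        print("pysam version:", pysam.__version__)
        # get_include() lists the directories pysam advertises to dependent builds
        print("kstring.h found at:", find_header(pysam.get_include()))
```

A result of `None` would be consistent with the `No such file or directory` error in the gcc output above.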

eharr commented 2 years ago

Hi @aosakwe - thanks so much for the updated tool, I'm eager to try it out but am running into the same issue.

One possible solution is to use PEP 518 to specify the build dependencies in the pyproject.toml file. That way pip would know to install numpy, cython and pysam before trying to build pairtools. There are some details on how it would work for a Cython package here: https://levelup.gitconnected.com/how-to-deploy-a-cython-package-to-pypi-8217a6581f09.

Failing that, would it be possible to update the version on bioconda? That would allow users to avoid the build-time issues.

agalitsyna commented 2 years ago

Hi @eharr, good suggestion. I don't think it works without a package manager like poetry, though.

The problem is that pyproject.toml and PEP 518 require building the package in an isolated environment, which cannot include libraries linked from outside, like pysam.

Here is the record of why pyproject.toml does not work with the pysam dependency: https://github.com/open2c/pairtools/commits/remove-query-scaling If you get it working, feel free to submit a PR!

eharr commented 2 years ago

Thanks @agalitsyna I'll take a look at the pyproject.toml if I get a chance.

I think I've narrowed down the issue to some difference between conda and virtualenv. I noticed from the thread above that you're using a conda environment as a base for your pip commands. This also worked for me, but using standard virtualenvs didn't. It seems as if the .so files aren't being copied into the virtualenv directory. I did manage to get it working in a virtual environment with an editable install from a git checkout (all of the .so files were present), so it seems that the .so files somehow aren't being placed correctly after the build when installing through a virtualenv. Any ideas on how this might happen?
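
For anyone who wants to check the same thing, here is a rough sketch that lists the compiled extension files inside an installed package without importing it. It relies on `importlib.machinery.EXTENSION_SUFFIXES` for the platform's suffixes; `compiled_extensions` and `locate_package` are hypothetical helper names.

```python
import importlib.machinery
import importlib.util
import os


def compiled_extensions(package_dir):
    """Return sorted relative paths of compiled extension files under package_dir."""
    suffixes = tuple(importlib.machinery.EXTENSION_SUFFIXES)  # '.so' variants on Linux
    hits = []
    for root, _dirs, files in os.walk(package_dir):
        for name in files:
            if name.endswith(suffixes):
                hits.append(os.path.relpath(os.path.join(root, name), package_dir))
    return sorted(hits)


def locate_package(name="pairtools"):
    """Locate an installed top-level package without importing it; None if absent."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return list(spec.submodule_search_locations)[0]


if __name__ == "__main__":
    location = locate_package()
    if location is None:
        print("pairtools is not installed for this interpreter")
    else:
        print(location)
        print(compiled_extensions(location) or "no compiled extension files found")
```

Running it in both the conda environment and the virtualenv makes it easy to compare which files ended up in each.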

eharr commented 2 years ago

I spent a bit more time looking at this today (python packaging is a mess!) and I'm not sure there's an easy way to make this pip-installable. It might be that something like cibuildwheel would fix the linking issue but I can't be sure. Another option might be to vendor in the pysam package. In either case it's probably a lot of work and might not be worth the additional complexity. That said, I think it makes it more important to update the version available through bioconda so that folks can start to use it in their pipelines.

agalitsyna commented 2 years ago

Hi @eharr, thanks for looking into this. I think the pip version is mostly functional, except for the pysam installation problems that some users run into, as reported by @aosakwe.

We've pinged bioconda developers to review and merge the most recent pairtools. Recent pairtools should be added to bioconda soon.

eharr commented 2 years ago

Awesome - thank you @agalitsyna!

mblanche commented 2 years ago

Hi, I'm also hitting the same issue when installing pairtools from scratch in a plain ubuntu:22.04 Docker image (Dockerfile attached below).

Is there a solution in sight? This is somewhat critical for our analytical pipelines.

Dockerfile:

FROM ubuntu:22.04

USER root

ENV DEBIAN_FRONTEND noninteractive

RUN apt-get update -y\
    && DEBIAN_FRONTEND=noninteractive \
    && apt-get install -y \
    python3 \
    python3-pip \
    git \
    samtools \
    tabix \
    libbz2-dev \
    lz4 \
    zlib1g-dev \
    build-essential \
    autotools-dev \
    automake \
    && apt-get clean \
    && apt-get purge \
    && rm -rf /var/lib/apt/lists/* /tmp/*

RUN pip3 install \
        cython\
        numpy\
        nose\
        click\
        scipy\
        pandas\
        pysam\
        pyyaml\
        bioframe\
        pairtools

RUN git clone https://github.com/nh13/pbgzip.git \
    && cd pbgzip \
    && sh autogen.sh \
    && ./configure \
    && make -j \
    && make install \
    && rm -rf /pbgzip

RUN git clone https://github.com/4dn-dcic/pairix \
    && mv /pairix/util/bam2pairs/bam2pairs /usr/bin/ \
    && rm -rf /pairix

agalitsyna commented 2 years ago

Hi @mblanche , have you tried separating pip installation for cython/numpy/pysam and pairtools? I find no trouble with this version of your Dockerfile:

FROM ubuntu:22.04

USER root

ENV DEBIAN_FRONTEND noninteractive

RUN apt-get update -y\
        && DEBIAN_FRONTEND=noninteractive \
        && apt-get install -y \
        python3 \
        python3-pip \
        git \
        samtools \
        tabix \
        libbz2-dev \
        lz4 \
        zlib1g-dev \
        build-essential \
        autotools-dev \
        automake \
        && apt-get clean \
        && apt-get purge \
        && rm -rf /var/lib/apt/lists/* /tmp/*

RUN pip3 install \
                cython\
                numpy\
                nose\
                click\
                scipy\
                pandas\
                pysam\
                pyyaml\
                bioframe

RUN pip3 install pairtools

RUN git clone https://github.com/nh13/pbgzip.git \
        && cd pbgzip \
        && sh autogen.sh \
        && ./configure \
        && make -j \
        && make install \
        && rm -rf /pbgzip

RUN git clone https://github.com/4dn-dcic/pairix \
        && mv /pairix/util/bam2pairs/bam2pairs /usr/bin/ \
        && rm -rf /pairix

mblanche commented 2 years ago

Yeah, that’s what I just did (I put an && between two separate pip install commands) and it seems to solve the problem. I’ll wait for the job to complete and post the Docker solution back later.


mblanche commented 2 years ago

Very similar to @agalitsyna's suggestion. I'm forcing the installation of the prerequisite Python libraries first, and then installing pairtools only if the first install succeeds, by chaining the two with &&. I'm doing it inline in the same RUN command to avoid creating an extra layer in the image. Here's my Dockerfile:

FROM ubuntu:22.04

USER root

ENV DEBIAN_FRONTEND noninteractive

RUN apt-get update -y\
    && DEBIAN_FRONTEND=noninteractive \
    && apt-get install -y \
    python3 \
    python3-pip \
    git \
    samtools \
    tabix \
    libbz2-dev \
    lz4 \
    zlib1g-dev \
    build-essential \
    autotools-dev \
    automake \
    && apt-get clean \
    && apt-get purge \
    && rm -rf /var/lib/apt/lists/* /tmp/*

RUN pip3 install \
    cython\
    numpy\
    nose\
    click\
    scipy\
    pandas\
    pysam\
    pyyaml\
    bioframe \
    && pip3 install \
    pairtools

RUN git clone https://github.com/nh13/pbgzip.git \
    && cd pbgzip \
    && sh autogen.sh \
    && ./configure \
    && make -j \
    && make install \
    && rm -rf /pbgzip

RUN git clone https://github.com/4dn-dcic/pairix \
    && mv /pairix/util/bam2pairs/bam2pairs /usr/bin/ \
    && rm -rf /pairix
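
To confirm that an image built this way is actually usable, a small import smoke test can be run inside the container. This is only a sketch: `smoke_import` is a hypothetical helper, and the module list is taken from the traceback at the top of this thread, not from any official pairtools test suite.

```python
import importlib


def smoke_import(module_names):
    """Try to import each module; return {name: error text} for those that fail."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # ImportError is expected; report anything else too
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


if __name__ == "__main__":
    # dedup_cython is the compiled module named in the original traceback
    failed = smoke_import(["pairtools", "pairtools.lib.dedup_cython"])
    if not failed:
        print("all imports OK")
    for name, error in failed.items():
        print(f"FAILED {name}: {error}")
```

To use this as a build gate, the script could exit with a non-zero status when `failed` is non-empty, so that a broken install fails the image build instead of surfacing later in a pipeline.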

aosakwe commented 2 years ago

Hi, Georgette and I were able to fix the issue. It seemed to be an issue with the HPC cluster being used.