Closed georgette-femerling closed 2 years ago
Hi @georgette-femerling , how did you install pairtools? Have you tried
pip install numpy pysam cython
pip install pairtools
one after another?
Hi @agalitsyna , I am having the same issue and error message using python 3.7.7 - I had installed pairtools the way you described.
Hi @aosakwe , @georgette-femerling, I was unable to reproduce your problem in a fresh environment with python 3.7.7 (also 3.8.10). It runs fine if appropriately installed.
Two modifications/checks that I can propose:
Run installation with no caching by pip:
pip install pysam numpy cython --no-cache-dir
pip install pairtools --no-cache-dir
Make sure that the python and pip binaries are in the same location. If they are not, first install pip into the same path as python.
which pip
which python
which pairtools
These three should sit in the same directory, e.g. /home/username/anaconda3/envs/test/bin/
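The same check can be scripted. Below is a minimal sketch using only the standard library; the helper names `tool_dirs` and `same_directory` are made up for this illustration and are not part of pairtools.

```python
import os
import shutil


def tool_dirs(tools=("python", "pip", "pairtools")):
    """Map each tool name to the directory of its executable on PATH (None if not found)."""
    dirs = {}
    for tool in tools:
        path = shutil.which(tool)
        # Symlinks are deliberately not resolved: what matters here is the
        # directory in which the shell finds each executable.
        dirs[tool] = os.path.dirname(os.path.abspath(path)) if path else None
    return dirs


def same_directory(dirs):
    """True only if every tool was found and all of them live in one directory."""
    found = set(dirs.values())
    return None not in found and len(found) == 1


if __name__ == "__main__":
    dirs = tool_dirs()
    for tool, directory in dirs.items():
        print(f"{tool}: {directory}")
    print("same directory:", same_directory(dirs))
```

If this reports different directories, pip is probably installing into a different environment than the one python (or pairtools) runs from.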
Explanation:
What you observe is a pip installation problem: pip cannot find the cython and pysam libraries and link them appropriately with the compiled pairtools code. Installing them in the same environment before pairtools is crucial for pairtools to work.
However, if pip and python have different locations, they might also have different search paths for libraries like cython and pysam. Another possibility is that you tried to install pairtools before, so pip cached an incorrectly linked version, and you are now re-installing it without re-linking against the new cython and pysam.
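One way to rule out both causes at once is to drive pip through the interpreter you actually use (`python -m pip`), with caching disabled, in two separate steps. The sketch below is only an illustration: the helper names `pip_install_command` and `install_in_two_steps` are hypothetical, and by default the script prints the commands instead of running them.

```python
import subprocess
import sys

BUILD_DEPS = ["numpy", "pysam", "cython"]


def pip_install_command(packages, no_cache=True):
    """Build a pip command that runs under the current interpreter."""
    # Running pip as `python -m pip` installs into the environment of the
    # interpreter that executes this script.
    command = [sys.executable, "-m", "pip", "install", *packages]
    if no_cache:
        command.append("--no-cache-dir")
    return command


def install_in_two_steps(run=subprocess.check_call):
    """Install build dependencies first, then pairtools, stopping at the first failure."""
    run(pip_install_command(BUILD_DEPS))
    run(pip_install_command(["pairtools"]))


if __name__ == "__main__":
    # Print the commands rather than running them; call install_in_two_steps() to execute.
    print(" ".join(pip_install_command(BUILD_DEPS)))
    print(" ".join(pip_install_command(["pairtools"])))
```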
Hi, I tried reinstalling them in a new environment (using the same code you have provided) and all three are in the same directory. I still encounter the same issue. I did try to install pairtools before. How can I resolve this issue?
You can explicitly remove pairtools from the pip cache, e.g. pip cache remove pairtools*
Another option might be installing from source:
git clone https://github.com/open2c/pairtools.git
cd pairtools
pip install -e .
There seems to be no residual cache. I get the following installation error when installing from source:
Installing collected packages: pairtools
Running setup.py develop for pairtools
error: subprocess-exited-with-error
× python setup.py develop did not run successfully.
│ exit code: 1
╰─> [28 lines of output]
running develop
running egg_info
writing pairtools.egg-info/PKG-INFO
writing dependency_links to pairtools.egg-info/dependency_links.txt
writing entry points to pairtools.egg-info/entry_points.txt
writing requirements to pairtools.egg-info/requires.txt
writing top-level names to pairtools.egg-info/top_level.txt
reading manifest file 'pairtools.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
no previously-included directories found matching 'doc/_build'
no previously-included directories found matching 'doc/_templates'
warning: no files found matching '*.pxd' anywhere in distribution
warning: no previously-included files matching '__pycache__/*' found anywhere in distribution
warning: no previously-included files matching '*.so' found anywhere in distribution
warning: no previously-included files matching '*.pyd' found anywhere in distribution
warning: no previously-included files matching '*.pyc' found anywhere in distribution
warning: no previously-included files matching '.git*' found anywhere in distribution
warning: no previously-included files matching '.deps/*' found anywhere in distribution
warning: no previously-included files matching '.DS_Store' found anywhere in distribution
writing manifest file 'pairtools.egg-info/SOURCES.txt'
running build_ext
building 'pairtools.lib.parse_pysam' extension
gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -O2 -ftree-vectorize -march=core-avx2 -fno-math-errno -fPIC -O2 -ftree-vectorize -march=core-avx2 -fno-math-errno -fPIC -fPIC -I/home/aosakwe/python_env/lib/python3.8/site-packages/pysam -I/home/aosakwe/python_env/lib/python3.8/site-packages/pysam/include/samtools -I/home/aosakwe/python_env/include -I/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/python/3.8.10/include/python3.8 -I/home/aosakwe/python_env/lib/python3.8/site-packages/numpy/core/include -I/home/aosakwe/python_env/lib/python3.8/site-packages/numpy/core/include -c /home/aosakwe/git_installs/pairtools/pairtools/lib/parse_pysam.c -o build/temp.linux-x86_64-3.8/home/aosakwe/git_installs/pairtools/pairtools/lib/parse_pysam.o
/home/aosakwe/git_installs/pairtools/pairtools/lib/parse_pysam.c:779:10: fatal error: htslib/kstring.h: No such file or directory
779 | #include "htslib/kstring.h"
| ^~~~~~~~~~~~~~~~~~
compilation terminated.
error: command 'gcc' failed with exit status 1
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
Hi @aosakwe, this is a pysam-related problem, not a pairtools one. As you can see in your output, the compiler cannot find htslib/kstring.h, and htslib is part of pysam. I don't know why this happens on your computer; looking through the pysam issues on GitHub might be the best idea. Here is an answer that might guide you: https://github.com/pysam-developers/pysam/issues/262
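To see whether the installed pysam actually ships that header, one can look for it under the include directories that pysam reports via `pysam.get_include()`. This is a rough diagnostic sketch: `find_header` is a hypothetical helper, and it assumes that those directories are the ones relevant to the pairtools build.

```python
import os


def find_header(include_dirs, header=os.path.join("htslib", "kstring.h")):
    """Return the first existing path to `header` under the include directories, or None."""
    for include_dir in include_dirs:
        candidate = os.path.join(include_dir, header)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    try:
        import pysam
    except ImportError:
        print("pysam is not importable in this environment")
    else:
        include_dirs = pysam.get_include()
        print("pysam include directories:", include_dirs)
        print("htslib/kstring.h:", find_header(include_dirs) or "NOT FOUND")
```

If the header is reported as not found, the pysam installation itself is the place to look, for example by reinstalling pysam in a fresh environment.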
My other suggestions are:
Hi @aosakwe - thanks so much for the updated tool, I'm eager to try it out but am running into the same issue.
One possible solution is to use PEP 518 to specify the build dependencies in the pyproject.toml file. That way pip would know to install numpy, cython and pysam before trying to build pairtools. There are some details on how it would work for a Cython package here: https://levelup.gitconnected.com/how-to-deploy-a-cython-package-to-pypi-8217a6581f09
Failing that, would it be possible to update the version on bioconda? That would allow users to avoid the build-time issues.
Hi @eharr, good suggestion. I don't think it works without a package manager like poetry, though.
The problem is that pyproject.toml and PEP 518 require building the package in an isolated environment, and cannot include libraries linked from outside, like pysam.
Here is the record of why pyproject.toml does not work with the pysam dependency: https://github.com/open2c/pairtools/commits/remove-query-scaling
If you make it work, feel free to submit a PR!
Thanks @agalitsyna I'll take a look at the pyproject.toml if I get a chance.
I think I've narrowed down the issue to some difference between conda and virtualenv. I noticed from the thread above that you're using a conda environment as a base for your pip commands. This also worked for me, but using standard virtualenvs didn't. It seems as if the .so files aren't being copied into the virtualenv directory. I did manage to get it working in a virtual environment with an editable install from a git checkout (all of the .so files were present), so it seems like the .so files somehow aren't being placed correctly after the build when installed through a virtualenv. Any ideas on how this might happen?
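For anyone who wants to compare a conda environment with a virtualenv, the following sketch lists the compiled extension files found inside the installed package directory. The helper names `compiled_extensions` and `package_directory` are invented for this example; run it with the interpreter of each environment and compare the output.

```python
import importlib.util
import os

EXTENSION_SUFFIXES = (".so", ".pyd")


def compiled_extensions(package_dir):
    """Return sorted relative paths of compiled extension files under package_dir."""
    found = []
    for root, _dirs, files in os.walk(package_dir):
        for name in files:
            if name.endswith(EXTENSION_SUFFIXES):
                found.append(os.path.relpath(os.path.join(root, name), package_dir))
    return sorted(found)


def package_directory(package_name):
    """Locate an installed package's directory without importing it (None if not installed)."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return list(spec.submodule_search_locations)[0]


if __name__ == "__main__":
    directory = package_directory("pairtools")
    if directory is None:
        print("pairtools is not installed in this environment")
    else:
        print("package directory:", directory)
        for path in compiled_extensions(directory) or ["(no compiled extensions found)"]:
            print(" ", path)
```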
I spent a bit more time looking at this today (python packaging is a mess!) and I'm not sure there's an easy way to make this pip-installable. It might be that something like cibuildwheel would fix the linking issue but I can't be sure. Another option might be to vendor in the pysam package. In either case it's probably a lot of work and might not be worth the additional complexity. That said, I think it makes it more important to update the version available through bioconda so that folks can start to use it in their pipelines.
Hi @eharr, thanks for looking into this.
I think the pip version is mostly functional, except for pysam installation problems for some users, as reported by @aosakwe.
We've pinged bioconda developers to review and merge the most recent pairtools. Recent pairtools should be added to bioconda soon.
Awesome - thank you @agalitsyna!
Hi, I'm also hitting the same issue when installing pairtools from scratch in a plain ubuntu:22.04 Docker image (Dockerfile attached below).
Is there a solution in sight? This is somewhat critical for our analytical pipelines.
FROM ubuntu:22.04
USER root
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update -y \
    && DEBIAN_FRONTEND=noninteractive \
    && apt-get install -y \
        python3 \
        python3-pip \
        git \
        samtools \
        tabix \
        libbz2-dev \
        lz4 \
        zlib1g-dev \
        build-essential \
        autotools-dev \
        automake \
    && apt-get clean \
    && apt-get purge \
    && rm -rf /var/lib/apt/lists/* /tmp/*
RUN pip3 install \
    cython \
    numpy \
    nose \
    click \
    scipy \
    pandas \
    pysam \
    pyyaml \
    bioframe \
    pairtools
RUN git clone https://github.com/nh13/pbgzip.git \
    && cd pbgzip \
    && sh autogen.sh \
    && ./configure \
    && make -j \
    && make install \
    && rm -rf /pbgzip
RUN git clone https://github.com/4dn-dcic/pairix \
    && mv /pairix/util/bam2pairs/bam2pairs /usr/bin/ \
    && rm -rf /pairix
Hi @mblanche , have you tried separating pip installation for cython/numpy/pysam and pairtools? I find no trouble with this version of your Dockerfile:
FROM ubuntu:22.04
USER root
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update -y \
    && DEBIAN_FRONTEND=noninteractive \
    && apt-get install -y \
        python3 \
        python3-pip \
        git \
        samtools \
        tabix \
        libbz2-dev \
        lz4 \
        zlib1g-dev \
        build-essential \
        autotools-dev \
        automake \
    && apt-get clean \
    && apt-get purge \
    && rm -rf /var/lib/apt/lists/* /tmp/*
RUN pip3 install \
    cython \
    numpy \
    nose \
    click \
    scipy \
    pandas \
    pysam \
    pyyaml \
    bioframe
RUN pip3 install pairtools
RUN git clone https://github.com/nh13/pbgzip.git \
    && cd pbgzip \
    && sh autogen.sh \
    && ./configure \
    && make -j \
    && make install \
    && rm -rf /pbgzip
RUN git clone https://github.com/4dn-dcic/pairix \
    && mv /pairix/util/bam2pairs/bam2pairs /usr/bin/ \
    && rm -rf /pairix
Yeah, that’s what I just did (I put an && and did two pip installs) and it seems to be solving the problem. I’ll wait for the job to complete and post the Docker solution back later.
Very similar to @agalitsyna's suggestion. I'm forcing the installation of the prerequisite Python libraries first, and then installing pairtools with && only if the first install succeeds. I'm doing it inline in the same RUN command to avoid having an independent layer in the image. Here's my Dockerfile:
FROM ubuntu:22.04
USER root
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update -y \
    && DEBIAN_FRONTEND=noninteractive \
    && apt-get install -y \
        python3 \
        python3-pip \
        git \
        samtools \
        tabix \
        libbz2-dev \
        lz4 \
        zlib1g-dev \
        build-essential \
        autotools-dev \
        automake \
    && apt-get clean \
    && apt-get purge \
    && rm -rf /var/lib/apt/lists/* /tmp/*
RUN pip3 install \
    cython \
    numpy \
    nose \
    click \
    scipy \
    pandas \
    pysam \
    pyyaml \
    bioframe \
    && pip3 install \
    pairtools
RUN git clone https://github.com/nh13/pbgzip.git \
    && cd pbgzip \
    && sh autogen.sh \
    && ./configure \
    && make -j \
    && make install \
    && rm -rf /pbgzip
RUN git clone https://github.com/4dn-dcic/pairix \
    && mv /pairix/util/bam2pairs/bam2pairs /usr/bin/ \
    && rm -rf /pairix
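A quick way to confirm that an image (or any environment) built like this has working compiled extensions is a small import smoke test. The sketch below is only a suggestion: `try_import` is a hypothetical helper, and `pairtools.lib.parse_pysam` is the extension module named in the build log earlier in this thread.

```python
import importlib


def try_import(module_name):
    """Try to import a module; return (True, None) or (False, error description)."""
    try:
        importlib.import_module(module_name)
    except Exception as error:  # usually ImportError; caught broadly so every module gets reported
        return False, f"{type(error).__name__}: {error}"
    return True, None


if __name__ == "__main__":
    for name in ("pysam", "pairtools", "pairtools.lib.parse_pysam"):
        ok, error = try_import(name)
        print(f"{name}: {'ok' if ok else error}")
```

Running this inside the container right after the build should surface a broken link between pairtools and pysam before a pipeline does.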
Hi, Georgette and I were able to fix the issue. It seemed to be an issue with the HPC cluster being used.
I am getting an import error when I try to run pairtools on a cluster:
I'm using Python 3.8.10.