wwylab / MuSE

Somatic point mutation caller
GNU General Public License v2.0

Parallel processing not working on MuSE sump #14

Status: Closed (ashleyacevedo closed this issue 7 months ago)

ashleyacevedo commented 10 months ago

I'm running MuSE sump as follows: MuSE sump -G -I calls.MuSE.txt -O muse.vcf -n 72 -D known_sites.vcf.gz

The runtime is >15 hours for a MuSE.txt file of ~6 GB, and there appears to be no multi-processing. See this example of the machine usage logging:

2023-08-29 12:07:01 muse_sump INFO CPU: 1% (72 cores) * Memory: 33782/140744MB * Storage: 25/1680GB * Net: 0↓/0↑MBps
2023-08-29 12:17:01 muse_sump INFO CPU: 1% (72 cores) * Memory: 34017/140744MB * Storage: 25/1680GB * Net: 0↓/0↑MBps

I'm running MuSE version 2.0.2 on a cloud machine with 140 GB of RAM and 72 CPUs. No stdout or stderr is produced.

Do you have any thoughts on what I might be doing wrong?

jiyunmaths commented 10 months ago

Hi @ashleyacevedo. Thanks for using MuSE. MuSE sump uses OpenMP for parallel computing. Could you tell me which OS the cloud machine runs and which gcc version you used to compile MuSE 2.0.2? I am happy to help.
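Since the parallelism depends on OpenMP being enabled at build time, one quick check is whether the installed binary links an OpenMP runtime. The following is only a minimal sketch under assumptions not stated in this thread: that the binary is named `MuSE`, is on `PATH`, and is dynamically linked (a statically linked build would not show up in `ldd`). The helper name `ldd_mentions_openmp` is made up for illustration.

```shell
# Succeeds if the `ldd` output on stdin names an OpenMP runtime library.
# libgomp is GCC's runtime; libomp and libiomp5 are the LLVM and Intel ones.
ldd_mentions_openmp() {
    grep -Eq 'lib(gomp|omp|iomp5)\.so'
}

# Assumption: the executable is called MuSE and is on PATH.
muse_bin="$(command -v MuSE || true)"

if [ -z "$muse_bin" ]; then
    echo "MuSE not found on PATH"
elif ldd "$muse_bin" 2>/dev/null | ldd_mentions_openmp; then
    echo "OpenMP runtime linked: $muse_bin"
else
    echo "no OpenMP runtime found in: $muse_bin"
fi
```

If no OpenMP runtime is listed for a dynamically linked gcc build, the binary was most likely compiled without `-fopenmp`, and any OpenMP pragmas in the source were ignored.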

ashleyacevedo commented 10 months ago

Thanks for the quick reply @jiyunmaths! Here is my Dockerfile:

FROM ubuntu:20.04

ENV SAMTOOLS_VERSION=1.14

# Set working directory
WORKDIR /

ARG DEBIAN_FRONTEND=noninteractive

RUN apt-get update && apt-get install -y \
    bzip2 \
    gcc \
    g++ \
    libbz2-dev \
    libcurl3-dev \
    libncurses5-dev \
    liblzma-dev \
    make \
    cmake \
    autoconf \
    libtool \
    wget \
    zlib1g-dev \
    libssl-dev \
    git \
    ca-certificates cpp libltdl-dev unzip

# Pulling SAMTools from its repository, unpacking the archive and installing

ADD https://github.com/samtools/samtools/releases/download/${SAMTOOLS_VERSION}/samtools-${SAMTOOLS_VERSION}.tar.bz2 .
RUN tar xjvf samtools-${SAMTOOLS_VERSION}.tar.bz2 \
    && rm samtools-${SAMTOOLS_VERSION}.tar.bz2 \
    && cd samtools-${SAMTOOLS_VERSION} \
    && ./configure \
    && make \
    && make install \
    && cd ..

# Download the MuSE 2.0.2 source and build it
ADD https://github.com/wwylab/MuSE/archive/refs/tags/v2.0.2.tar.gz .
RUN tar -xvf v2.0.2.tar.gz \
    && rm v2.0.2.tar.gz \
    && cd MuSE-2.0.2 \
    && ./install_muse.sh

# Move MuSE to PATH so that it can be run as a command
ENV PATH=/MuSE-2.0.2:$PATH

The gcc installed on this image is 9.4.0.

gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/9/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:hsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 9.4.0-1ubuntu1~20.04.2' --with-bugurl=file:///usr/share/doc/gcc-9/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-9 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-9-9QDOt0/gcc-9-9.4.0/debian/tmp-nvptx/usr,hsa --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.2)

jiyunmaths commented 10 months ago

Hi @ashleyacevedo, I have fixed a bug in the v2.0.2 Makefile that disabled OpenMP, and with it parallel computing, in the MuSE sump step. The fix is in the new release, v2.0.3. Please use this latest version in your Dockerfile. Thank you very much.
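To confirm that a rebuilt binary actually starts worker threads, the thread count of the running process can be inspected. This is a rough sketch that assumes a Linux host with `/proc` mounted; `thread_count` is a hypothetical helper name, and the example inspects the current shell only so that it runs anywhere.

```shell
# Print the number of threads a process currently has (Linux only).
# Reads the "Threads:" field from /proc/<pid>/status.
thread_count() {
    awk '/^Threads:/ { print $2 }' "/proc/$1/status"
}

# Example: inspect the current shell. Replace $$ with the PID of a running
# `MuSE sump` process (for instance, one found with `pgrep -f "MuSE sump"`).
echo "threads: $(thread_count $$)"
```

A count well above 1 while sump is running suggests the OpenMP thread team was created. It does not show that the threads are busy, so the CPU percentage in the machine usage log remains the more direct signal.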

Ben-Habermeyer commented 10 months ago

Thank you @jiyunmaths for this update. I am working with @ashleyacevedo. What is your release process for uploading new versions to conda (https://anaconda.org/jiyunmaths/muse)? Or do you recommend installing from source?

jiyunmaths commented 10 months ago

@Ben-Habermeyer Sorry, I do not plan to release a new conda version at the moment. Please install it from the source code. Thanks.

ashleyacevedo commented 10 months ago

Thank you so much @jiyunmaths 🙏

jiyunmaths commented 7 months ago

The issue is fixed.