yupenghe / methylpy

WGBS/NOMe-seq Data Processing & Differential Methylation Analysis
Apache License 2.0

multi-threading indexing? #62

Closed yvnkm closed 3 years ago

yvnkm commented 4 years ago

I'm running methylpy single-end-pipeline currently and it has been working very well!

Except that indexing is taking a very long time, and it only uses one core even though I set --num-procs 32. Is there a way I can do multithreading for indexing?

Command line used

methylpy single-end-pipeline --read-files $i --sample ${i%%.fq.gz}.non-pbat --forward-ref hg38_methylpy_f --reverse-ref hg38_methylpy_r --ref-fasta Homo_sapiens.GRCh38.dna.primary_assembly.fa --num-procs 32 --trim-reads False --remove-chr-prefix False &>${i%%.fq.gz}.non-pbat.out

Output file with time stamps

Begin splitting reads for 21_R2.non-pbat_libA Fri Sep 25 08:49:01 2020

No trimming on reads Fri Sep 25 08:54:00 2020

Begin converting reads for 21_R2.non-pbat_libA Fri Sep 25 08:54:00 2020

Begin Running Bowtie2 for libA Fri Sep 25 08:54:18 2020

32115469 reads; of these:
  32115469 (100.00%) were unpaired; of these:
    14266954 (44.42%) aligned 0 times
    11212951 (34.91%) aligned exactly 1 time
    6635564 (20.66%) aligned >1 times
55.58% overall alignment rate

Processing forward strand hits Fri Sep 25 09:31:11 2020

32115469 reads; of these:
  32115469 (100.00%) were unpaired; of these:
    14350834 (44.69%) aligned 0 times
    11210926 (34.91%) aligned exactly 1 time
    6553709 (20.41%) aligned >1 times
55.31% overall alignment rate

Processing reverse strand hits Fri Sep 25 10:12:29 2020

Finding multimappers Fri Sep 25 10:14:23 2020

[bam_sort_core] merging from 0 files and 32 in-memory blocks...

There are 32115469 total input reads Fri Sep 25 10:23:09 2020

There are 20282539 uniquely mapping reads, 63.1550453148 percent remaining Fri Sep 25 10:23:09 2020

Begin calling mCs Fri Sep 25 10:23:09 2020

Input not indexed. Indexing... Fri Sep 25 10:23:09 2020

[mpileup] 1 samples in 1 input files

Done Fri Sep 25 11:50:51 2020

yupenghe commented 4 years ago

Hey, yes, the latest version of samtools supports multi-threaded indexing. I just added this feature to methylpy, so I would expect indexing to be much faster in the latest version of methylpy.
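
For readers who want to see what multi-threaded indexing looks like at the samtools level, here is a minimal sketch. It assumes a samtools build recent enough that `samtools index` accepts the `-@` thread option. The helper name `index_bam`, the `DRY_RUN` switch, and the file name `sample.bam` are hypothetical and only for illustration; this is not necessarily how methylpy invokes samtools internally.

```shell
# Hypothetical helper (not part of methylpy): index a BAM file with
# several threads using the -@ option of `samtools index`.
index_bam() {
    local bam="$1"            # path to a coordinate-sorted BAM file
    local threads="${2:-4}"   # thread count; 4 is an arbitrary default

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only print the command that would be run.
        echo "samtools index -@ $threads $bam"
    else
        samtools index -@ "$threads" "$bam"
    fi
}
```

For example, `DRY_RUN=1 index_bam sample.bam 32` prints the command without running it, and `index_bam sample.bam 32` runs the indexing with 32 threads. Whether methylpy reuses an index built this way is an assumption; the "Input not indexed. Indexing..." message in the log above suggests it checks for one, but that is not confirmed in this thread.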

yvnkm commented 4 years ago

Thanks for the reply! What would be the latest version # of methylpy? Mine is 1.4.3 (installed from Anaconda).

output from methylpy

$ methylpy
usage: methylpy [-h] ...

You are using methylpy 1.4.3 version (/python3.7/site-packages/methylpy/)

optional arguments:
  -h, --help  show this help message and exit

functions:

build-reference     Building reference for bisulfite sequencing data
single-end-pipeline
                    Methylation pipeline for single-end data
paired-end-pipeline
                    Methylation pipeline for paired-end data
DMRfind             Identify differentially methylated regions
reidentify-DMR      Re-call DMRs from existing DMRfind result
add-methylation-level
                    Get methylation level of genomic regions
bam-quality-filter  Filter out single-end reads by mapping quality and mCH
                    level
call-methylation-state
                    Call cytosine methylation state from BAM file
allc-to-bigwig      Get bigwig file from allc file
merge-allc          Merge allc files
index-allc          Index allc files
filter-allc         Filter allc file
test-allc           Binomial test on allc file

yupenghe commented 4 years ago

The latest version is 1.4.6. methylpy can be updated through conda or pip.

yupenghe commented 4 years ago

You can upgrade the package using pip. The conda package usually lags behind (it still provides the version released 4 months ago).

pip install --upgrade methylpy
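
To confirm that the upgrade took effect, the installed version can be compared against 1.4.6, the latest release mentioned above. This is a minimal sketch: the function names `version_at_least` and `check_methylpy` are hypothetical, and it assumes a `sort` that supports `-V` (version sort, as in GNU coreutils) and that methylpy was installed with pip so that `pip show` can report it. The thread does not say which exact release first included multi-threaded indexing, so 1.4.6 is used only because it is the version named here.

```shell
# Hypothetical helper: succeed when version $1 is >= version $2.
# Relies on `sort -V` to order the two version strings.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: read the installed methylpy version from
# `pip show` and compare it against 1.4.6.
check_methylpy() {
    local installed
    installed=$(pip show methylpy | awk '/^Version:/ {print $2}')
    if version_at_least "$installed" "1.4.6"; then
        echo "methylpy $installed is up to date"
    else
        echo "methylpy $installed is older than 1.4.6"
    fi
}
```

Running `check_methylpy` after `pip install --upgrade methylpy` reports whether the environment picked up the new release; with the 1.4.3 conda install from earlier in the thread it would report the older version.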