Closed by yvnkm 3 years ago
Hey, yes, the latest version of samtools supports multi-threaded indexing. I just added this feature to methylpy, so I would expect indexing to be much faster in the latest version of methylpy.
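For reference, multi-threaded indexing in samtools is requested with the `-@` option of `samtools index`. The following is a minimal sketch of calling it from Python, assuming a samtools build whose `index` subcommand accepts `-@`. The helper names and the BAM file name are hypothetical, and this is not methylpy's internal code; whether the pipeline reuses an index built this way is not confirmed in this thread.

```python
import shutil
import subprocess


def build_index_command(bam_path, threads):
    """Return the argv list for a multi-threaded `samtools index` call."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    # -@ asks samtools to use additional threads while indexing.
    return ["samtools", "index", "-@", str(threads), bam_path]


def index_bam(bam_path, threads=8):
    """Run `samtools index` on bam_path, failing early if samtools is missing."""
    if shutil.which("samtools") is None:
        raise RuntimeError("samtools not found on PATH")
    subprocess.run(build_index_command(bam_path, threads), check=True)


if __name__ == "__main__":
    # "sample_processed_reads.bam" is a hypothetical file name for illustration.
    print(" ".join(build_index_command("sample_processed_reads.bam", 32)))
```

Keeping the command construction separate from the call that runs it makes it easy to inspect the exact command line before running it on a large BAM file.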
Thanks for the reply! What would be the latest version # of methylpy? Mine is 1.4.3 (installed from Anaconda).
$ methylpy
usage: methylpy [-h] ...

You are using methylpy 1.4.3 version (/python3.7/site-packages/methylpy/)

optional arguments:
  -h, --help            show this help message and exit

functions:
    build-reference     Building reference for bisulfite sequencing data
    single-end-pipeline
                        Methylation pipeline for single-end data
    paired-end-pipeline
                        Methylation pipeline for paired-end data
    DMRfind             Identify differentially methylated regions
    reidentify-DMR      Re-call DMRs from existing DMRfind result
    add-methylation-level
                        Get methylation level of genomic regions
    bam-quality-filter  Filter out single-end reads by mapping quality and mCH
                        level
    call-methylation-state
                        Call cytosine methylation state from BAM file
    allc-to-bigwig      Get bigwig file from allc file
    merge-allc          Merge allc files
    index-allc          Index allc files
    filter-allc         Filter allc file
    test-allc           Binomial test on allc file
The latest is 1.4.6. Methylpy can be updated through conda and pip.
You can upgrade the package using pip. The conda package usually lags behind (it still provides the version released 4 months ago).
pip install --upgrade methylpy
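To confirm which version is active after upgrading, the installed version can be compared against 1.4.6. This is a small sketch, assuming Python 3.8 or newer (for `importlib.metadata`) and a plain dotted numeric version string; the helper names are hypothetical.

```python
from importlib import metadata


def version_tuple(version):
    """Turn a plain dotted version such as '1.4.3' into (1, 4, 3)."""
    return tuple(int(part) for part in version.split("."))


def needs_upgrade(installed, required="1.4.6"):
    """Return True if the installed version is older than the required one."""
    return version_tuple(installed) < version_tuple(required)


def installed_methylpy_version():
    """Return the installed methylpy version string, or None if absent."""
    try:
        return metadata.version("methylpy")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_methylpy_version()
    if found is None:
        print("methylpy is not installed in this environment")
    elif needs_upgrade(found):
        print(f"methylpy {found} is older than 1.4.6; consider upgrading")
    else:
        print(f"methylpy {found} is 1.4.6 or newer")
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as "1.4.10" sorting before "1.4.6".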
I'm running methylpy single-end-pipeline currently and it has been working very well!
Except that indexing is taking a very long time, and it only uses one core even though I set --num-procs 32. Is there a way I can do multithreading for indexing?
Command line used
methylpy single-end-pipeline --read-files $i --sample ${i%%.fq.gz}.non-pbat --forward-ref hg38_methylpy_f --reverse-ref hg38_methylpy_r --ref-fasta Homo_sapiens.GRCh38.dna.primary_assembly.fa --num-procs 32 --trim-reads False --remove-chr-prefix False &>${i%%.fq.gz}.non-pbat.out
Output file with time stamps
Begin splitting reads for 21_R2.non-pbat_libA Fri Sep 25 08:49:01 2020
No trimming on reads Fri Sep 25 08:54:00 2020
Begin converting reads for 21_R2.non-pbat_libA Fri Sep 25 08:54:00 2020
Begin Running Bowtie2 for libA Fri Sep 25 08:54:18 2020
32115469 reads; of these:
  32115469 (100.00%) were unpaired; of these:
    14266954 (44.42%) aligned 0 times
    11212951 (34.91%) aligned exactly 1 time
    6635564 (20.66%) aligned >1 times
55.58% overall alignment rate
Processing forward strand hits Fri Sep 25 09:31:11 2020
32115469 reads; of these:
  32115469 (100.00%) were unpaired; of these:
    14350834 (44.69%) aligned 0 times
    11210926 (34.91%) aligned exactly 1 time
    6553709 (20.41%) aligned >1 times
55.31% overall alignment rate
Processing reverse strand hits Fri Sep 25 10:12:29 2020
Finding multimappers Fri Sep 25 10:14:23 2020
[bam_sort_core] merging from 0 files and 32 in-memory blocks...
There are 32115469 total input reads Fri Sep 25 10:23:09 2020
There are 20282539 uniquely mapping reads, 63.1550453148 percent remaining Fri Sep 25 10:23:09 2020
Begin calling mCs Fri Sep 25 10:23:09 2020
Input not indexed. Indexing... Fri Sep 25 10:23:09 2020
[mpileup] 1 samples in 1 input files
Done Fri Sep 25 11:50:51 2020
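The timestamps in this log show where the time goes. Below is a rough sketch that computes the elapsed time between two stamps, assuming the stamp format shown in the log and an English locale for the day and month names; the helper names are hypothetical. Note that the interval from "Input not indexed. Indexing..." to "Done" covers both indexing and mC calling, because the log does not mark where indexing ends.

```python
from datetime import datetime

# Format of the stamps in the pipeline log, e.g. "Fri Sep 25 10:23:09 2020".
STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"


def elapsed_seconds(start_stamp, end_stamp):
    """Return whole seconds between two log timestamps."""
    start = datetime.strptime(start_stamp, STAMP_FORMAT)
    end = datetime.strptime(end_stamp, STAMP_FORMAT)
    return int((end - start).total_seconds())


def format_duration(seconds):
    """Format a number of seconds as e.g. '1h27m42s'."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h{minutes:02d}m{secs:02d}s"


# Timestamps copied from the log in this thread.
intervals = [
    ("Bowtie2 start to multimapper search",
     "Fri Sep 25 08:54:18 2020", "Fri Sep 25 10:14:23 2020"),
    ("Indexing plus mC calling",
     "Fri Sep 25 10:23:09 2020", "Fri Sep 25 11:50:51 2020"),
]

for label, start, end in intervals:
    print(label, format_duration(elapsed_seconds(start, end)))
```

By this measure, the indexing and mC calling stage took about 1 h 28 min in the run above, slightly longer than both Bowtie2 alignments combined.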