samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/

Running time in samtools mpileup #1011

Open beginner984 opened 5 years ago

beginner984 commented 5 years ago

Sorry,

I have some .bam files from WGS; their average size is 70000000 KB (about 70 GB). I tried VarScan, but even with 16 CPUs the mutation calling never finished, and after 36 hours the session was killed on our cluster. Is there any way to parallelize my samtools mpileup job, for example by multi-threading or by splitting the BAM files?

This is my code

/home/local/software/GATK/3.7/source/varscan somatic <(samtools mpileup --no-BAQ -f /temp/hgig/fi1d18/hs37d5.fa /scratch/fi1d18/example_results/1631_WTSI-COLO_075_b/mapped_sample/HUMAN_1000Genomes_hs37d5_genomic_WTSI-COLO_075_b.dupmarked.bam /scratch/fi1d18/example_results/1631_WTSI-COLO_075_1pre/mapped_sample/HUMAN_1000Genomes_hs37d5_genomic_WTSI-COLO_075_1pre.dupmarked.bam) /wgs --mpileup 1 --output-vcf

Our cluster allows a maximum of 40000 MB of memory and has 32 nodes, each with 16 CPUs.

I don't know how to make samtools mpileup use all of the nodes.

Any help please?

pd3 commented 5 years ago

Use the -r option to run on smaller genomic regions in parallel. The chunks don't need to overlap; the program pulls in the reads that overlap each region.
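
One way to produce the regions for -r is sketched below. It is only an illustration, not something posted in this thread: it assumes the reference has a FASTA index (.fai, as written by samtools faidx) with the sequence name in column 1 and its length in column 2. The function name make_regions and the chunk size are made up for the example.

```shell
#!/usr/bin/env bash
# Sketch: split a reference into fixed-size, non-overlapping regions that can
# each be passed to `samtools mpileup -r`.
# Assumption: FAI is a FASTA index whose first two columns are name and length.

# make_regions FAI CHUNK_SIZE
# Prints one region per line as name:start-end (1-based, inclusive).
make_regions() {
    awk -v size="$2" '{
        for (start = 1; start <= $2; start += size) {
            end = start + size - 1
            if (end > $2) end = $2      # clip the last chunk to the sequence end
            printf "%s:%d-%d\n", $1, start, end
        }
    }' "$1"
}

# Example use (file names are placeholders, not the paths from the post):
#   make_regions ref.fa.fai 10000000 | while read -r region; do
#       echo samtools mpileup --no-BAQ -r "$region" -f ref.fa normal.bam tumor.bam
#   done
```

Each printed command can then be submitted as its own cluster job, or several can be run at once on one node. The per-region outputs have to be combined afterwards, and a caller such as VarScan would need to be run once per region in the same way.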