xi11west / bsmap

GNU General Public License v2.0

Bsmap Alignment very slow #1

Open haloudashu opened 3 months ago

haloudashu commented 3 months ago

Hi Team,

I am running Bsmap with the command below, and it is very slow.

bsmap \
  -a ${outdir}/${sample}/${sample}_cut1.fq.gz \
  -b ${outdir}/${sample}/${sample}_cut2.fq.gz \
  -d ${genomefa} \
  -z 33 -S 1 -s 16 -g 3 -n 1 -q 0 -f 5 -p 30 -u -r 1 -v 15 \
  -o ${outdir}/${sample}/bsmap/${sample}_bsmapAligned.bam \
  2> ${outdir}/${sample}/bsmap/${sample}_bsmap_output.txt

I have 511,750,752 total read pairs, about 112 Gb of data. I preprocessed the reads and used the filtered reads for alignment. The job uses 32 CPUs and 128 GB of RAM. It has been running for 137 hours and is still not complete. Can you please help with this?

Version: Bsmap 2.90

haloudashu commented 3 months ago

[image] This is the computing resource usage.

xi11west commented 3 months ago

Gapped mode (-g) is much slower than ungapped mode. Also, -p 30 is unlikely to be much better than -p 12.
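As a rough illustration of that suggestion, the sketch below prints the original command with only -g and -p changed. It is a dry run that echoes the command rather than executing it. The helper name `bsmap_cmd` and the example paths are hypothetical placeholders; every other flag is copied unchanged from the original post.

```shell
# Dry-run sketch: prints a bsmap command line, does not run bsmap.
# Hypothetical placeholder values; substitute your own paths.
outdir=/path/to/outdir
sample=sampleA
genomefa=/path/to/genome.fa

# bsmap_cmd GAP THREADS
#   GAP     -> value passed to -g (0 = ungapped, per the thread)
#   THREADS -> value passed to -p
bsmap_cmd() {
  echo "bsmap" \
    "-a ${outdir}/${sample}/${sample}_cut1.fq.gz" \
    "-b ${outdir}/${sample}/${sample}_cut2.fq.gz" \
    "-d ${genomefa}" \
    "-z 33 -S 1 -s 16 -g $1 -n 1 -q 0 -f 5 -p $2 -u -r 1 -v 15" \
    "-o ${outdir}/${sample}/bsmap/${sample}_bsmapAligned.bam"
}

# Ungapped mode with 12 threads, as suggested above.
bsmap_cmd 0 12
```

Printing the command first makes it easy to check the flags before committing to a multi-day run.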

haloudashu commented 3 months ago

Thank you for your response. I set the -g parameter to its default value and ran tests on 14,596,415 read pairs, which is approximately 5 GB of data. With -g 3, the run took 6 hours and 28 minutes; with -g 0 (the default), it took only 1 hour and 2 minutes. Although the alignment rate dropped by roughly 5%, this saved a significant amount of time. Under the default parameters, would using -p 30 cut the run time by about 50% compared to -p 12?
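For a rough sense of scale, the timings above can be extrapolated to the full dataset. The sketch below assumes run time grows linearly with the number of read pairs, which is an assumption and not something measured in this thread. It also computes the best-case saving from -p 12 to -p 30 under perfect thread scaling, which is an upper bound only. The function name `est_hours` is illustrative.

```shell
# Figures reported in this thread.
full_pairs=511750752          # read pairs in the full dataset
test_pairs=14596415           # read pairs in the test subset
gapped_min=$((6 * 60 + 28))   # -g 3 on the subset: 6 h 28 min
ungapped_min=$((1 * 60 + 2))  # -g 0 on the subset: 1 h 2 min

# Speedup of -g 0 over -g 3, times 100 so integer math keeps two decimals.
speedup_x100=$((gapped_min * 100 / ungapped_min))

# est_hours MINUTES_ON_SUBSET
# Assumption: time scales linearly with read-pair count.
est_hours() {
  echo $(($1 * full_pairs / test_pairs / 60))
}

# Best-case reduction going from 12 to 30 threads (perfect scaling assumed).
ideal_reduction_pct=$((100 - 100 * 12 / 30))

echo "speedup (x100): ${speedup_x100}"
echo "full run, -g 3: about $(est_hours "$gapped_min") h"
echo "full run, -g 0: about $(est_hours "$ungapped_min") h"
echo "ideal -p 12 to -p 30 reduction: ${ideal_reduction_pct}%"
```

Under these assumptions the ungapped run on the full data would take on the order of a day and a half instead of more than nine days. The thread-scaling figure is a ceiling: real gains depend on I/O and other serial portions of the run, which is consistent with the maintainer's remark that -p 30 is unlikely to be much better than -p 12.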