khuang28jhu / bs3

BS-Seeker3: An Ultra-fast, Versatile Pipeline for Mapping Bisulfite-treated Reads.

Unable to align .fq file // Minimum RAM necessary to align reads to human genome #38

Open eb-97 opened 1 year ago

eb-97 commented 1 year ago

I'm attempting to align a .fq file containing simulated, bisulfite-converted reads. The reads were generated with wgsim [1], and the bisulfite conversion was added with fastx-mutate-tools [2]. The reference is the human genome (v37) [3]. The file to be aligned contains 1,000,002 reads, each 100 bp long. The index built by bs3 is 130 GB in size. The log of the attempted alignment is as follows:

Welcome to SNAP version 2.0.3.

Loading index from directory... 17s. 6,203,947,478 bases, seed size 20.
Aligning.
sched_setaffinity: Invalid argument
sched_setaffinity: Invalid argument
sched_setaffinity: Invalid argument
sched_setaffinity: Invalid argument
sched_setaffinity: Invalid argument
sched_setaffinity: Invalid argument
sched_setaffinity: Invalid argument
sched_setaffinity: Invalid argument
Total Reads    Aligned, MAPQ >= 10    Aligned, MAPQ < 10    Unaligned        Too Short/Too Many Ns    Reads/s    Time in Aligner (s)
1,000,002      930,826 (93.08%)       62,075 (6.21%)        7,101 (0.71%)    0 (0.00%)                83,570     12
corrupted size vs. prev_size

The subsequent "Final Alignment Report" shows only zeros: 0 unique-hit reads, 0 reads mapped after post-filtering, etc.
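As a sanity check on the figures in the SNAP summary, here is a minimal Python sketch that recomputes the percentages and the approximate run time from the counts in the log. The variable and function names are my own and are not part of bs3 or SNAP; only the numbers come from the log.

```python
# Counts copied from the SNAP summary line in the log.
total_reads = 1_000_002
counts = {
    "aligned_mapq_ge_10": 930_826,
    "aligned_mapq_lt_10": 62_075,
    "unaligned": 7_101,
    "too_short_or_too_many_ns": 0,
}
reads_per_second = 83_570


def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Percentage of the input that falls into each category.
percentages = {name: percent(n, total_reads) for name, n in counts.items()}

# Every read should appear in exactly one category.
accounted_for = sum(counts.values())

# Rough time in the aligner implied by the reported throughput.
estimated_seconds = total_reads / reads_per_second

if __name__ == "__main__":
    for name, pct in percentages.items():
        print(f"{name}: {counts[name]:,} ({pct}%)")
    print("reads accounted for:", accounted_for)
    print("estimated aligner time (s):", round(estimated_seconds))
```

The categories add up to the full input, and the two "aligned" categories together cover more than 99% of the reads, so the SNAP summary itself looks consistent. The zeros appear only in the later report.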

The server I'm running bs3 on has 125 GB of usable RAM, which is less than the 130 GB index, so I assume this is why bs3 fails. Interestingly, however, the alignment works with a small input of about 2,000 reads, even though the available RAM is still smaller than the index. I would therefore like to ask: how much RAM does bs3 need to align a large read file like the one described at the beginning of this post?
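To make the comparison between index size and RAM concrete, here is a minimal Python sketch that measures the on-disk size of an index directory and compares it with the machine's physical memory. The function names and the example path are hypothetical, and `os.sysconf` with these keys is assumed to be available (Linux). Whether bs3 actually needs the whole index resident in memory is an assumption I can't confirm, so this check alone does not settle the question.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files below `path` (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def physical_ram_bytes():
    """Total physical memory as reported by the OS (Linux/Unix only)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def index_fits_in_ram(index_bytes, ram_bytes):
    """True if the index is no larger than the given amount of RAM."""
    return index_bytes <= ram_bytes


if __name__ == "__main__":
    GB = 10**9
    # Figures from this post: 130 GB index, 125 GB of usable RAM.
    print("fits (figures from post):", index_fits_in_ram(130 * GB, 125 * GB))

    # Hypothetical path; replace with the actual bs3 index directory.
    index_dir = "/path/to/bs3_index"
    if os.path.isdir(index_dir):
        print("fits (measured):",
              index_fits_in_ram(dir_size_bytes(index_dir), physical_ram_bytes()))
```

With the figures from this post the check reports that the index does not fit. Since the 2,000-read run succeeds on the same machine, index size versus RAM may not be the whole explanation.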

[1] https://github.com/lh3/wgsim
[2] https://github.com/nicolaprezza/fastx-mutate-tools
[3] https://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/human_g1k_v37.fasta.gz