yukiteruono / pbsim3

PBSIM3: a simulator for all types of PacBio and ONT long reads
GNU General Public License v2.0

samtools flagstat 0% mapped with SAM file produced by multipass #2

Open rudigaspardo opened 1 year ago

rudigaspardo commented 1 year ago

Hi, I'm using PBSIM2 for my thesis to test the performance of long-read alignment tools, and I was looking for a way to obtain a SAM file instead of a MAF file (as SimLoRD does, for instance). I saw the new multi-pass mode in PBSIM3 and used these steps:

Step 1 - simulate reads with PBSIM (produces the SAM file):
./pbsim --strategy wgs --method qshmm --qshmm ../data/QSHMM-RSII.model --depth 20 --genome ../../test/E_coli_ASM584v2.fna --pass-num 10

Step 2 - convert SAM to BAM, then sort, index, and compute statistics with samtools:
samtools view -bS -T ../../test/E_coli_ASM584v2.fna sd_0001.sam -O BAM -o sd_0001.bam -@ 2
samtools sort ./sd_0001.bam -o ./sdords_0001.bam -@ 3
samtools index ./sdords_0001.bam -@ 3
samtools flagstat -O tsv ./sdords_0001.bam

I also checked the SAM file with Picard ValidateSamFile, and it seems OK:

java -jar ~/Software/picard/build/libs/picard.jar ValidateSamFile I=./sd_0001.sam MODE=SUMMARY R=../../test/E_coli_ASM584v2.fna

However, the statistics from samtools flagstat show 0% mapped (see below). I expected 100%, as I get with a SimLoRD SAM file, for example. What am I missing or doing wrong? Thanks in advance for your help.

100710	0	total (QC-passed reads + QC-failed reads)
100710	0	primary
0	0	secondary
0	0	supplementary
0	0	duplicates
0	0	primary duplicates
0	0	mapped
0.00%	N/A	mapped %
0	0	primary mapped
0.00%	N/A	primary mapped %
0	0	paired in sequencing
0	0	read1
0	0	read2
0	0	properly paired
N/A	N/A	properly paired %
0	0	with itself and mate mapped
0	0	singletons
N/A	N/A	singletons %
0	0	with mate mapped to a different chr
0	0	with mate mapped to a different chr (mapQ>=5)

yukiteruono commented 1 year ago

Thank you for using PBSIM. The MAF file output by PBSIM is an alignment between the reads and their reference genome. However, the SAM file that PBSIM3 produces in multi-pass mode is a set of subreads generated from a SMRTbell, not an alignment file. Because its records carry no alignment to the reference, samtools flagstat reports all of them as unmapped. The subread SAM file is generated according to pages 11 and 12 of the SMRT Tools Reference Guide (https://www.pacb.com/wp-content/uploads/SMRT-Tools-Reference-Guide-v8.0.pdf).

rudigaspardo commented 1 year ago

Thank you very much, now it's clear to me. In your experience, is there a way or tool to convert MAF to SAM?

yukiteruono commented 1 year ago

I use an in-house tool to convert MAF to SAM, but I haven't published it.
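Since that converter is not public, here is a minimal sketch of a MAF-to-SAM conversion. It is not the maintainer's tool, and it assumes (check this against your own PBSIM output) that each MAF block holds exactly two `s` lines: the reference first, on the `+` strand, then the read. The names `maf_to_sam`, `_iter_blocks` and `_cigar` are hypothetical. Each alignment column becomes a CIGAR `M`, `I` or `D`; a `-` read strand sets FLAG 16, and because the gapped read row is already in reference orientation, SEQ is that row with gaps removed. Read bases outside the aligned span are hard-clipped because MAF does not carry them, and MAPQ 255 and QUAL `*` mark values that MAF does not provide.

```python
import sys
from typing import Dict, Iterator, List, Tuple


def _iter_blocks(maf_text: str) -> Iterator[List[List[str]]]:
    """Yield the whitespace-split 's' lines of each MAF alignment block."""
    block: List[List[str]] = []
    for line in maf_text.splitlines():
        if line.startswith("a"):  # an 'a' line opens a new block
            if block:
                yield block
            block = []
        elif line.startswith("s "):
            block.append(line.split())
    if block:
        yield block


def _cigar(ref_row: str, read_row: str) -> str:
    """Walk the gapped rows column by column and run-length encode M/I/D."""
    ops: List[Tuple[str, int]] = []
    for r, q in zip(ref_row, read_row):
        if r == "-" and q == "-":
            continue  # gap in both rows: consumes nothing
        op = "I" if r == "-" else "D" if q == "-" else "M"
        if ops and ops[-1][0] == op:
            ops[-1] = (op, ops[-1][1] + 1)
        else:
            ops.append((op, 1))
    return "".join(f"{n}{op}" for op, n in ops)


def maf_to_sam(maf_text: str) -> str:
    """Convert two-row (reference, read) MAF blocks to SAM text (assumed layout)."""
    ref_lengths: Dict[str, int] = {}
    records: List[str] = []
    for block in _iter_blocks(maf_text):
        if len(block) != 2:
            raise ValueError("assumed exactly two 's' lines per block")
        _, rname, rstart, _rsize, rstrand, rlen, rrow = block[0]
        _, qname, qstart, qsize, qstrand, qlen, qrow = block[1]
        if rstrand != "+":
            raise ValueError("assumed the reference row is on the + strand")
        ref_lengths[rname] = int(rlen)
        cigar = _cigar(rrow, qrow)
        # Read bases outside the aligned span are absent from the MAF row: hard-clip them.
        # For '-' reads, qstart already counts along the reverse-complemented read,
        # which is the orientation SAM uses with FLAG 16, so the same arithmetic applies.
        lead = int(qstart)
        trail = int(qlen) - int(qstart) - int(qsize)
        if lead:
            cigar = f"{lead}H{cigar}"
        if trail:
            cigar = f"{cigar}{trail}H"
        flag = 16 if qstrand == "-" else 0
        fields = [qname, str(flag), rname, str(int(rstart) + 1),  # MAF 0-based -> SAM POS 1-based
                  "255", cigar, "*", "0", "0", qrow.replace("-", ""), "*"]
        records.append("\t".join(fields))
    header = ["@HD\tVN:1.6\tSO:unsorted"]
    header += [f"@SQ\tSN:{name}\tLN:{length}" for name, length in ref_lengths.items()]
    return "\n".join(header + records) + "\n"


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        sys.stdout.write(maf_to_sam(fh.read()))
```

Saved as, say, maf2sam.py (a hypothetical name), it would run as `python maf2sam.py reads.maf > reads.sam`, and the result could then go through the same samtools view/sort/index/flagstat steps as in the question. Base qualities could be merged in from the simulated reads' FASTQ by read name; this sketch does not attempt that.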