yukiteruono / pbsim3

PBSIM3: a simulator for all types of PacBio and ONT long reads
GNU General Public License v2.0

How to get the sam output file? #29

Open tanger-code opened 1 month ago

tanger-code commented 1 month ago

Hi.

I'm simulating long reads from a genome, but the output is a .maf file. How can I get SAM output? I want to generate HiFi reads, so I need the SAM file as input to the ccs software.

Also, if I want to run a simulation experiment, such as calling SVs from the simulated reads, can I use the maf file as the truth set?

Any advice would be very helpful to me.

yukiteruono commented 1 month ago

Executing the command below will generate sam and maf files.

pbsim --strategy wgs \
      --method qshmm \
      --qshmm data/QSHMM-RSII.model \
      --depth 20 \
      --genome sample/sample.fasta \
      --pass-num 10

Please double-check your command and the output files it produces.

PBSIM3 generates sam and maf for multi-pass sequencing data. Therefore, that maf can be used as a truth set for the multi-pass sequencing data. However, since HiFi reads are generated by ccs, that maf cannot be used as a truth set for HiFi reads.

tanger-code commented 1 month ago

OK, thank you. Is there a command to simulate reads with no errors and no variants? Sometimes I just want to get some reads from a FASTA genome.

yukiteruono commented 1 month ago

PBSIM3 cannot generate error-free reads.
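
If exact, error-free fragments are all that is needed, they can be cut from the reference outside PBSIM3. The following is a minimal sketch of that idea, not a PBSIM3 feature: the function name `sample_exact_reads` and its parameters are hypothetical, it assumes the reference sequence is already loaded as a string, it uses a fixed read length, and it samples the forward strand only.

```python
import random


def sample_exact_reads(sequence, read_length, count, seed=0):
    """Return `count` (start, read) pairs copied verbatim from `sequence`.

    Hypothetical helper for illustration: start positions are uniform,
    the read length is fixed, and only the forward strand is sampled.
    """
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    if read_length > len(sequence):
        raise ValueError("read_length exceeds the sequence length")

    rng = random.Random(seed)  # seeded so the sampling is reproducible
    last_start = len(sequence) - read_length
    reads = []
    for _ in range(count):
        start = rng.randrange(last_start + 1)
        reads.append((start, sequence[start:start + read_length]))
    return reads


if __name__ == "__main__":
    reference = "ACGTTGCAACGTAGCTAGGCTA"  # toy sequence, not real data
    for start, read in sample_exact_reads(reference, read_length=8, count=3):
        print(start, read)
```

Because each read is a plain slice of the reference, its start position serves as the truth alignment.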

tanger-code commented 1 month ago

When I use the command pbsim --strategy wgs --method qshmm --qshmm model/QSHMM-RSII.model --depth 5 --genome GCA_chr21.fa --length-mean 17000 --length-sd 3000 --length-min 14000 --length-max 20000 --accuracy-mean 0.99 --accuracy-min 0.95 --accuracy-max 1.0 to simulate some reads, the output information is: [image]

What do the insertion rate and deletion rate mean? Do they refer to the error rate or to newly added variation?

yukiteruono commented 1 month ago

Substitution rate, insertion rate, and deletion rate are the numbers of substitutions, insertions, and deletions per template base in simulated read sequencing. For example, if 3 insertions occur when sequencing a 1000 bp template, the insertion rate will be 0.003.
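
The arithmetic in that example can be written out as a short sketch. The function name `error_rate` is hypothetical and only mirrors the definition given above (events divided by template length); it is not taken from the PBSIM3 source.

```python
def error_rate(event_count, template_length):
    """Return the number of error events per template base."""
    if template_length <= 0:
        raise ValueError("template_length must be positive")
    return event_count / template_length


if __name__ == "__main__":
    # 3 insertions observed while sequencing a 1000 bp template
    print(error_rate(3, 1000))  # 0.003
```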