yukiteruono / pbsim3

PBSIM3: a simulator for all types of PacBio and ONT long reads
GNU General Public License v2.0

About read length in multi-pass read simulation #9

Open wzboy1984 opened 8 months ago

wzboy1984 commented 8 months ago

Dear author,

I'm confused by the read length setting in multi-pass read simulation, which is described as fixed to the --length-mean value. What is the reason for disabling --length-sd and setting the read length to --length-mean in this case?

Note: for multi-pass sequencing in WGS simulation, the read length is set roughly equal to the --length-mean value, and --length-sd is disabled.

Best wishes,

yukiteruono commented 8 months ago

Thank you for using PBSIM. Unlike PacBio CLR and Nanopore reads, PacBio HiFi reads have a small length variance. The HiFi read simulation could also be made to use --length-sd, but we believe that a constant read length is not a disadvantage for simulating HiFi reads. We are always looking to improve PBSIM's HiFi read simulation and welcome your comments and suggestions.

wzboy1984 commented 8 months ago

Thanks for your reply. I was simulating HiFi reads for genome assembly. When all the HiFi reads had the same length, far fewer contained reads were found, and as a result the assembler took longer to run. Usually, 3 out of 4 reads are contained reads, so such a low percentage of contained reads is abnormal. I suggest keeping --length-sd instead of fixing the HiFi read length.
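
To illustrate the point about contained reads, here is a minimal sketch that compares the fraction of contained reads for fixed-length and variable-length reads. It is not PBSIM code: the function names, the normal length distribution, the uniform start positions, and all parameter values (genome size, read count, mean, sd) are assumptions chosen only for illustration. A read is counted as contained when its interval lies entirely inside the interval of another read.

```python
import random


def sample_lengths(n, mean, sd, rng, min_len=100):
    """Draw n read lengths. sd == 0 mimics the fixed-length behaviour.

    Assumption: a normal distribution is used as a stand-in; this is not
    necessarily how PBSIM samples lengths.
    """
    if sd == 0:
        return [mean] * n
    return [max(min_len, int(rng.gauss(mean, sd))) for _ in range(n)]


def count_contained(intervals):
    """Count half-open intervals (start, end) that lie inside another one.

    After sorting by start ascending and end descending, every earlier
    interval starts at or before the current one, so the current interval
    is contained exactly when some earlier end reaches at least its end.
    Exact duplicates count as contained (all but the first copy).
    """
    contained = 0
    max_end = -1
    for start, end in sorted(intervals, key=lambda iv: (iv[0], -iv[1])):
        if end <= max_end:
            contained += 1
        else:
            max_end = end
    return contained


def contained_fraction(genome_len, lengths, rng):
    """Place reads uniformly on a linear genome; return the contained fraction."""
    intervals = []
    for length in lengths:
        length = min(length, genome_len)
        start = rng.randrange(genome_len - length + 1)
        intervals.append((start, start + length))
    return count_contained(intervals) / len(intervals)


if __name__ == "__main__":
    rng = random.Random(0)
    genome_len, n_reads, mean = 1_000_000, 2_000, 15_000  # roughly 30x depth
    for sd in (0, 5_000):
        lengths = sample_lengths(n_reads, mean, sd, rng)
        frac = contained_fraction(genome_len, lengths, rng)
        print(f"sd={sd}: contained fraction = {frac:.3f}")
```

With sd set to 0, a read can only be contained by another read that starts at exactly the same position, so the fraction stays close to zero. With a nonzero sd, longer reads regularly cover shorter ones, which is the situation an assembler normally sees.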

yukiteruono commented 8 months ago

I understand the problem you are having. We will improve PBSIM to use --length-sd in HiFi read simulation. However, there are some points we would like to consider regarding how to implement it. The release is scheduled for next month.