ndierckx / Sim-it

Versatile simulator for structural variance and Nanopore/PacBio sequencing reads
Apache License 2.0

Updating the pacbio CCS error profile #12

Open AyushSaxena opened 2 years ago

AyushSaxena commented 2 years ago

Hi,

This is a very promising tool, thank you for writing it! I have a question about error profiles. The PacBio CCS error profile didn't look much different from the RSII / sequel2 profiles (especially for 1-bp deletions), and I was wondering whether it needs to be updated. I don't understand how the CCS profile can be similar to sequel2/RSII, especially as the latter are CLR. In my experience of looking through recently generated CCS read mappings in IGV, they're pretty clean.

I could of course generate my own profile, but I was wondering if you had one on you already.

Thank you, Ayush

ndierckx commented 2 years ago

Hi,

I could actually add a sequel2 hifi profile. But the CCS_hifi profile is very different from the other two. I think you are misinterpreting the 1 bp deletion value: the 0.9 means that 90% of the deletions are 1 bp long. The total amount of deletions is on the second line, and there the difference is big.
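
To illustrate the distinction between the length fraction and the total amount of deletions, here is a minimal sketch. All numbers and the profile names `clr_like` and `hifi_like` are hypothetical, and it assumes the total is expressed as a per-base rate, which this thread does not confirm.

```python
def split_deletion_rate(total_rate, fraction_1bp):
    """Return (rate of 1 bp deletions, rate of longer deletions).

    total_rate   : assumed overall deletions per base (hypothetical unit)
    fraction_1bp : share of all deletions that are exactly 1 bp long
    """
    return total_rate * fraction_1bp, total_rate * (1 - fraction_1bp)


# Hypothetical values, not taken from any Sim-it profile.
profiles = {
    "clr_like": {"total_rate": 0.05, "fraction_1bp": 0.9},
    "hifi_like": {"total_rate": 0.001, "fraction_1bp": 0.9},
}

for name, p in profiles.items():
    one_bp, longer = split_deletion_rate(p["total_rate"], p["fraction_1bp"])
    print(f"{name}: 1 bp = {one_bp:.5f}, >1 bp = {longer:.5f} per base")
```

Both hypothetical profiles share the same 0.9 fraction, yet the number of 1 bp deletions per base differs by the ratio of the totals, so a similar length fraction says nothing about how clean the reads are.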

The simulations should also be pretty clean, no?

AyushSaxena commented 2 years ago

Hi Nicolas,

Thank you for writing back. You are correct, I was misinterpreting the error profiles. I ran the Perl script with the CCS profile and the simulated reads were fairly clean.

Could you please help me (and others) interpret the error profile file format? I briefly skimmed the repository page for a description but couldn't find one, though I could have looked harder. I'm assuming that the top three rows are the error rates for insertions, deletions and SNPs, in that order? (I'm assuming that order because you said the amount of deletions is on the second line.)

Also, I still have a hard time interpreting this line: DEL_LENGTH 1:0.711687859931202 2:0.675581473289094 3:0.608471074380165

I am unable to understand how these are probability estimates. If the 0.71 refers to a 71% chance that a deletion is a 1-bp deletion, the probabilities don't add up to 1, correct?
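
The check behind this question can be sketched as follows. The parser assumes the line is a key followed by whitespace-separated `length:value` pairs, which is inferred only from the single line quoted above; `parse_length_line` is a hypothetical helper, not part of Sim-it.

```python
def parse_length_line(line):
    """Split 'KEY len:value len:value ...' into (key, {length: value}).

    The layout is an assumption based on the one DEL_LENGTH line quoted
    in this thread, not on Sim-it's documentation.
    """
    key, *pairs = line.split()
    values = {}
    for pair in pairs:
        length, value = pair.split(":")
        values[int(length)] = float(value)
    return key, values


line = "DEL_LENGTH 1:0.711687859931202 2:0.675581473289094 3:0.608471074380165"
key, values = parse_length_line(line)
total = sum(values.values())

print(key, values)
print(f"sum of values = {total:.4f}")
print("sums to 1:", abs(total - 1.0) < 1e-6)
```

The three values alone add up to almost 2, so they cannot be read as a single probability distribution over deletion lengths. How Sim-it actually uses them is not stated in this thread.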

I also found a bug (at least I think it is a bug), and I will report it in a separate ticket so that this one stays about error profiles alone.

Thank you, Ayush