swarris / Pacasus

Correction of palindromes in long reads from PacBio and Nanopore
MIT License

Number of Pacasus iterations per read in the cleaned data set? #15

Closed Mailinnia closed 4 years ago

Mailinnia commented 4 years ago

Hi, I'm pretty new to bioinformatics. I'm using Pacasus to detect palindromes/chimeras in my WGA data set. I would like to know how many Pacasus iterations there are per read in my data set. How do I pull out this information when running Pacasus?

Thank you for your assistance :)

swarris commented 4 years ago

Hi Mailinnia,

That information is in the resulting read name. For each iteration, a suffix '_a1'/'_a2' or '_b1'/'_b2' is appended to the end of the name. The '1' stands for the left part of the original read and the '2' for the right part. 'a' is used when the read is split roughly in half and 'b' in all other cases. So if I start with a read named 'myPacBio', the resulting FASTA file might contain:

Originally: myPacBio = |myPacBio_b1||myPacBio_b2_b1|myPacBio_b2_b2||

Hence the number of _[ab][12] elements in the read name tells you how often it has been split.
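
If it helps, here is a minimal Python sketch of that counting rule. It is not part of Pacasus. The function names `count_splits` and `splits_per_read` and the file name `cleaned.fasta` are made up for illustration, and it assumes the read name is the first whitespace-separated token on each FASTA header line. It only counts the run of suffixes at the very end of the name, so an original read name that already ends in something like `_a1` would be overcounted.

```python
import re

# Run of split suffixes at the end of a name, e.g. "_b2_b1" in "myPacBio_b2_b1".
SUFFIX_RUN = re.compile(r"(?:_[ab][12])+$")


def count_splits(read_name):
    """Return how many _[ab][12] suffixes end the given read name."""
    match = SUFFIX_RUN.search(read_name)
    if match is None:
        return 0
    # Every suffix is exactly three characters long: "_", a/b, 1/2.
    return len(match.group(0)) // 3


def splits_per_read(fasta_lines):
    """Map each read name found in FASTA header lines to its split count."""
    counts = {}
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:  # skip a bare ">" with no name
                counts[fields[0]] = count_splits(fields[0])
    return counts


# Hypothetical usage on a cleaned output file:
# with open("cleaned.fasta") as handle:
#     for name, n in splits_per_read(handle).items():
#         print(name, n)
```

With the example names above, `myPacBio_b1` counts as one split and `myPacBio_b2_b1` as two.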