yukiteruono / pbsim3

PBSIM3: a simulator for all types of PacBio and ONT long reads
GNU General Public License v2.0

hifi reads origin position #11

Open · zhuyixin43 opened this issue 8 months ago

zhuyixin43 commented 8 months ago

Hi there!

I have another question about the output reads. I simulated HiFi reads from a genome using pbsim3 and ccs. Is it possible to know where in the genome each read came from? I know ccs generates consensus reads, so if that is not possible, can I at least find out where the subreads generated by pbsim3 came from? I am interested not only in the chromosome but also in the start and end positions.

I tried looking at the SAM file and the MAF file, but they didn't seem to answer my question. It would be super helpful if you could share some insights on this! Thank you so much!

yukiteruono commented 8 months ago

The MAF file output by PBSIM records the positions where the subreads were sequenced. By aligning the HiFi reads generated by ccs to the reference genome, you can locate the HiFi reads on the genome. Most HiFi reads have a sequencing error rate below 1%, so the alignment results locate them almost exactly. We recommend LAST (https://gitlab.com/mcfrith/last) and minimap2 (https://github.com/lh3/minimap2) as aligners.
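Because the MAF file already stores where each subread was sequenced, the subread positions can be read straight from it without an aligner. The following is a minimal Python sketch, assuming each alignment block in PBSIM's MAF has the reference sequence on its first `s` line and the simulated read on its second (please check this against your own output). It applies the standard MAF rule that a start on the `-` strand is counted on the reverse-complemented source, and it reports 0-based, end-exclusive forward-strand coordinates. The names `ReadOrigin`, `open_maf`, `forward_interval`, and `iter_origins` are illustrative, not part of PBSIM.

```python
import gzip
from typing import Iterable, Iterator, List, NamedTuple, TextIO, Tuple


class ReadOrigin(NamedTuple):
    read_name: str
    ref_name: str
    ref_start: int  # 0-based, forward strand
    ref_end: int    # end-exclusive, forward strand
    strand: str     # "+" if read and reference lines share a strand, else "-"


def open_maf(path: str) -> TextIO:
    # Accept both plain and gzip-compressed MAF files, chosen by file suffix.
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def forward_interval(start: int, size: int, strand: str, src_size: int) -> Tuple[int, int]:
    # MAF rule: on the "-" strand, start is counted from the end of the
    # source sequence, so map it back to forward-strand coordinates.
    if strand == "-":
        start = src_size - start - size
    return start, start + size


def iter_origins(lines: Iterable[str]) -> Iterator[ReadOrigin]:
    block: List[List[str]] = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # blank lines only separate blocks
        if fields[0] == "a":
            block = []  # a new alignment block starts
        elif fields[0] == "s":
            block.append(fields)
            if len(block) == 2:
                # Assumption: first "s" line = reference, second = read.
                # s-line fields: s, src, start, size, strand, srcSize, text
                _, ref_name, r_start, r_size, r_strand, r_src = block[0][:6]
                read_name, q_strand = block[1][1], block[1][4]
                start, end = forward_interval(int(r_start), int(r_size), r_strand, int(r_src))
                strand = "+" if r_strand == q_strand else "-"
                yield ReadOrigin(read_name, ref_name, start, end, strand)
```

To use it, open the MAF with `open_maf(path)` inside a `with` statement and iterate over `iter_origins(fh)`; each record can be written out as a BED-like line (chromosome, start, end, read name, strand).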

zhuyixin43 commented 8 months ago

Oh yes, I actually meant the positions where the subreads were sequenced. Thanks for the alignment suggestion, but I would like to avoid using aligners.

Is it possible to know which subreads were used to construct a ccs read?

Thank you!

yukiteruono commented 8 months ago

As shown on https://ccs.how/how-does-ccs-work.html, ccs generates HiFi (ccs) reads as a consensus of subreads. I don't know how to get what you want out of the ccs output file.
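One possible workaround, offered only as a hedged sketch: ccs builds each consensus from the subreads of a single ZMW, and under the usual PacBio naming convention subreads are named `<movie>/<zmw>/<qStart>_<qEnd>` while ccs reads are named `<movie>/<zmw>/ccs`. Grouping read names by their first two fields therefore pairs each ccs read with the subreads from its ZMW. This assumes the simulated subreads follow that naming convention, which is not confirmed here, and ccs may discard some subreads, so the result is a list of candidates rather than the exact set used. The helpers `zmw_key` and `subreads_by_ccs` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def zmw_key(read_name: str) -> str:
    # PacBio-style names: "<movie>/<zmw>/<qStart>_<qEnd>" for subreads and
    # "<movie>/<zmw>/ccs" for ccs reads; the first two fields identify the ZMW.
    parts = read_name.split("/")
    if len(parts) < 3:
        raise ValueError(f"not a PacBio-style read name: {read_name!r}")
    return "/".join(parts[:2])


def subreads_by_ccs(ccs_names: Iterable[str], subread_names: Iterable[str]) -> Dict[str, List[str]]:
    # Index subreads by ZMW, then look up each ccs read's ZMW.
    by_zmw: Dict[str, List[str]] = defaultdict(list)
    for name in subread_names:
        by_zmw[zmw_key(name)].append(name)
    # ccs reads whose ZMW has no subreads get an empty candidate list.
    return {ccs: by_zmw.get(zmw_key(ccs), []) for ccs in ccs_names}
```

The read names themselves can be taken from the first column of the BAM records, for example with `samtools view subreads.bam | cut -f1` (file name hypothetical).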