Irrationone closed this issue 7 years ago
oh, right, thanks for the reminder.
Is the collapse clones option in 8ad96085cc8acf51c10754a4a0f10af9686ac368 related to this?
Also, I suppose this may not be the case, but given that indels are no longer reversed in partis, could I just get counts by enumerating duplicate reads in the input FASTQ? Or does trimming at sequence ends/N-padding make this inaccurate?
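For reference, the naive approach asked about here might look like the sketch below. It assumes a plain four-line-per-record FASTQ and counts only exact string matches, so it ignores any end trimming or N-padding done downstream. The function name `count_duplicate_reads` and the example reads are made up for illustration and are not part of partis.

```python
from collections import Counter
import io


def count_duplicate_reads(handle):
    """Count exact-duplicate sequences in a 4-line-per-record FASTQ stream.

    Hypothetical helper: assumes each record is exactly four lines
    (header, sequence, '+', qualities) with no wrapped sequence lines.
    """
    counts = Counter()
    for i, line in enumerate(handle):
        if i % 4 == 1:  # second line of each record is the sequence
            counts[line.strip().upper()] += 1
    return counts


# Made-up example input: two identical reads and one distinct read.
example = io.StringIO(
    "@r1\nACGT\n+\nIIII\n"
    "@r2\nACGT\n+\nIIII\n"
    "@r3\nACGA\n+\nIIII\n"
)
counts = count_duplicate_reads(example)
```

Whether these counts match what the tool reports depends on how it preprocesses reads, which is exactly the open question above.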
uh, no -- --dont-collapse-clones just refers to allele finding, where by default we collapse clones to get more independent mutations, and hence more accurate uncertainties.
This newer stuff is collapsing identical sequences purely for efficiency reasons. And indels are definitely still reversed internally. The change is that sequences that are identical after reversing indels are no longer treated as identical (because they're not biologically identical -- they're only identical in that the sequence that goes through the hmm is identical).
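A rough sketch of the distinction described above, not the actual partis code: reads are collapsed only when their original input sequences are identical, and the first read seen acts as the representative while the rest are recorded as its duplicates. The function name, the `(uid, seq)` tuples, and the returned dictionary layout are all assumptions for illustration.

```python
def collapse_identical(records):
    """Collapse reads whose *input* sequences are exactly identical.

    records: iterable of (uid, input_seq) pairs (hypothetical layout).
    Returns {representative_uid: [uids of its exact duplicates]}.
    Keying on the input sequence, rather than the indel-reversed one,
    keeps reads apart when they only match after indel reversal.
    """
    first_uid_for_seq = {}
    duplicates = {}
    for uid, seq in records:
        if seq in first_uid_for_seq:
            duplicates[first_uid_for_seq[seq]].append(uid)
        else:
            first_uid_for_seq[seq] = uid
            duplicates[uid] = []
    return duplicates


# Made-up reads: "a" and "b" are identical; "c" differs by a deletion,
# so it stays separate even if indel reversal would make it match.
reads = [("a", "ACGTAC"), ("b", "ACGTAC"), ("c", "ACGAC")]
collapsed = collapse_identical(reads)
read_counts = {uid: 1 + len(dups) for uid, dups in collapsed.items()}
```

Under these assumptions, a per-sequence read count is just one plus the number of recorded duplicates.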
I think what I'm looking for is actually the duplicates field in the partition output -- I didn't realize it was there.
hee hee, that's because I added it last week, and didn't tell anybody except the manual. Well, great, then.
Hi Duncan,
When you get the time, can you look into whether read counts can be tracked for duplicate sequences, as per our previous discussion? I don't mean to rush you on this -- but it is a significant issue for PCR-based TCR-seq experiments; the read counts would allow for some error correction.
Thanks!