Open sparthib opened 5 months ago
Hi Sowmya,
Apologies for my slow response! Any read that doesn't match the forward structure (TSO near the start, polyA and -RTP near the end) or the reverse structure (RTP near the start, polyT and -TSO near the end) is counted as "?". This includes artefacts, so of your 541772 "?" reads, 449965 are TSO-TSO (413310) or RTP-RTP (36655) artefacts, and only 91807 aren't explained that way.
This proportion (~5% of reads) is similar to levels we've seen in our data, and doesn't seem alarming. If a lot of reads (over 10%) were counted as "?" but couldn't be explained by being TSO-TSO/RTP-RTP artefacts, I'd suspect that the primers Restrander is using don't match the ones in your data - but that's not the case here.
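For anyone wanting to repeat this check on their own stats file, the arithmetic can be sketched in Python. This is only a rough sketch, not part of Restrander: the function name `unexplained_unknowns` is made up for illustration, and it assumes that every TSO-TSO and RTP-RTP artefact read is counted under "?". The JSON is the stats file posted in this issue.

```python
import json

# Stats file contents as posted in this issue.
STATS_JSON = """
{ "stats": { "artefactStats": { "RTP-RTP": 36655, "TSO-TSO": 413310, "no artefact": 1316939 },
             "strandStats": { "+": 707209, "-": 517923, "?": 541772 },
             "totalReads": 1766904 } }
"""

def unexplained_unknowns(stats):
    """Return (count, fraction of all reads) of "?" reads that are not artefacts.

    Hypothetical helper; assumes every artefact read is counted as "?".
    """
    artefact_reads = sum(
        count for name, count in stats["artefactStats"].items()
        if name != "no artefact"
    )
    unexplained = stats["strandStats"]["?"] - artefact_reads
    return unexplained, unexplained / stats["totalReads"]

stats = json.loads(STATS_JSON)["stats"]
unexplained, fraction = unexplained_unknowns(stats)
print(unexplained, f"{fraction:.1%}")  # 91807 5.2%
```

If the fraction came out well above 10%, that would be the point at which to check whether the configured primers match the data.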
What exactly makes up those 5% of reads is difficult to say. If they're just noisy, requiring more edit distance to be identified, you could run Restrander again with a higher error rate (0.30?) to catch some more reads. Otherwise, you'll have to dumpster dive through the "unknowns" file to find out what's wrong with them.
Super late response on my end, but this could be due to reads that look like this:
ACCTACTTGGTTCAGTTACGTATTGCTACTTGCCTGTCGCTCTATCTTCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTCTCATGGGTCACTGAGGCTTTTTATTTTGAGCACAAAACCACCGGGGATCTAGCCTGTGGCCACCCCGGTGG TTTTGTGCTCAAAATAAAAAGCCTCAGTGACCCATAAAAAAAAAAAAAAAAAAAAAGAAAAAAAAAAAAAAAAGAAGTGAGCGACAGGCGAGTGGATACGTAA
Here both polyA and polyT stretches occur in the same read, which indicates library prep artifacts.
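One way to look for such reads in the "unknowns" file is to flag any sequence that contains both a long run of A and a long run of T. The snippet below is only a rough sketch under stated assumptions: `has_both_poly_runs` and the 15-base threshold are made up for illustration and are not taken from Restrander, and exact homopolymer matching does not tolerate sequencing errors inside a run, so noisy tails may be missed.

```python
import re

MIN_RUN = 15  # assumed minimum homopolymer length; tune for your data

def has_both_poly_runs(seq, min_run=MIN_RUN):
    """Return True if seq contains both an A run and a T run of at least min_run bases.

    Hypothetical helper: exact matching only, so a run interrupted by a
    sequencing error is treated as two shorter runs.
    """
    seq = seq.upper()
    poly_a = re.search("A{%d,}" % min_run, seq)
    poly_t = re.search("T{%d,}" % min_run, seq)
    return poly_a is not None and poly_t is not None

# Synthetic examples (not real reads).
chimeric = "ACGT" + "T" * 20 + "GATTACA" + "A" * 20 + "CC"
normal = "ACGT" + "A" * 20 + "CC"
print(has_both_poly_runs(chimeric))  # True
print(has_both_poly_runs(normal))    # False
```

Counting how many of the unexplained "?" reads are flagged this way would give a rough idea of how much of the remainder comes from this kind of read.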
I have a stats file that looks as below:
{
  "stats": {
    "artefactStats": {
      "RTP-RTP": 36655,
      "TSO-TSO": 413310,
      "no artefact": 1316939
    },
    "strandStats": {
      "+": 707209,
      "-": 517923,
      "?": 541772
    },
    "totalReads": 1766904
  }
}
Here, the number of reads with strand "?" is greater than the number of reads flagged as artefacts. What other issues could prevent the strand orientation from being identified? Thanks!