mritchielab / restrander

restrander stats file #8

Open sparthib opened 3 months ago

sparthib commented 3 months ago

I have a stats file that looks as below:

{
    "stats": {
        "artefactStats": {
            "RTP-RTP": 36655,
            "TSO-TSO": 413310,
            "no artefact": 1316939
        },
        "strandStats": {
            "+": 707209,
            "-": 517923,
            "?": 541772
        },
        "totalReads": 1766904
    }
}

Here, the number of reads with strand "?" is greater than the number of reads with artefacts. What other issues could prevent the strand orientation from being identified? Thanks!

jakob-schuster commented 2 months ago

Hi Sowmya,

Apologies for my slow response! Any reads that didn't match the forward structure (TSO near the start, polyA and -RTP near the end) or the reverse structure (RTP near the start, polyT and -TSO near the end) are counted as "?". This includes artefacts: of your 541772 "?" reads, 449965 are TSO-TSO or RTP-RTP artefacts (413310 + 36655), so only 91807 aren't explained by being artefacts.

This proportion (~5% of reads) is similar to levels we've seen in our data, and doesn't seem alarming. If a lot of reads (over 10%) were counted as "?" but couldn't be explained by being TSO-TSO/RTP-RTP artefacts, I'd suspect that the primers Restrander is using don't match the ones in your data - but that's not the case here.
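To reproduce that arithmetic on another run, here is a minimal Python sketch. It assumes the stats file has the same layout as the one posted above, and that every artefact read is also counted under "?" (as described in the reply). The function name `unexplained_unknowns` is just an illustrative choice, not part of Restrander.

```python
import json

# Stats as posted in this issue; in practice, load them from your own stats file.
stats_json = """
{ "stats": { "artefactStats": { "RTP-RTP": 36655, "TSO-TSO": 413310, "no artefact": 1316939 },
             "strandStats": { "+": 707209, "-": 517923, "?": 541772 },
             "totalReads": 1766904 } }
"""

def unexplained_unknowns(stats):
    """Return (artefact reads, '?' reads not explained by artefacts, that count as a fraction of all reads)."""
    # Sum every artefact category except the "no artefact" bucket.
    artefacts = sum(
        count for name, count in stats["artefactStats"].items()
        if name != "no artefact"
    )
    # Assumption: all artefact reads are included in the "?" strand count.
    unexplained = stats["strandStats"]["?"] - artefacts
    return artefacts, unexplained, unexplained / stats["totalReads"]

stats = json.loads(stats_json)["stats"]
artefacts, unexplained, fraction = unexplained_unknowns(stats)
print(artefacts, unexplained, f"{fraction:.1%}")  # 449965 91807 5.2%
```

If the fraction comes out well above 10%, that would be the situation described above where a primer mismatch is worth checking.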

What exactly makes up those 5% of reads is difficult to say. If they're just noisy, requiring more edit distance to be identified, you could run Restrander again with a higher error rate (0.30?) to catch some more reads. Otherwise, you'll have to dumpster dive through the "unknowns" file to find out what's wrong with them.
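As one possible starting point for looking through the unknowns, the sketch below summarises read lengths. It assumes the "unknowns" file is plain four-line-per-record FASTQ, which is an assumption rather than something confirmed in this thread; the helper names `read_fastq` and `length_summary` are hypothetical.

```python
import io
import statistics

def read_fastq(handle):
    """Yield (name, sequence) pairs, assuming exactly four lines per FASTQ record."""
    while True:
        header = handle.readline()
        if not header.strip():
            return  # end of file (or a blank line, which this sketch treats as the end)
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        handle.readline()  # quality line, ignored
        yield header.strip().lstrip("@"), seq

def length_summary(handle):
    """Return the read count and the min/median/max read length."""
    lengths = [len(seq) for _, seq in read_fastq(handle)]
    if not lengths:
        return {"reads": 0}
    return {
        "reads": len(lengths),
        "min": min(lengths),
        "median": statistics.median(lengths),
        "max": max(lengths),
    }

# Tiny in-memory example standing in for a real unknowns file.
example = io.StringIO("@read1\nACGTACGT\n+\nIIIIIIII\n@read2\nACGT\n+\nIIII\n")
summary = length_summary(example)
print(summary)
```

For a real file, pass an open text handle instead of the `StringIO` object (for a gzipped file, `gzip.open(path, "rt")` gives one). A pile of very short reads, for example, might point to truncated reads rather than a primer mismatch.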