kubu4 closed this issue 9 months ago.
Is this per exon, or the mean across all exons per gene?
I think we want to keep exons with low or no expression. Let's set the threshold at a sum of 1000 reads across the first 6 exons (since those are the exons we are looking at).
Okay, when I do this (summing read coverage across the first 6 exons of each gene), only 2,497 genes have a sum >= 1000.
Exon sum | Genes |
---|---|
< 1000 | 35763 |
>= 1000 | 2501 |
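A minimal Python sketch of the per-gene sum for a single sample, assuming the coverage data can be read as `(gene_id, exon_number, read_count)` rows with 1-based exon numbers. The function names and input layout are hypothetical, not the code actually used in this issue. Genes with zero reads on their first 6 exons are kept (and land in the "below threshold" count), and genes with fewer than 6 exons simply sum the exons they have.

```python
from collections import defaultdict


def first_n_exon_sums(records, n_exons=6):
    """Sum read counts over exons 1..n_exons for each gene.

    records: iterable of (gene_id, exon_number, read_count) tuples,
    exon_number 1-based (hypothetical input layout).
    """
    sums = defaultdict(int)
    for gene_id, exon_number, read_count in records:
        # Touch every gene so zero-coverage genes are still counted.
        sums[gene_id] += read_count if exon_number <= n_exons else 0
    return dict(sums)


def split_by_threshold(sums, threshold):
    """Return (number of genes < threshold, number of genes >= threshold)."""
    above = sum(1 for s in sums.values() if s >= threshold)
    return len(sums) - above, above
```

For example, `split_by_threshold(first_n_exon_sums(rows), 1000)` would give the two counts in a table like the one above.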
What would the gene count be if we reduced the sum threshold to 100?
Exon sum | Genes |
---|---|
< 100 | 6510 |
>= 100 | 31754 |
Greater than 500?
Exon sum | Genes |
---|---|
< 500 | 31627 |
>= 500 | 6637 |
Let's go forward with a threshold of 100.
Alrighty, we may need to make further adjustments. Those numbers above were just from a single sample that I was using for code testing.
I've written code that reads all the files and applies the threshold filter across all samples on a per-gene basis, i.e. a gene is kept only if every sample has an exon coverage sum of at least n.
Threshold (n) | Genes passing in all samples |
---|---|
10 | 23101 |
25 | 18119 |
50 | 13827 |
75 | 10357 |
100 | 7485 |
500 | 385 |
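A minimal sketch of the "all samples must pass" filter, assuming each sample's first-6-exon sums are already available as a `{gene_id: sum}` dict keyed by sample name. The function names and data layout are hypothetical, and a gene missing from any sample is treated as failing (an assumption, since the thread doesn't say how missing genes were handled).

```python
def genes_passing_all_samples(sample_sums, threshold):
    """Genes whose exon coverage sum is >= threshold in every sample.

    sample_sums: dict of sample name -> {gene_id: exon coverage sum}
    (hypothetical layout). Missing genes count as failing.
    """
    if not sample_sums:
        return set()
    samples = iter(sample_sums.values())
    # Start from the genes passing in the first sample...
    passing = {g for g, s in next(samples).items() if s >= threshold}
    # ...then keep only those that also pass in every other sample.
    for sums in samples:
        passing = {g for g in passing if sums.get(g, 0) >= threshold}
    return passing


def threshold_table(sample_sums, thresholds):
    """Map each threshold to the number of genes passing in all samples."""
    return {t: len(genes_passing_all_samples(sample_sums, t)) for t in thresholds}
```

Calling `threshold_table(sample_sums, [10, 25, 50, 75, 100, 500])` would produce one gene count per row of a table like the one above.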
Let's go with a threshold of 10.
20x