Open DarioS opened 4 years ago
Hi DarioS,
We have reproduced the problem you reported. We are implementing a signal-to-noise approach that should solve problems of this kind, and we will release an update as soon as possible. Thank you for bringing this to our attention.
In the current version of the code, transcription factor motif prediction uses criteria beyond the statistical evidence: the expression level of the predicted transcription factor, and the absence of a prediction in the reference sequence. The latter criterion will be updated to consider the signal-to-noise ratio between the ref and alt sequences. We applied this combined approach to control the false discovery rate in the motif analysis.
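For illustration only, one way a ref-versus-alt signal-to-noise criterion could be expressed is sketched below. This is not the cis-X implementation: the function names, the log-ratio definition, and both thresholds are assumptions, and the p-values in the usage lines are made-up numbers rather than FIMO output.

```python
import math


def log_ratio(ref_p, alt_p):
    """log10 of how much smaller the alt p-value is than the ref p-value."""
    return math.log10(ref_p / alt_p)


def is_motif_gain(ref_p, alt_p, alt_threshold=1e-3, min_log_ratio=1.0):
    """Hypothetical filter: keep a variant when the alt sequence matches the
    motif and matches it much better than the reference does.

    Unlike an "absent in reference" rule, a weak reference match does not
    automatically discard the variant.
    """
    return alt_p < alt_threshold and log_ratio(ref_p, alt_p) >= min_log_ratio


# Made-up p-values, for illustration only.
strong_gain = is_motif_gain(ref_p=5e-4, alt_p=1e-6)  # ref matches weakly, alt much better
no_change = is_motif_gain(ref_p=5e-4, alt_p=4e-4)    # ref and alt nearly identical
```

With these assumed thresholds the first call keeps the variant even though the reference p-value is below 0.001, while the second discards it because the match barely changes.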
Glad it could be reproduced and will be improved. Looking forward to the software update, but first I am going on holiday for a week.
The two most famous non-coding variants are at chr5 1295113 and chr5 1295135 in hg38. It has been experimentally determined that the TERT promoter is bound by GABPA. I took the reference genome sequence including them and 20 bases on either side and found that the reference genome has a FIMO p-value below 0.001 for one of them:
I also used FIMO with the hotspot mutation changes incorporated into the sequence.
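The window construction described here can be sketched as follows. This is a minimal sketch that assumes one window per variant and uses a toy sequence and a hypothetical substitution in place of the hg38 TERT promoter; the function names are illustrative and not part of cis-X or FIMO.

```python
def flanked_window(chrom_seq, pos, flank=20):
    """Return the bases from `flank` before to `flank` after 1-based `pos`."""
    start = pos - 1 - flank
    end = pos + flank
    if start < 0 or end > len(chrom_seq):
        raise ValueError("window runs off the end of the sequence")
    return chrom_seq[start:end]


def apply_snv(window, ref, alt, flank=20):
    """Substitute the central base of a window, checking the reference allele."""
    if window[flank] != ref:
        raise ValueError(f"expected {ref} at centre, found {window[flank]}")
    return window[:flank] + alt + window[flank + 1:]


def to_fasta(name, seq):
    """Format one FASTA record."""
    return f">{name}\n{seq}\n"


# Toy stand-in for a chromosome; position and alleles are hypothetical.
toy_chrom = "ACGT" * 15
ref_window = flanked_window(toy_chrom, pos=30)
alt_window = apply_snv(ref_window, ref="C", alt="T")
fasta_text = to_fasta("ref", ref_window) + to_fasta("alt", alt_window)
```

Writing `fasta_text` to a file gives both sequences in a form FIMO can scan, for example with `fimo --thresh 0.001 motifs.meme windows.fa`, where both file names are placeholders.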
Both of the hotspot mutations have a FIMO p-value below 0.001. Because cis-X requires that the motif not be predicted in the reference sequence, it throws away one of the two TERT promoter hotspot mutations in every analysis. Has the software been tested to check that it produces sensible results? I don't see any unit tests.
The transcription factor binding analysis also has another statistical flaw.