Open jvanheld opened 4 months ago
Done
peak-motifs in differential mode (top sequences versus background sequences)
@brunocontrerasmoreira, for info
So far we have tested three values for the number of top sequences to keep: 250, 500 and 1000. The top-scoring motifs discovered are very similar, and their significance (binomial significance of k-mer over-representation) increases from 250 to 500 and from 500 to 1000.
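To illustrate why the significance would be expected to grow with the number of top sequences, here is a minimal sketch of a binomial over-representation score. It assumes the usual convention sig = -log10(E-value), with E-value = P-value x number of tested k-mers; the function names and the numbers in the usage note are hypothetical and are not taken from the actual runs.

```python
import math

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1, summed in log space."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
        for i in range(k, n + 1)
    ]
    m = max(log_terms)  # factor out the largest term to avoid underflow
    return math.exp(m) * sum(math.exp(t - m) for t in log_terms)

def significance(k, n, p, n_tests):
    """-log10(E-value), where E-value = P-value * number of tested k-mers."""
    e_value = binomial_tail(k, n, p) * n_tests
    return math.inf if e_value == 0 else -math.log10(e_value)
```

With the same two-fold enrichment, `significance(40, 2000, 0.01, 1)` is larger than `significance(20, 1000, 0.01, 1)`: doubling the amount of data at a constant over-representation ratio lowers the P-value, which is consistent with the trend reported above.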
PBM data are provided with two normalisation methods: SD and QNZC (z-scores). The signal intensities and the ranking of the spots differ substantially depending on the normalisation method. We tested motif discovery with both.
For NACC2, the motif discovery results are quite different between the two normalisations. With SD normalisation, the sequence logos show reasonably good motifs:
Although both datasets return significant motifs, with SD the logos show high error bars and a very irregular succession of high- and low-scoring columns in terms of information content.
This effect seems to depend on the TF: it is not observed with RORB or TIGD3.
Some data types associate a score with each sequence. This is the case, for example, for PBM, CHS, ...
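For score-annotated data of this kind, nested top-N subsets can be built by ranking once and slicing. A minimal sketch, assuming the input is a list of (sequence ID, score) pairs where a higher score means stronger binding; `top_n_subsets` is a hypothetical helper, not part of RSAT:

```python
def top_n_subsets(scored_seqs, sizes):
    """Return {n: IDs of the n highest-scoring sequences} for each n in sizes.

    Sizes larger than the dataset are skipped rather than silently truncated.
    """
    # Rank once by decreasing score; ties keep their input order.
    ranked = sorted(scored_seqs, key=lambda pair: pair[1], reverse=True)
    ids = [seq_id for seq_id, _ in ranked]
    return {n: ids[:n] for n in sizes if n <= len(ids)}
```

Each subset contains the previous one, so any difference between runs reflects only the added lower-scoring sequences.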
We could run oligo-analysis successively on the top 100, 200, 300, 500, 1000, 2000, ... sequences.
PBM example
CHS