Open jvanheld opened 4 months ago
Hi Jacques, are you using background frequencies from masked reference genomes?
Hi Bruno,
I am estimating the k-mer prior probabilities based on a Markov model of order k-2 (or lower if the peak set size is too small).
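To make this concrete, here is a minimal sketch of how a k-mer prior can be derived from a Markov model of order m = k-2 trained on the peaks themselves. It is an illustration under my own assumptions, not the actual oligo-analysis code: the function names (`count_words`, `markov_kmer_prior`) are hypothetical, only one strand is counted, overlapping occurrences are all counted, and no pseudo-counts are applied.

```python
from collections import Counter

ALPHABET = "ACGT"

def count_words(seqs, size):
    """Count all overlapping words of a given size (single strand, ACGT only)."""
    counts = Counter()
    for seq in seqs:
        s = seq.upper()
        for i in range(len(s) - size + 1):
            word = s[i:i + size]
            if all(base in ALPHABET for base in word):
                counts[word] += 1
    return counts

def markov_kmer_prior(word, seqs, order):
    """Prior probability of `word` under a Markov model of the given order.

    P(word) = P(prefix of length m) * product of P(next base | previous m bases),
    with all frequencies estimated from `seqs` (here, the peak sequences).
    """
    m = order
    transitions = count_words(seqs, m + 1)
    if m == 0:
        prob = 1.0  # order 0: no prefix, only independent base frequencies
    else:
        prefixes = count_words(seqs, m)
        total = sum(prefixes.values())
        if total == 0:
            return 0.0
        prob = prefixes[word[:m]] / total
    for i in range(m, len(word)):
        context = word[i - m:i]
        denom = sum(transitions[context + b] for b in ALPHABET)
        if denom == 0:
            return 0.0  # context never observed; a real tool would use pseudo-counts
        prob *= transitions[context + word[i]] / denom
    return prob

# Hypothetical usage: prior of a 6-mer with a model of order k-2 = 4
# p = markov_kmer_prior("CACGTG", peak_sequences, order=4)
```

With small peak sets many (m+1)-mers are rare or absent, which is why the order is lowered in that case.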
It would be interesting to evaluate the impact of the background (BG) model by comparing the motifs discovered with the different alternatives.
A masked reference genome might be interesting, but it will be a mixture of different sequence types, most of which might differ in composition from the peaks. In my experience, using the peaks themselves gives better results for oligo-analysis, but we could test the alternatives.
However, this will only modify the results of oligo-analysis, and the fact that position-analysis returns only weakly significant motifs suggests that there is a more fundamental problem with these peaks, and that PWMs may be suboptimal to classify peaks as regulated or not by the same TF.
For some datasets, oligo-analysis returns highly significant k-mers, but position-analysis returns only weak significance, indicating that there is no strong positional bias of this "motif".
For this type of peak set, we have to evaluate whether it is better to send motifs (PSSMs) or to use radically different approaches. I think that for this data type, one possibility would be to apply supervised classification based on a table of k-mer counts in each peak. If we have time (which is far from sure), we could test this.
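As a rough sketch of what the input to such a classifier could look like, the snippet below builds a table with one row per peak and one column per k-mer. It is only an illustration: the function name `kmer_count_table` is hypothetical, only one strand is counted, and words containing N or other ambiguous bases are skipped.

```python
from itertools import product

def kmer_count_table(peaks, k):
    """Return (kmers, table): table[i][j] is the count of kmers[j] in peaks[i]."""
    kmers = ["".join(letters) for letters in product("ACGT", repeat=k)]
    column = {word: j for j, word in enumerate(kmers)}
    table = []
    for seq in peaks:
        s = seq.upper()
        row = [0] * len(kmers)
        for i in range(len(s) - k + 1):
            j = column.get(s[i:i + k])
            if j is not None:  # skip words with N or other non-ACGT symbols
                row[j] += 1
        table.append(row)
    return kmers, table

# Hypothetical usage: one row per peak, 4**k columns
# kmers, table = kmer_count_table(peak_sequences, k=6)
```

Each row could then be paired with a label (regulated or not by the TF) and passed to whatever supervised classifier we choose; the choice of classifier and the source of the labels are left open here.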